Patinformatics part 3: a review of the tools.
Anthony Trippe, Sr. Scientist, Chemical Abstracts Service
In part 1 (issue 31) we looked at the general idea of patinformatics and discussed various elements of the intelligence cycle with regard to patent analysis. In part 2 (issue 34) we focused on data and text mining as they apply to patents and presented some thoughts on a linear workflow for patinformatics. This series of articles continues with a discussion of various tools that are currently available for conducting patinformatics work.
This article reviews Aurigin/MicroPatent, ClearForest, Search Technologies, and Invention Machine Corporation. The 4th and final part covers OmniViz, Current-Patents, Patentratings.com, M-CAM DOORS, Wisdomain, and Delphion.
Aurigin Systems Inc./MicroPatent LLC
After existing as a private company with an established customer base within Fortune 100 companies, Aurigin Systems filed for Chapter 11 bankruptcy in February of 2002. The situation was resolved when MicroPatent acquired the company at auction in May of 2002. Since the acquisition, MicroPatent has continued to develop the online version of the product and in June released Aureka Online System (AOS) 9.0, which merges the original Aureka with content and searching that were previously available only through MicroPatent.
AOS is a system for conducting Intellectual Property Asset Management (IPAM) and, as the name implies, this system allows you to organize and manage intellectual property (not just patents, but corporate documents as well). Besides allowing for smart IP management, the system also contains tools for conducting a number of different types of patent analysis. While a very powerful and flexible platform, Aureka is a big-ticket item.
Aurigin has pre-loaded their platform with patent data taken from the four major patent authorities (US, EP, JP, and PCT) and includes a search engine for identifying relevant references. These references can be saved, creating sets for further analysis and sharing with colleagues. Another nice feature of the Aureka platform is the ability to annotate documents. One of the key strengths of the IPAM system is the ability for individuals within an organization to create sets of patents, analyze them, annotate them, generally create intelligence from them, and save all of this knowledge in a single place.
One of the analytical tools built into the Aureka system is ThemeScape, a thematic text-mining tool. ThemeScape employs a concept mapping method of creating technology landscapes. The program reads full-text documents, identifies themes that occur throughout the references, and employs clustering algorithms to organize documents by co-occurrence of the identified themes.
Another analytical tool within the Aureka platform is the citation tool. Licensed from InXight, this technology incorporates a hyperbolic tree viewer. The citation tree tool creates a hyperbolic tree of citation information from within the U.S. patents covered in Aureka. Select a single U.S. patent and it will become the root of the tree, with subsequent citations to that document forming branches moving forward from one generation to the next. Backward citations can also be visualized in this tree format. One can label trees in a number of different ways, including by assignee, publication date, or inventor. Trees can also be colored based on date or assignee. Citation trees can support a rapid visual review of the citation history for a single U.S. patent.
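The forward-citation tree described above can be sketched in a few lines of Python. This is only an illustration of the data structure, not Aureka's or InXight's implementation: the function name, the (citing, cited) pair format, the depth limit, and the patent identifiers are all hypothetical.

```python
from collections import defaultdict


def build_forward_tree(citations, root, max_depth=3):
    """Build a nested dict of forward citations starting at `root`.

    `citations` is an iterable of (citing, cited) pairs (assumed format).
    Each generation holds the patents that cite the one above it.
    """
    cited_by = defaultdict(list)
    for citing, cited in citations:
        cited_by[cited].append(citing)

    def expand(patent, depth, path):
        if depth == max_depth:
            return {}
        children = {}
        for child in sorted(cited_by.get(patent, [])):
            if child in path:  # skip cycles caused by dirty data
                continue
            children[child] = expand(child, depth + 1, path | {child})
        return children

    return {root: expand(root, 0, frozenset([root]))}


# Hypothetical identifiers: US-B and US-C cite US-A, and US-D cites US-B.
sample = [("US-B", "US-A"), ("US-C", "US-A"), ("US-D", "US-B")]
tree = build_forward_tree(sample, "US-A")
print(tree)
```

A backward-citation tree follows from the same function by swapping each pair to (cited, citing) before passing it in.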
The Aureka system also contains a reporting tool that supports statistical analysis of the patent data. Additional information on AOS can be found on the MicroPatent Web site: http://www.micropatent.com.
ClearForest Inc.
The products from ClearForest Inc. are amongst the most powerful text mining tools available. Most text mining tools begin by performing what is called term extraction, the process whereby the application selects relevant terms from within the text and extracts them for subsequent analysis. Term extraction works similarly to the process used to create a full-text inverted index of a particular document. Once the terms have been extracted from the text, they can be analyzed in several ways.
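To make the comparison with an inverted index concrete, here is a minimal sketch in Python. It is not ClearForest's method: the tokenizer, the small stop-word list, the function names, and the sample documents are assumptions chosen for illustration, and a real term extractor would also handle phrases, stemming, and relevance weighting.

```python
import re
from collections import defaultdict

# A deliberately tiny stop-word list (assumption for the example).
STOPWORDS = {"a", "an", "and", "of", "the", "for", "in", "to", "is"}


def extract_terms(text):
    """Lower-case the text and keep alphanumeric tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each extracted term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    return index


# Hypothetical documents.
docs = {
    "d1": "A method for coating a polymer film",
    "d2": "Polymer film for packaging",
}
index = build_inverted_index(docs)
print(sorted(index["polymer"]))
```

Once terms are indexed this way, counting or comparing them across documents becomes a matter of set operations on the index entries.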
Information extraction extends the term extraction operation: it not only selects terms, but subsequently categorizes the terms automatically into pre-defined categories or taxonomies. It works on unstructured text, where there are no inventor or assignee fields with data specifically tagged and classified. Information extraction techniques can analyze unstructured text and automatically extract and categorize such information as people's names, their positions, their companies, or various other attributes.
Once the information extraction process is complete the ClearResearch module supports the analysis of the classified information. The tool allows for a number of different analyses. One of the most powerful involves the use of circle graphs to visualize the relationships between one collection of taxonomies and another. A taxonomy could cover all the companies named within a document collection.
Imagine a circle displaying technological terms on the left-hand side and company names on the right. Lines of varying thickness drawn from one side of the circle to the other would represent relationships between a company and the technological terms associated with it. Variations in thickness and color of the lines represent the intensity of the relationship based on the total number of documents that support it. By double clicking on a line, users see the documents.
Double clicking on an individual technology term or company name along the edge of the circle will cause a new window to open with the clicked-upon object at the center and the associated terms displayed around it as spokes. For instance, clicking on a company name will open a new window with the company name at the center and spokes leading off to the technology terms associated with the company. Right clicking on one of the technology terms in this window will bring up a contextual menu that will allow an additional distribution on any of the taxonomies available to the analyst. In this fashion, one can distribute the company’s inventors by the corresponding technology terms associated with them.
The ClearForest suite contains many powerful text-mining features. Additional information can be found at the Web site, http://www.clearforest.com.
Search Technologies
Search Technologies produces VantagePoint, a data-mining tool that, for the most part, deals with the statistical analysis of values within fielded data. If the field happens to contain written text, then the tool applies natural language processing algorithms to parse out topics. The first step in using VantagePoint involves importing and parsing data from online records. Using the import editor, fielded data from almost any source can be correctly parsed and imported into the system for analysis.
After creating a database with the fielded values, the system provides tools for conducting list cleanup. Using fuzzy logic routines, the system can help the user identify values within the field that should probably be grouped together since they are synonymous with one another. Two of the most common uses for this feature are company name and inventor name cleanup. As mentioned, good statistical analysis needs good, clean data. List cleanup is often a time-consuming and laborious process, and the cleanup features in VantagePoint can make it easier.
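The kind of grouping involved in name cleanup can be approximated with Python's standard `difflib` module. This sketch does not reproduce VantagePoint's fuzzy routines, which are not described here: the normalization rules, the 0.85 similarity threshold, the function names, and the sample list are assumptions, and in practice an analyst would review the suggested groups before accepting them.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop periods and commas, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def group_similar(names, threshold=0.85):
    """Greedily place each name in the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            score = SequenceMatcher(None, normalize(name), normalize(group[0])).ratio()
            if score >= threshold:
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    return groups


# Hypothetical assignee list containing spelling variants.
names = [
    "Procter & Gamble Co.",
    "Vertex Pharmaceuticals",
    "Procter and Gamble Co",
    "Vertex Pharmaceuticals Inc.",
]
for group in group_similar(names):
    print(group)
```

Comparing only against the first member of each group keeps the sketch short; it also means the result can depend on the order of the input list.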
The major statistical paradigm used by VantagePoint is the co-occurrency matrix. One attribute is placed on the Y-axis while another goes on the X-axis. Numbers show up within the matrix indicating the number of documents that incorporate the corresponding values on the X- and Y-axes. Clicking on a cell produces a list of the titles of the documents that support this relationship.
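A matrix of this kind is simple to compute from fielded records. The following Python sketch is an illustration only, not VantagePoint code: the record layout, the field names, the function name, and the sample data are hypothetical. Each cell keeps the titles of its supporting documents, so the count for a cell is the length of its list.

```python
from collections import defaultdict
from itertools import product


def cooccurrence_matrix(records, field_x, field_y):
    """Return {(x_value, y_value): [titles]} for every pair that shares a record."""
    cells = defaultdict(list)
    for rec in records:
        # set() ensures a value repeated within one record is counted once.
        for x, y in product(sorted(set(rec[field_x])), sorted(set(rec[field_y]))):
            cells[(x, y)].append(rec["title"])
    return cells


# Hypothetical fielded records.
records = [
    {"title": "Doc 1", "assignee": ["Acme"], "topic": ["coatings", "polymers"]},
    {"title": "Doc 2", "assignee": ["Acme", "Globex"], "topic": ["coatings"]},
    {"title": "Doc 3", "assignee": ["Globex"], "topic": ["polymers"]},
]
matrix = cooccurrence_matrix(records, "assignee", "topic")
for (assignee, topic), titles in sorted(matrix.items()):
    print(assignee, topic, len(titles), titles)
```

Looking up a cell such as `matrix[("Acme", "coatings")]` plays the role of clicking on it: the result is the list of titles behind the number.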
Synonymous values can be collected in a group and compared to another field within the matrix. For instance, organizations can be grouped by their general affiliation: industrial, governmental, or educational. The users can then compare the number of documents produced by each of the different organizational sectors within certain key technologies. Along with co-occurrency matrices, the system can also perform principal components decomposition and create factor maps for any of the fields. The system also provides pre-defined macros that allow the automatic selection and exporting of a matrix into Microsoft Excel for visualization using 3-D graphs, line graphs, and various other charts. See their Web site, http://thevantagepoint.com, for additional information.
Invention Machine Corporation
Invention Machine Corporation produces a number of applications that assist in the computer-aided invention process. With regard to patent analysis, however, their two most relevant products are Co-Brain and Knowledgist.
Both programs do basically the same thing: extract subject/action/object (SAO) functions from full-text data. The company has recently begun to refer to these functions as problem/solution paradigms. The idea behind this approach is that patents are designed to instruct readers on how to solve a practical problem.
Think of the subject and action as the solution and the object as the problem. For example, if the object were to have clean clothes, the solution to the problem would be washing with soap, the action and the subject. Once the software has extracted the subject/action/object functions from documents, it puts together the problems and solutions and groups similar problems together, so that users may compare different ways to solve a problem by viewing them next to one another.
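The grouping step can be illustrated with a short Python sketch. It assumes the subject/action/object triples have already been extracted, since the extraction itself requires linguistic parsing that is beyond a few lines of code. The function name, the triples, and the simple lower-casing used to match problems are hypothetical and do not represent Invention Machine's software.

```python
from collections import defaultdict


def group_by_problem(triples):
    """Group (subject, action) solutions under the object (problem) they address."""
    problems = defaultdict(list)
    for subject, action, obj in triples:
        problems[obj.strip().lower()].append((subject, action))
    return dict(problems)


# Hypothetical triples, assumed to be extracted already from full text.
triples = [
    ("soap", "washes", "clothes"),
    ("ultrasound", "cleans", "Clothes"),
    ("filter", "removes", "dust"),
]
for problem, solutions in group_by_problem(triples).items():
    print(problem, solutions)
```

Listing the solutions under a shared problem is what allows different approaches to be viewed next to one another.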
The two programs differ in their scope and scale. Knowledgist is a desktop application that can be used on personal data sets, while Co-Brain is designed to work from a corporate server and act as a corporate knowledge portal.
Both systems come with a synonym tool that can greatly reduce the number of problem/solution sets created and greatly increase the system's ability to understand when two different solutions solve the same problem. Both programs can often create large lists of problem/solution functions that are difficult to navigate by scrolling up and down. To assist in identifying relevant functions, a search button allows the user to find problems or solutions quickly. Please see the Web site http://www.invention-machine.com for additional details.
Background: Anthony Trippe currently holds the position of Senior Scientist at Chemical Abstracts Service. He wrote this collection of articles while he was Sr. Staff Investigator, Intellectual Property at Vertex Pharmaceuticals. He was responsible for designing and implementing patent intelligence and mapping activities at Vertex and for assisting with the leveraging of IP within and external to the company. Previously, Mr. Trippe was Practice Director, Intellectual Property Consulting for Aurigin Systems Inc. and was Technical Intelligence Manager for the Procter and Gamble Co.
copyright Society of Competitive Intelligence Professionals
scip.online, issue 37, August 8, 2003