
Friday, February 10, 2012
ISSUE 31
Patinformatics: identifying haystacks from space.
Anthony J. Trippe, Sr. Staff Investigator, Intellectual Property, Vertex Pharmaceuticals Inc.
Generally, when individuals think about patent information they conjure up an image of a diligent searcher, poring over reams and reams of information, looking for the one reference out of hundreds, maybe thousands, that will satisfy their client. The idea of searching for a “needle in a haystack” comes readily to mind when referring to the activities in which these professionals commonly find themselves.
More recently, however, information researchers, especially technical intelligence professionals, find themselves being asked to look at the bigger picture. Instead of asking researchers to identify a single grain of sand on a vast beach, business decision makers are asking them to identify trends and provide general overviews.
Such overviews place information in the context of a much larger collection of materials. Instead of finding a needle in a haystack, today’s analysts are being asked to identify haystacks from space and then forecast whether the haystack is the beginning of a new field or the remainder from last year’s harvest.
The concept of patinformatics
The title of this article introduces the notion of patinformatics. The term is modeled on the more familiar fields of bioinformatics and cheminformatics. By definition, bioinformatics is the science of analyzing large amounts of biological data using computational methods. For example, researchers use genomic data to discover relationships or trends between different genes or biological pathways where looking at smaller datasets could mean missing a connection.
In a similar fashion, patinformatics describes the science of analyzing patent information to discover relationships and trends that would be difficult to see when working with patent documents one at a time. The term encompasses all forms of analyzing patent information, including:
- patent intelligence: the use of patent information to identify the technical capabilities of an organization and the use of that intelligence to develop strategic technical planning.
- patent mapping: sometimes described as white space mapping, using published patent data to create a graphical or physical representation of the relevant art pertaining to a particular subject area or novel invention.
- patent citation analysis: the study of patent citations for potentially determining a patent’s value or, perhaps more reliably, the identification of potential licensing partners or leads based on the citation of an organization’s patents by another company in the same or a completely different market space.
Patinformatics can also cover additional applications of patent information involving a subsequent analysis step. The key in each of these diverse areas is the analysis step.
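As one illustration of the analysis step, the citation-based search for licensing leads described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any particular tool: the record layout, the function name `citing_organizations`, and all patent numbers and company names are hypothetical and invented for the example.

```python
from collections import Counter

# Hypothetical records: each citing patent lists its assignee and the
# patent numbers it cites. All names and numbers are invented.
citing_patents = [
    {"number": "P-101", "assignee": "Alpha Corp", "cites": ["OURS-1", "X-9"]},
    {"number": "P-102", "assignee": "Beta Ltd", "cites": ["OURS-1", "OURS-2"]},
    {"number": "P-103", "assignee": "Alpha Corp", "cites": ["OURS-2"]},
    {"number": "P-104", "assignee": "Our Company", "cites": ["OURS-1"]},
]


def citing_organizations(records, our_patents, our_name):
    """Count how often each outside assignee cites patents in our portfolio."""
    counts = Counter()
    for record in records:
        if record["assignee"] == our_name:
            continue  # self-citations say nothing about outside interest
        hits = sum(1 for cited in record["cites"] if cited in our_patents)
        if hits:
            counts[record["assignee"]] += hits
    return counts


leads = citing_organizations(citing_patents, {"OURS-1", "OURS-2"}, "Our Company")
for assignee, count in leads.most_common():
    print(assignee, count)
```

Organizations that cite a portfolio repeatedly are candidates for a closer look, whether as licensing partners or as competitors. The counts are only a starting point for that review.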
Search, review/analysis, and presentation
One might imagine that the same rules apply to conducting patinformatics as apply to patent searching. This is not entirely the case. Just as in physics, where quantum mechanics outlines the principles for understanding the microscopic world and Newtonian principles apply to the macroscopic world of large bodies in motion, one can distinguish two different approaches to patent information.
Traditional patent searching deals with the micro level, where very small changes become extremely important and detail and precision are imperative. Patinformatics, by comparison, deals with thousands of documents. Since small details will not be seen across such a vast landscape, it takes a more macroscopic view of the data, using different methods and reaching different conclusions. This point can be illustrated by looking at how a patinformatics project is put together: performing a search, conducting the review and analysis of the information, and presenting the results.
Searching
In patinformatics, patent analysts may put together a complicated search strategy and try to be as directed as possible in their searching, but generally they want to create a comprehensive data set as the basis for subsequent analytical steps.
Analysts will use large collections of key words and database-specific indexing, but they will also try to keep their strategies broad and not narrow results to a fine point. As long as the data discovered is more or less on target, having some irrelevant answers in the search set results may not bother them, since small inconsistencies will not be seen above the baseline. Statistically speaking, analysis requires the presence of enough data to discover trends and relationships; so patent analysts prefer an overabundance of data as opposed to a lack of it. Making the search too specific can bias the data. It is important to let the data speak for itself, and not have the analysis directed by search results that have been biased by the analyst’s preconceived notions. Building a data set free from bias and subjectivity is key.
Under these circumstances, search result data sets may grow to several thousand records. In the past, patent analysts would ordinarily stay away from data sets this large, since so much information was difficult for end users to work with and grasp. With computerized analytical tools, however, the task of working with large data sets has become much less complicated and should not deter an aggressive search strategy.
Review/analysis
A patent analyst treats review and analysis as separate steps with different objectives and methods. In the review step, the analyst is building a data warehouse, examining the integrity of the data, and making certain that it is clean. The first task may be a relevance review, which does not have to be terribly detailed but does eliminate results that are far off topic. Once again, precision is not the issue here, so the review process goes fairly quickly.
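A coarse relevance review of this kind could be sketched as a simple keyword screen. The example below is only an assumption about how such a screen might look: the function name `coarse_relevance_filter`, the `title` and `abstract` fields, the topic terms, and the sample records are all hypothetical.

```python
def coarse_relevance_filter(records, topic_terms, min_hits=1):
    """Split records into kept and dropped lists by counting topic terms.

    A record is kept when its title and abstract together mention at
    least min_hits of the topic terms. The test is deliberately loose:
    the goal is to drop results far off topic, not to rank precisely.
    """
    kept, dropped = [], []
    for record in records:
        text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
        hits = sum(1 for term in topic_terms if term.lower() in text)
        if hits >= min_hits:
            kept.append(record)
        else:
            dropped.append(record)
    return kept, dropped


# Invented sample records for illustration only.
records = [
    {"title": "Protease inhibitor compounds", "abstract": "Inhibitors of viral protease."},
    {"title": "Garden hose reel", "abstract": "A reel for storing hoses."},
    {"title": "Kinase assay", "abstract": "Screening method for enzyme inhibitors."},
]
kept, dropped = coarse_relevance_filter(records, ["protease", "inhibitor", "kinase"])
print(len(kept), "kept;", len(dropped), "dropped")
```

Keeping the threshold low reflects the point made above: a few irrelevant records are tolerable, while an overly strict filter risks biasing the data set.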
After the analyst is more or less convinced that they have accrued data generally on topic, they begin the process of building the data warehouse. This typically involves importing the data into a software tool and checking to make sure that the process has gone smoothly and that the data is ready for the subsequent analysis phase.
The analyst will scan the data warehouse, occasionally taking samples of the data and making certain the information has ended up in the proper fields and is formatted correctly. Depending on the size of the data set, this process may take quite some time. A few hundred documents may go quickly, but when the data set expands to include several thousand documents, this step can become very time consuming.
After building the data warehouse, the review process is complete and the data analysis can begin. Specific details on performing patent analysis will be discussed in part two of this article. The process requires a clear understanding of the business objective and the desired use of the intelligence produced by the analysis. It is less a judgment call based on the analyst’s understanding of the subject matter than an experiment, with conclusions drawn from the results.
When doing an analysis, people may find it difficult to identify trends or patterns within data, since they have a different perspective looking at the 100th record than they did when looking at the 4th. It is also difficult for the human brain to keep track of several variables while examining hundreds of documents. A computer, on the other hand, can objectively weigh a set of variables regardless of which document they came from and identify patterns within the data. Given the large volume of data a patent analyst can draw from, computerized analytical tools are well suited to producing valuable intelligence from patent data. The analyst will typically have a number of computational tools available (covered briefly in part three of this article) that are designed to identify patterns and trends within their data sets.
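Part of this checking can be automated. The sketch below shows one possible approach, assuming a hypothetical record schema (`number`, `assignee`, `filing_date`, `title`) and an assumed YYYY-MM-DD date format; real import tools will have their own field names and conventions.

```python
import re

# Hypothetical schema; the field names are assumptions for illustration.
REQUIRED_FIELDS = ("number", "assignee", "filing_date", "title")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed YYYY-MM-DD


def check_record(record):
    """Return a list of problems found in one imported record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    date = record.get("filing_date")
    if date and not DATE_PATTERN.match(str(date)):
        problems.append("filing_date not in YYYY-MM-DD form")
    return problems


def audit(records):
    """Map each problematic record (by number, or row index) to its problems."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            key = record.get("number") or f"row {index}"
            report[key] = problems
    return report


# Invented sample data: the third record has an assignee name in the date field.
imported = [
    {"number": "P-1", "assignee": "Alpha Corp", "filing_date": "2001-03-15", "title": "Widget"},
    {"number": "P-2", "assignee": "", "filing_date": "15/03/2001", "title": "Gadget"},
    {"number": "P-3", "assignee": "Beta Ltd", "filing_date": "Beta Ltd", "title": "Gizmo"},
]
report = audit(imported)
for number, problems in report.items():
    print(number, problems)
```

A check like this catches values that landed in the wrong field or arrived in an unexpected format. It does not replace spot checks by the analyst, who can judge whether a well-formed value is also the right one.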
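A very simple instance of such pattern finding is counting filings per organization per year and asking whether activity is rising or falling. The sketch below is an assumption about how this might be done, not a description of any specific tool: the function names, the split-year comparison, and the sample data are all hypothetical, and comparing totals before and after a split year is a deliberately crude measure.

```python
from collections import Counter, defaultdict

# Invented (assignee, filing year) pairs for illustration only.
filings = [
    ("Alpha Corp", 1999), ("Alpha Corp", 2000),
    ("Alpha Corp", 2001), ("Alpha Corp", 2001), ("Alpha Corp", 2001),
    ("Beta Ltd", 1999), ("Beta Ltd", 1999), ("Beta Ltd", 2000),
]


def filings_by_year(records):
    """Build a table of filing counts: assignee -> year -> count."""
    table = defaultdict(Counter)
    for assignee, year in records:
        table[assignee][year] += 1
    return table


def direction(year_counts, split_year):
    """Compare filings before split_year with filings from split_year on."""
    early = sum(n for year, n in year_counts.items() if year < split_year)
    late = sum(n for year, n in year_counts.items() if year >= split_year)
    if late > early:
        return "rising"
    if late < early:
        return "falling"
    return "flat"


table = filings_by_year(filings)
trends = {assignee: direction(counts, 2001) for assignee, counts in table.items()}
for assignee, trend in trends.items():
    print(assignee, dict(table[assignee]), trend)
```

The same counts are applied to every record in the same way, which is the objectivity the passage above attributes to computerized tools. In terms of the opening metaphor, a rising count hints at a new field, and a falling one at last year’s harvest.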
Presentation
In most cases, business decision makers are not interested in data. They want data to be compiled and analyzed into intelligence. They want different scenarios and their corresponding advantages and challenges laid out so they can draw rapid conclusions and act on them. The analyst’s results, therefore, are generally limited to a few slides outlining the business need, the hypothesis under investigation, the results of the analysis, and, finally, some opinions on the potential conclusions of following different courses of action.
With such a limited amount of time and attention available to deliver their message, the analyst needs a tool that quickly draws the client to the proposed conclusions. Nothing makes a bigger impact than a powerful visualization that showcases the gleaned intelligence. Part three of this article covers the computerized tools that have proliferated for conducting patinformatics studies, along with the visualization engines that have been developed to represent the results of the analytic techniques.
Background:
Anthony Trippe currently holds the position of Senior Staff Investigator, Intellectual Property at Vertex Pharmaceuticals. He is responsible for designing and implementing patent intelligence and mapping activities at Vertex and for assisting with the leveraging of IP within and external to the company. Previously, Mr. Trippe was Practice Director, Intellectual Property Consulting for Aurigin Systems Inc. and was Technical Intelligence Manager for the Procter and Gamble Co.
Copyright Society of Competitive Intelligence Professionals.
scip.online, issue 31, May 8, 2003.