
Monday, November 23, 2009
ISSUE 34
Patinformatics: identifying haystacks from space, part 2.
Anthony Trippe, Sr. Staff Investigator, Intellectual Property, Vertex Pharmaceuticals Inc.
In part 1 we looked at the general idea of patinformatics and discussed various elements of the intelligence cycle with regard to patent analysis. In part 2 we’ll focus on data and text mining as they apply to patents and present some thoughts on a linear workflow for patinformatics.
Patinformatic principles
When dealing with the more detailed analysis involved in patinformatics, we can divide the different types of analysis into two broad categories:
Data mining involves the extraction of fielded data and its analysis. Normally this means analyzing the bibliographic information contained within patents, such as examining the relationship between patent assignees and International Patent Classification (IPC) codes for a specific area of technology. Mining or mapping this information can help identify the major players in a technology area and what type of work they generally focus on. When using Derwent data, a similar analysis could replace IPC codes with Derwent manual codes.
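As a rough illustration of the assignee-versus-IPC analysis described above, the following Python sketch counts how many patents each assignee holds under each IPC code. The records, assignee names and field names are invented for illustration and do not reflect any particular database's export format.

```python
from collections import Counter

# Hypothetical bibliographic records. The assignees, IPC codes and
# field names are assumptions made up for this illustration.
records = [
    {"assignee": "Company X", "ipc": ["A61K", "C07D"]},
    {"assignee": "Company X", "ipc": ["A61K"]},
    {"assignee": "Company Y", "ipc": ["C07D"]},
    {"assignee": "Company Y", "ipc": ["C07D", "G01N"]},
]


def assignee_ipc_counts(records):
    """Count how many patents each assignee holds under each IPC code."""
    counts = Counter()
    for record in records:
        # Use a set so a code repeated on one patent is counted once.
        for code in set(record["ipc"]):
            counts[(record["assignee"], code)] += 1
    return counts


counts = assignee_ipc_counts(records)
for (assignee, code), n in sorted(counts.items()):
    print(f"{assignee}\t{code}\t{n}")
```

Reading the resulting table by row suggests what each assignee focuses on. Reading it by column suggests who the major players are for a given classification.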
Text mining or mapping typically involves clustering or categorizing documents based on the major concepts they contain. The data source is unstructured text: it is not fielded, and the only structure within the material comes from the author who wrote the document and built relationships between different concepts and ideas.
For example, you could collect patents from a specific patent assignee and analyze the text of those documents. In a cluster map the software would extract the major concepts found and create clusters of documents, concept by concept. The software would then visualize these clusters in some fashion, creating a map. By looking at the clusters created (and subsequently the documents themselves, but now with an organized method), you can quickly get a general idea of the concepts that this organization is working on and how they interrelate.
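The idea of grouping documents concept by concept can be sketched in a few lines of Python. This is a deliberately crude keyword grouping, not the algorithm used by any commercial clustering tool: it treats every non-stopword shared by at least two documents as a "concept". The document identifiers, abstracts and stopword list are invented for illustration.

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to", "with", "in", "using"}

# Hypothetical patent abstracts keyed by made-up document identifiers.
documents = {
    "P1": "Kinase inhibitor compounds for treating inflammation",
    "P2": "Crystalline forms of a kinase inhibitor",
    "P3": "Protease inhibitor formulations for oral delivery",
    "P4": "Oral delivery of crystalline compounds",
}


def tokenize(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def concept_clusters(documents, min_docs=2):
    """Group document ids under each term that occurs in at least min_docs documents."""
    clusters = defaultdict(list)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            clusters[term].append(doc_id)
    return {term: sorted(ids) for term, ids in clusters.items() if len(ids) >= min_docs}


clusters = concept_clusters(documents)
for term in sorted(clusters):
    print(term, clusters[term])
```

Because a document can fall under several concepts, the overlaps between groups hint at how the concepts interrelate, which is the information a cluster map presents visually.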
Law of linear patent analysis
Success in either data or text mining often depends on the analyst’s familiarity with the data source being analyzed and the methods used to prepare and analyze the data. A full discussion of the sub-methods and potential pitfalls of different mining exercises is, unfortunately, beyond the scope of this article.
With this general background, I would like to propose a law for the linear analysis of patent information. The components of Trippe’s Law of Linear Patent Analysis are:
- Create a tool kit of patinformatics tools
- Understand the business need and the need behind the need
- The need drives the question
- The question drives the data
- The data drives the tool
Create a tool kit of patinformatics tools
As mentioned earlier, patinformatics can include patent mapping, citation analysis, co-occurrence analysis, thematic mapping, temporal visualization and various other techniques beyond the scope of this article. Clearly no one tool will accomplish all of these types of analyses.
To succeed in the overall field of patinformatics, the practitioner needs the maximum flexibility to pursue questions based on business needs. The patinformatics practitioner should thus invest in a collection of tools and resources. This approach can get expensive quickly. So one must understand the types of questions that are likely to be asked and arrange for tools that will satisfy the corresponding analysis needs.
Understand the business need and the need behind the need
When it comes to starting an ad-hoc project, the analyst typically starts by understanding as much as possible about the analysis needed. As researchers well know, it is often difficult to get a client to express their true need when making an information request.
Frequently a client will say, “We need to know everything about Company Y.” As strange as this might sound, the response to that request ought to be, “No, you don’t and if you did it would take a forklift to cart in all of the data. It would take six months for you to get through all of it. And, in the end, you might be no closer to the intelligence you’re seeking than when you first started.”
In patinformatics it is absolutely essential that the business need for intelligence is clearly understood before anything else begins. It is also critical to know all of the “hidden” needs behind the original question:
- how the data will be used
- who will use it
- what type of story will best represent the intelligence work, so that the person receiving the analysis will understand it and stand the greatest chance of putting it into business practice
While these principles are important to all researchers, they are absolutely essential to analysts. Improper assumptions about the scope and goal of the project can lead the analysis astray, producing inappropriate or (in extreme cases) misleading information in the context of the business decision at hand. The analyst assumes the role of a trusted advisor in these cases and needs to be as close to the decision-making process as possible in order to integrate a thorough understanding of the business need into the work.
The need drives the question
In a true linear sense, once the need is understood, the analyst and client can work together to formulate questions that lead to intelligence impacting the underlying business decision. For example, a business may need to gain additional insight into how the research and development process works for Company Y. In such a case, understanding what research and development projects Company Y conducts in its 10 different research facilities in the United States becomes an interesting question.
Additional questions might include:
- where do the inventors on their U.S. patents live?
- what patenting topics are closer to basic science?
- which apply more to process technologies?
By asking a number of specific questions and compiling intelligence on each of them, an analyst can begin to paint a mosaic of the dynamics associated with the business need. Examining all the dynamics will lead the analyst to begin drawing conclusions on what the different options for the decision maker are. With this knowledge the analyst can begin to gather and analyze data that will identify the strengths and weaknesses of each option.
The question drives the data
Once analysts decide on the questions that need to be answered, they can begin collecting relevant data, just as a scientist investigates a scientific question. That investigation involves:
- formation of a hypothesis
- experimentation to determine the validity of the hypothesis
- verification of the validity of the experimentation and of the conclusions drawn from the experimental results
In the realm of patinformatics, the gathering of data is directly analogous to the idea of preparing an experiment to support or dispute a hypothesis. Selection of the appropriate tool is also important to the process.
The data drives the tool
Some questions require very specific types of data. In these circumstances, the tool selected must not only support the analysis necessary to provide the insight, but must also work with the data source most appropriate for answering the questions.
Continuing with the example initiated above, if the question posed asks where the inventors on Company Y’s U.S. patents live, then the data will have to include the inventors’ address information, which appears on the front page of all U.S. patents. Perhaps more importantly, this data must be available in an electronic format for importing into the appropriate analysis tool. If a tool cannot handle the data format for the file that includes inventor address data, then it cannot answer the question.
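To make the inventor-address example concrete, here is a minimal Python sketch that tallies distinct inventors by state from a delimited export, and refuses to proceed if the export lacks the field the question depends on. The column names, patent numbers and inventor names are all hypothetical and do not correspond to any real data source or export format.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export. Every column name and value here is invented.
EXPORT = """patent_number,inventor,inventor_city,inventor_state
US0000001,A. Smith,Cambridge,MA
US0000001,B. Jones,San Diego,CA
US0000002,A. Smith,Cambridge,MA
US0000003,C. Lee,Boston,MA
"""


def inventors_by_state(csv_text, field="inventor_state"):
    """Count distinct inventors per state; fail if the address field is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if field not in (reader.fieldnames or []):
        # Without the address field, this data cannot answer the question.
        raise ValueError(f"export has no '{field}' column")
    seen = set()
    counts = Counter()
    for row in reader:
        key = (row["inventor"], row[field])
        if key not in seen:  # count each inventor once per state
            seen.add(key)
            counts[row[field]] += 1
    return counts


state_counts = inventors_by_state(EXPORT)
for state, n in state_counts.most_common():
    print(state, n)
```

The early check mirrors the point made above: the question dictates which fields the data must contain, and an export without them is the wrong data source no matter how capable the analysis tool is.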
Once again, it is important to follow these steps in a linear fashion since deviation from this path leads to a situation where the questions asked are biased by the tools available to the analyst. If an organization focuses on a single analysis tool, then all subsequent analysis may be overshadowed by the strengths and weaknesses of that particular tool.
Actionable intelligence
Another principle that should be applied while conducting a patinformatics analysis is the idea of actionable intelligence, a principle well known to CI practitioners. Analysis work should not be done for its own sake. If a report will simply collect dust on the decision maker's desk, then it was not worth doing in the first place.
Analysts must not get trapped in the novelty or cleverness of the work they do. They must stay focused on creating analysis that allows the decision maker to definitively see the various options available to them and deliver intelligence on those options good enough to support a clear and relatively unambiguous decision on a course of action. When intelligence is applied to a business decision, then it becomes actionable.
In part 3 of this article a brief description will be given of the tools that are currently available for conducting patinformatics experiments.
Background:
Anthony Trippe currently holds the position of Senior Staff Investigator, Intellectual Property at Vertex Pharmaceuticals. He is responsible for designing and implementing patent intelligence and mapping activities at Vertex and for assisting with the leveraging of IP within and external to the company. Previously, Mr. Trippe was Practice Director, Intellectual Property Consulting for Aurigin Systems Inc. and was Technical Intelligence Manager for the Procter and Gamble Co.
Copyright Society of Competitive Intelligence Professionals
scip.online, issue 34, June 20, 2003