Regular article
Patent citation spectroscopy (PCS): Online retrieval of landmark patents based on an algorithmic approach
Introduction
Among the various components of a patent landscape, identifying the seminal patents in an innovation area requires substantial investment from specialists (e.g., Schmidt, 2007). Traditionally, subject matter experts review a large corpus of patents and patent applications within their historical context to judge which patents are technologically most important. This method is time-consuming, difficult to replicate, and dependent on the availability of subject matter experts (Cockburn, Kortum, & Stern, 2002), and yet such judgments are required by patent examiners and historians of science alike. There is thus a need for computer-assisted methods for uncovering insights about landmark patents in technology areas (Jensen & Murray, 2005; Konski & Spielthenner, 2009).
Clinical advances depend upon a sound understanding of biomedical research and development. The importance of maintaining situational awareness of biomedical R&D activities for businesses and policy-makers is best exemplified by the proliferation of patent landscapes produced by subject matter experts covering a wide range of topics (e.g., CRISPR: Egelie, Graff, Strand, & Johansen, 2016; Induced Pluripotent Stem Cells: Roberts et al., 2014; Bergman & Graff, 2007; Prenatal Testing: Agarwal, Sayres, Cho, Cook‐Deegan, & Chandrasekharan, 2013; Carbon Nanotubes: Harris & Bawa, 2007; Nanomedicine: Wagner, Dullaart, Bock, & Zweck, 2006; Gene Sequences: Jensen & Murray, 2005). Given the enormous growth in the number of annual patent applications filed (United States Patent & Trademark Office - Patent Technology Monitoring Team, 2016), particularly in the life and biomedical sciences (Moses et al., 2015; cf. Agarwal & Searls, 2009), there is increasing demand for patent landscapes across a panoply of technologies (e.g., Breitzman & Thomas, 2015; Jaffe & Trajtenberg, 2002).
In the final section of their comprehensive review of the literature on patent citations, Sharma and Tripathi (2017) conclude that citation analysis of patents can retrieve the patents and publications that play a vital role in the growth of a technology. Because applicants are required to cite the state of the art and examiners may add further citations to an application (Alcácer, Gittelman, & Sampat, 2009), the citation field is controlled more rigorously in patenting than in scholarly publishing, where citation traditions vary with disciplinary background (de Solla Price, 1970). Patentometry has accordingly become a flourishing field over the last decades, partly because non-patent literature references can also be retrieved (Narin, Hamilton, & Olivastro, 1997), which makes it possible to relate different knowledge flows and sources. Sharma and Tripathi (2017, p. 40) also found 23 software patents filed at the USPTO that deal with valuing patents economically through citation analysis.
The integration of historiography with network analysis of citation patterns has developed in parallel for patents and for scholarly publications (Leydesdorff, Bornmann, Comins, Marx, & Thor, 2016; Liu & Lu, 2012). Abbas, Zhang, and Khan (2014) review the computational strategies and software relevant for mining patent data. They note that visualization techniques play an important role because they can stimulate the use of patent information in businesses and other organizations. Tools should therefore be developed that offer multiple suggestions both for historical reconstruction and for devising strategies (Rotolo, Rafols, Hopkins, & Leydesdorff, 2017).
In light of this growing need, we introduce an algorithmic approach and a corresponding web application for identifying landmark patents, a key component of patent landscapes, in user-specified biomedical areas. Our approach is data-oriented and historiographic, and hence based on descriptive statistics (Anderson, 2008). In contrast to case-study-based models of patent structures (e.g., Ma & Porter, 2015; Chang, Wu, & Leu, 2010), our method is generic: it can be used with any retrieval from the USPTO based on keywords or on more advanced search parameters (e.g., CPC subclasses; Leydesdorff, Alkemade, Heimeriks, & Hoekstra, 2015). For an application of PCS to photovoltaic patents, see Comins and Leydesdorff (2018). Unlike model-based approaches that assume, for example, evolutionary mechanisms (e.g., Breitzman & Thomas, 2015; Valverde, 2014), we access the data anew in each round without theoretical assumptions other than retrieval and visualization optimizations. These optimizations are based on the literature on Referenced Publication Year Spectroscopy (RPYS), which was developed for similar purposes using scientific literature (Marx & Bornmann, 2014; Thor, Marx, Leydesdorff, & Bornmann, 2016). Like RPYS, PCS assumes that citations accumulate around important discoveries and contributions (Kaplan, 1965). Patent Citation Spectroscopy (PCS) operates on the cited references of large sets of patents to determine the seminal prior works within a given field; we also provide an openly available web application for performing PCS.
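In the RPYS literature, such accumulations are typically made visible by counting cited references per publication year and comparing each year's count with the median of a five-year window centred on that year (t-2 to t+2); years far above the median show up as peaks. The sketch below applies that idea to cited patents grouped by grant year. It is a minimal illustration under stated assumptions, not the PCS implementation: it assumes PCS uses the same five-year median as RPYS, that windows are simply truncated at the ends of the series, and the function names `deviation_from_median` and `peak_years` are hypothetical.

```python
from statistics import median


def deviation_from_median(counts_by_year: dict[int, int], half_window: int = 2) -> dict[int, float]:
    """Deviation of each year's citation count from the median of a centred window.

    With half_window=2 the window spans five years (t-2 .. t+2), as in RPYS.
    Windows are truncated at the ends of the series (an assumption).
    """
    first, last = min(counts_by_year), max(counts_by_year)
    # Years in the range that received no citations are counted as zero.
    counts = {y: counts_by_year.get(y, 0) for y in range(first, last + 1)}
    deviations = {}
    for year in counts:
        window = [counts[t] for t in range(year - half_window, year + half_window + 1) if t in counts]
        deviations[year] = counts[year] - median(window)
    return deviations


def peak_years(deviations: dict[int, float], top: int = 3) -> list[int]:
    """Years with the largest positive deviations, highest first (hypothetical helper)."""
    positive = [y for y, d in deviations.items() if d > 0]
    return sorted(positive, key=lambda y: (-deviations[y], y))[:top]


if __name__ == "__main__":
    # Toy counts of citations received by patents granted in each year.
    toy = {1980: 3, 1981: 4, 1982: 20, 1983: 5, 1984: 4}
    dev = deviation_from_median(toy)
    print(dev[1982])         # 16: 20 citations against a window median of 4
    print(peak_years(dev))   # [1982, 1983]
```

In this toy series, 1982 stands out with a deviation of 16, so it would be the first candidate year in which to look for landmark patents. Comparing against a median rather than a mean keeps a single very large peak from inflating the baseline of its neighbours.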
In this contribution, we apply PCS to three areas of biomedical innovation: (1) RNA interference (RNAi), (2) cholesterol, and (3) cloning. RNAi was selected to examine the efficacy of PCS for understanding the origins of an emerging technology, with well-documented expert reviews to ground our findings (Schmidt, 2007). Cholesterol was selected to assess how PCS performs on searches covering broad areas of biomedical innovation and clinical relevance, which reflect the less sophisticated kind of searches conducted by users who are not library and information scientists or patent experts. Finally, cloning, our third case study, was selected to reveal the advantages of using the PCS algorithm to identify a seminal patent.
Methods
PCS can be performed over any set of US patent data that includes a list of referenced patents, or backward citations. Our routines utilize data from PatentsView, a data platform sponsored by the USPTO. PatentsView provides backward citation information for all US patents from 1976 through July 2016 via an Application Programming Interface (API). We leverage this API both in demonstrating the utility of PCS to identify seminal patents and in creating a tool that makes PCS available for
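Once the backward citations of a retrieved set of patents are in hand, the first PCS step can be sketched as a simple aggregation: count how often each cited patent is referenced, then sum those counts by the cited patent's grant year to obtain the series that is inspected for peaks. This is a minimal sketch under assumptions: the record layout (a hypothetical `cited` list of `(patent_number, grant_year)` pairs per citing patent) does not reproduce the PatentsView response schema, the identifiers are placeholders, and grouping by grant year is an illustrative choice.

```python
from collections import Counter


def aggregate_citations(citing_patents):
    """Return citation counts per cited patent and per cited grant year.

    Each record is a dict with a hypothetical "cited" key holding
    (cited_patent_number, grant_year) pairs.
    """
    per_patent = Counter()
    year_of = {}
    for record in citing_patents:
        # set() ensures a reference repeated within one citing patent counts once.
        for number, year in set(record["cited"]):
            per_patent[number] += 1
            year_of[number] = year
    per_year = Counter()
    for number, count in per_patent.items():
        per_year[year_of[number]] += count
    return per_patent, per_year, year_of


def most_cited_in_year(per_patent, year_of, year, k=3):
    """Most-cited patents granted in `year`: candidate landmarks for a peak year."""
    in_year = [(count, number) for number, count in per_patent.items() if year_of[number] == year]
    in_year.sort(key=lambda pair: (-pair[0], pair[1]))
    return [number for _, number in in_year[:k]]


if __name__ == "__main__":
    records = [
        {"patent_number": "X1", "cited": [("P1", 1980), ("P2", 1983)]},
        {"patent_number": "X2", "cited": [("P1", 1980), ("P1", 1980)]},
        {"patent_number": "X3", "cited": [("P1", 1980), ("P3", 1984)]},
    ]
    per_patent, per_year, year_of = aggregate_citations(records)
    print(dict(sorted(per_year.items())))  # {1980: 3, 1983: 1, 1984: 1}
    print(most_cited_in_year(per_patent, year_of, 1980))  # ['P1']
```

The per-year counts feed the peak detection, and the per-patent counts then point to the individual patents behind each peak.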
Results
To demonstrate the utility of this tool, we applied PCS to an area of biomedical innovation: RNA interference (hereafter RNAi). We selected RNAi as a use case for two reasons: (1) RNAi represents a burgeoning domain of biomedical innovation with potential therapeutic applications for the treatment of viral infections and cancer; and (2) the patent landscape of RNAi has been studied by subject matter experts, which allows us to compare the results of PCS with their conclusions.
Discussion
Identifying intellectual pathways within biomedical science and technology is an important component of patent landscapes required by businesses and policy-makers. The possibility of online identification of landmark patents by PCS supports the generation of data-driven patent landscapes. Using the PCS methodology and application described here, it may be easier for users to understand the fundamental patents of myriad biotechnologies as well as the companies (assignees) and people (inventors)
Author contributions
Jordan A. Comins: Conceived and designed the analysis; Collected the data; Contributed data or analysis tool; Performed the analysis; Wrote the paper.
Stephanie A. Carmack: Conceived and designed the analysis.
Loet Leydesdorff: Conceived and designed the analysis; Wrote the paper.
Acknowledgment
We thank three anonymous referees for comments.
References (48)
- et al. (2014). A literature review on the state-of-the-art in patent analysis. World Patent Information.
- et al. (2009). Applicant and examiner citations in U.S. patents: An overview and analysis. Research Policy.
- et al. (2015). The emerging Clusters Model: A tool for identifying emerging technologies across multiple patent systems. Research Policy.
- et al. (2015). Compressing multiple scales of impact detection by Reference Publication Year Spectroscopy. Journal of Informetrics.
- et al. (2014). Referenced publication years spectroscopy applied to iMetrics: Scientometrics, journal of informetrics, and a relevant subset of JASIST. Journal of Informetrics.
- et al. (2016). Citations: Indicators of quality? The impact fallacy. Frontiers in Research Metrics and Analytics.
- et al. (1997). The increasing link between U.S. technology and public science. Research Policy.
- et al. (2017). Patent citation: A technique for measuring the knowledge flow of information and innovation. World Patent Information.
- et al. (2016). Introducing CitedReferencesExplorer: A program for Reference Publication Year Spectroscopy with Cited References Disambiguation. Journal of Informetrics.
- et al. (2009). Can literature analysis identify innovation drivers in drug discovery? Nature Reviews Drug Discovery.
- Commercial landscape of noninvasive prenatal testing in the United States. Prenatal Diagnosis.
- The end of theory: The data deluge makes the scientific method obsolete. Wired magazine.
- The story of the Cohen–Boyer patents. Current Science.
- The global stem cell patent landscape: Implications for efficient technology transfer and commercial development. Nature Biotechnology.
- The roots—A short history of industrial microbiology and biotechnology. Applied Microbiology and Biotechnology.
- Using patent analyses to monitor the technological trends in an emerging field of technology: A case of carbon nanotube field emission display. Scientometrics.
- Are all patent examiners equal?: The impact of characteristics on patent statistics and litigation outcomes.
- Detecting seminal research contributions to the development and use of the global positioning system by reference publication year spectroscopy. Scientometrics.
- RPYS i/o: Software demonstration of a web-based tool for the historiography and visualization of citation classics, sleeping beauties and research fronts. Scientometrics.
- Citation algorithms for identifying research milestones driving biomedical innovation. Scientometrics.
- Citation measures of Hard science, soft science, technology, and nonscience.
- The emerging patent landscape of CRISPR-Cas gene editing technology. Nature Biotechnology.
- Detecting the historical roots of tribology research: A bibliometric analysis. Scientometrics.
- U.S. Patent No. 6,506,559.
1 The author’s affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the author. Approved for Public Release; Distribution Unlimited Case #17-0951.