Exploring Contextual Models in Chemical Patent Search

Urbain, Jay; Frieder, Ophir

doi:10.1007/978-3-642-13084-7_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6107)

Included in the following conference series:

Information Retrieval Facility Conference

Abstract

We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.
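The idea of aggregating term statistics at several levels of patent structure can be illustrated with a minimal sketch. It is not the authors' distributed index: the function names, the regex tokenizer, the naive sentence splitter, and the in-memory layout are all assumptions made for illustration, and only sentence, section, and document levels are shown.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def aggregate_term_stats(patent):
    """Collect term counts per sentence, per section, and per document.

    `patent` maps a section name (e.g. "title", "claims") to its text.
    Returns stats[level][context_id] -> Counter of term frequencies.
    """
    stats = defaultdict(dict)
    for section, text in patent.items():
        stats["section"][section] = Counter(tokenize(text))
        # Naive sentence split on terminal punctuation (an assumption).
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        for i, sentence in enumerate(sentences):
            stats["sentence"][(section, i)] = Counter(tokenize(sentence))
    # Document-level counts are the sum of all section-level counts.
    document = Counter()
    for counts in stats["section"].values():
        document.update(counts)
    stats["document"]["all"] = document
    return stats


example = {
    "title": "Benzene derivative synthesis",
    "claims": "A benzene compound. The compound of claim 1.",
}
stats = aggregate_term_stats(example)
print(stats["document"]["all"]["benzene"])
print(stats["sentence"][("claims", 1)]["compound"])
```

Keeping one counter per context lets a retrieval model read the same term's frequency at whichever level it needs without re-scanning the text.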

The query processing algorithm for patent prior art search uses information extraction techniques to identify candidate entities and distinctive terms from the query patent’s title, abstract, description, and claim sections. Structured queries integrating terms and entities in context are automatically generated to test the validity of each section of potentially relevant patents.
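As a rough sketch of this query generation step, the code below picks distinctive terms per section with a simple tf-idf style weight and matches candidate entities against a lexicon. The abstract does not specify the extraction techniques, so the weighting, the `entity_lexicon` lookup, the function names, and the dictionary-shaped query are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def distinctive_terms(text, doc_freq, n_docs, k=3):
    # Score each term by frequency times a smoothed inverse document frequency.
    tf = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    scored = {
        term: freq * math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))
        for term, freq in tf.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda t: (-scored[t], t))[:k]


def build_structured_query(patent, doc_freq, n_docs, entity_lexicon, k=3):
    """Build one sub-query per section of the query patent.

    `entity_lexicon` is a hypothetical set of known chemical names that
    stands in for a real entity extractor.
    """
    query = {}
    for section, text in patent.items():
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        query[section] = {
            "terms": distinctive_terms(text, doc_freq, n_docs, k),
            "entities": sorted(tokens & entity_lexicon),
        }
    return query


# Illustrative collection statistics (made up for the example).
doc_freq = {"of": 1000, "synthesis": 100, "benzene": 10, "derivatives": 50}
query = build_structured_query(
    {"title": "Synthesis of benzene derivatives"},
    doc_freq, n_docs=1000, entity_lexicon={"benzene", "toluene"}, k=2,
)
print(query)
```

Keeping the sub-queries keyed by section is what would let each one be evaluated against the matching section of a candidate patent.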

The system was deployed across 15 Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances to support efficient indexing and query processing of the relatively large 100G+ collection of chemical patent documents. We evaluated several retrieval models for integrating statistics of candidate entities with term statistics at multiple levels of patent structure to identify relevant patents for prior art search. Our top-performing retrieval model, which integrates contextual evidence from multiple levels of patent structure, achieved a bpref of 0.8929 on the prior art search task, exceeding the top results reported for the 2009 TREC Chemistry track.
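For readers unfamiliar with bpref, the sketch below shows one common way to compute it, following the formulation used by the trec_eval tool as we understand it. The abstract does not say which variant or implementation was used, so the function and its argument names are illustrative assumptions. Each retrieved relevant document is penalized by the number of judged non-relevant documents ranked above it, and unjudged documents are ignored.

```python
def bpref(ranking, relevant, nonrelevant):
    """Compute bpref for a single query.

    ranking: list of document ids in retrieved order.
    relevant / nonrelevant: sets of judged document ids.
    Documents in neither set are unjudged and do not affect the score.
    """
    num_rel = len(relevant)
    num_nonrel = len(nonrelevant)
    if num_rel == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            if nonrel_seen == 0:
                total += 1.0
            else:
                # Penalty counts at most the first num_rel non-relevant docs.
                total += 1.0 - min(nonrel_seen, num_rel) / min(num_rel, num_nonrel)
        elif doc in nonrelevant:
            nonrel_seen += 1
    # Relevant documents that were never retrieved contribute zero.
    return total / num_rel


score = bpref(
    ["d1", "n1", "d2", "u1", "n2", "d3"],
    relevant={"d1", "d2", "d3", "d4"},
    nonrelevant={"n1", "n2"},
)
print(score)  # 0.375
```

In this example the three retrieved relevant documents contribute 1, 0.5, and 0, and the unretrieved relevant document contributes nothing, so the score is 1.5 / 4 = 0.375.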

Author information

Authors and Affiliations

Electrical Engineering & Computer Science Department, Milwaukee School of Engineering, Milwaukee, WI
Jay Urbain
Department of Computer Science, Georgetown University, Washington, DC
Ophir Frieder

Editor information

Editors and Affiliations

Dept. of Computer Science, University of Sheffield, Regent Court, 211 Portobello St., S1 4DP, Sheffield, UK
Hamish Cunningham
Information Retrieval Facility, Operngasse 20b, 1040, Vienna, Austria
Allan Hanbury
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger

About this paper

Cite this paper

Urbain, J., Frieder, O. (2010). Exploring Contextual Models in Chemical Patent Search. In: Cunningham, H., Hanbury, A., Rüger, S. (eds) Advances in Multidisciplinary Retrieval. IRFC 2010. Lecture Notes in Computer Science, vol 6107. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13084-7_6

DOI: https://doi.org/10.1007/978-3-642-13084-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13083-0
Online ISBN: 978-3-642-13084-7
eBook Packages: Computer Science, Computer Science (R0)
