ABSTRACT
Patent prior art search is a patent retrieval task in which the goal is to rank documents that describe prior art related to a patent application. A defining property of patent retrieval is that the query topic is a full patent application and does not represent a focused information need. This query-by-document nature of patent retrieval introduces new challenges and calls for investigations specific to the problem. Researchers have addressed it by considering different information resources for query reduction and query disambiguation. However, previous work has not fully studied the effect of using proximity information and exploiting domain-specific resources for query disambiguation.
In this paper, we first reduce the query document by taking its first claim. We then build a query-specific patent lexicon based on the definitions of the International Patent Classification (IPC). We study how to expand queries by selecting expansion terms from the lexicon that are focused on the query topic. The key problem is how to determine whether an expansion term is focused on the query topic. We address this problem by exploiting proximity information: we assign higher weights to expansion terms that appear closer to query terms, based on the intuition that such terms are more likely to be related to the query topic.
Experimental results on two patent retrieval datasets show that the proposed method is effective and robust for query expansion, significantly outperforming standard pseudo-relevance feedback (PRF) and existing baselines in patent retrieval.
Index Terms
- Leveraging conceptual lexicon: query disambiguation using proximity information for patent retrieval