Patent Query Formulation by Synthesizing Multiple Sources of Relevance Evidence

Published: 28 October 2014

Abstract

Patent prior art search is a patent retrieval task whose goal is to find documents that describe prior art related to a query patent. A query patent is a full patent application composed of hundreds of terms, so it does not represent a single focused information need. Fortunately, other sources of relevance evidence (i.e., classification tags and bibliographical data) provide additional details about the underlying information need. In this article, we propose a unified framework that integrates multiple relevance evidence components for query formulation. We first build a query model from the textual fields of a query patent. To overcome term mismatch, we expand this initial query model with the term distribution of documents in the citation graph, modeling both old and recent domain terminology. We also build an IPC (International Patent Classification) lexicon and use it for query expansion, incorporating proximity information. We performed an empirical evaluation on two patent datasets. Our results show that employing the temporal features of documents has a precision-enhancing effect, while query expansion using the IPC lexicon improves the recall of the final ranked list.
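The expansion step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the code below is not the article's method: it assumes a maximum-likelihood unigram query model and a plain linear interpolation with an expansion model estimated from cited documents. The function names, the weight `lam`, and the toy texts are all hypothetical.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram distribution over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def interpolate(query_model, expansion_model, lam=0.7):
    """Mix two term distributions: lam * P(t|query) + (1 - lam) * P(t|expansion).

    lam is an assumed interpolation weight, not a value from the article.
    """
    terms = set(query_model) | set(expansion_model)
    return {
        t: lam * query_model.get(t, 0.0) + (1 - lam) * expansion_model.get(t, 0.0)
        for t in terms
    }


def top_terms(model, k):
    """Return the k most probable terms, breaking ties alphabetically."""
    return sorted(model, key=lambda t: (-model[t], t))[:k]


# Toy data (hypothetical): a short "query patent" and two cited documents.
query_patent_text = "rotor blade wind turbine rotor"
cited_documents = ["wind turbine airfoil", "airfoil pitch control"]

query_model = unigram_model(query_patent_text)
expansion_model = unigram_model(" ".join(cited_documents))
expanded = interpolate(query_model, expansion_model, lam=0.7)

print(top_terms(expanded, 5))
```

In this toy run, "airfoil" appears only in the cited documents, yet it enters the expanded query model. That is the kind of term mismatch that expansion over the citation graph is meant to address. Temporal weighting and proximity information, which the abstract also mentions, are left out of this sketch.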


    Published In

    ACM Transactions on Information Systems  Volume 32, Issue 4
    October 2014
    198 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/2684820
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 October 2014
    Accepted: 01 July 2014
    Revised: 01 March 2014
    Received: 01 October 2013
    Published in TOIS Volume 32, Issue 4

    Author Tags

    1. Patent search
    2. citation analysis
    3. proximity
    4. query expansion

    Qualifiers

    • Research-article
    • Research
    • Refereed


