Learning-Based Pseudo-Relevance Feedback for Patent Retrieval

Mahdabi, Parvaz; Crestani, Fabio

doi:10.1007/978-3-642-31274-8_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7356)

Included in the following conference series:

Information Retrieval Facility Conference

Abstract

Pseudo-relevance feedback (PRF) is an effective approach in Information Retrieval, but many experiments have shown that it is ineffective in patent retrieval. The reason is that the initial results in patent retrieval are of poor quality, so a relevance model estimated via PRF often hurts retrieval performance by introducing off-topic terms. We propose a learning-to-rank framework that estimates how effective a patent document is when used as a feedback document in PRF. Specifically, knowledge of effective feedback documents for past queries is used to estimate effective feedback documents for new queries. To this end, we introduce features that correlate with feedback document effectiveness, defined using patent-specific content, and apply regression to predict document effectiveness from these features. We evaluated the proposed method on the CLEF-IP 2010 patent prior art search collection. Our experimental results show significantly improved retrieval accuracy over a PRF baseline that expands the query using all top-ranked documents.
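
The pipeline outlined in the abstract (learn from past queries which feedback documents help, predict effectiveness for new candidates, expand the query from the best ones only) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the two features, the toy training values, the plain least-squares regressor, and all function names are assumptions made for the example. The abstract states only that regression is applied to features built from patent-specific content.

```python
from collections import Counter

def fit_linear(X, y, lr=0.5, epochs=2000):
    """Fit y ~ w.x + b by batch gradient descent on squared error.
    (Assumed stand-in for whatever regressor the paper actually uses.)"""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, x):
    """Predicted feedback effectiveness of a document with features x."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def select_feedback_docs(model, candidates, k):
    """Keep the k candidates with the highest predicted effectiveness."""
    ranked = sorted(candidates, key=lambda c: predict(model, c["features"]),
                    reverse=True)
    return ranked[:k]

def expansion_terms(docs, n_terms):
    """Weight expansion terms by relative frequency in the selected docs."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["tokens"])
    total = sum(counts.values())
    return [(term, count / total) for term, count in counts.most_common(n_terms)]

# Hypothetical training data from past queries. Each row describes one
# feedback document: [normalized initial retrieval score, overlap of
# classification codes with the query patent]. Both features are assumed.
train_X = [[0.9, 0.8], [0.2, 0.1], [0.7, 0.3], [0.1, 0.6], [0.5, 0.5]]
# Assumed target: change in retrieval quality when that document alone
# is used for feedback (toy values).
train_y = [0.49, -0.07, 0.24, 0.03, 0.20]
model = fit_linear(train_X, train_y)

# Top-ranked documents for a new query (toy tokens); d2 is off-topic.
candidates = [
    {"id": "d1", "features": [0.8, 0.9],
     "tokens": ["rotor", "blade", "turbine", "rotor"]},
    {"id": "d2", "features": [0.1, 0.2],
     "tokens": ["invoice", "payment", "rotor"]},
    {"id": "d3", "features": [0.6, 0.4],
     "tokens": ["rotor", "shaft", "bearing"]},
]
selected = select_feedback_docs(model, candidates, k=2)
terms = expansion_terms(selected, n_terms=3)
print([doc["id"] for doc in selected], terms)
```

In this toy run the off-topic document receives the lowest predicted effectiveness, so its terms never enter the expansion. The baseline named in the abstract would instead draw terms from all three top-ranked documents.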

Author information

Authors and Affiliations

Faculty of Informatics, University of Lugano, Switzerland
Parvaz Mahdabi & Fabio Crestani

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
Michail Salampasis
Royal School of Library and Information Science, 2300, Copenhagen, Denmark
Birger Larsen

About this paper

Cite this paper

Mahdabi, P., Crestani, F. (2012). Learning-Based Pseudo-Relevance Feedback for Patent Retrieval. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_1

DOI: https://doi.org/10.1007/978-3-642-31274-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31273-1
Online ISBN: 978-3-642-31274-8
eBook Packages: Computer Science, Computer Science (R0)
