Abstract
Pseudo-relevance feedback (PRF) is an effective approach in information retrieval, but many experiments have shown that it is ineffective in patent retrieval. The reason is that the initial results in patent retrieval are of poor quality, so a relevance model estimated via PRF often hurts retrieval performance by introducing off-topic terms. We propose a learning-to-rank framework for estimating how effective a patent document is when used for PRF. Specifically, knowledge of effective feedback documents for past queries is used to estimate effective feedback documents for new queries. This is achieved by introducing features correlated with feedback document effectiveness, which we define using patent-specific content. We then apply regression to predict document effectiveness from the proposed features. We evaluated the proposed method on the CLEF-IP 2010 patent prior art search collection. Our experimental results show significantly improved retrieval accuracy over a PRF baseline that expands the query using all top-ranked documents.
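The pipeline described above (learn from past queries which feedback documents helped, then score candidate feedback documents for a new query) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the regressor, so plain least-squares linear regression stands in for it; the two features (`claims_overlap`, `ipc_match`), the effectiveness targets, and the function names `fit_linear`, `predict`, and `select_feedback_docs` are hypothetical; and the numbers are made-up toy values, not results from the paper.

```python
from typing import Dict, List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               ridge: float = 1e-6) -> List[float]:
    """Fit least-squares weights (last weight is the bias) via normal equations."""
    rows = [list(x) + [1.0] for x in X]  # append a constant bias feature
    d = len(rows[0])
    # Build A = X^T X + ridge * I and b = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted feedback effectiveness of one document."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def select_feedback_docs(w: Sequence[float],
                         candidates: Dict[str, Sequence[float]],
                         k: int) -> List[str]:
    """Keep the k candidates with the highest predicted effectiveness."""
    ranked = sorted(candidates, key=lambda d: predict(w, candidates[d]), reverse=True)
    return ranked[:k]


# Toy training data from "past queries" (hypothetical features and targets):
# each row is (claims_overlap, ipc_match); each target is an assumed measure
# of how much using that document for feedback helped retrieval.
past_features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
past_effectiveness = [0.5, 2.5, -0.5, 1.5]
weights = fit_linear(past_features, past_effectiveness)

# Top-ranked documents for a new query, described by the same features.
new_query_docs = {"d1": (1.0, 0.0), "d2": (0.0, 1.0), "d3": (1.0, 1.0), "d4": (0.0, 0.0)}
print(select_feedback_docs(weights, new_query_docs, k=2))  # ['d1', 'd3']
```

Only the selected documents would then be used to expand the query, in contrast to the baseline that expands with all top-ranked documents.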
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahdabi, P., Crestani, F. (2012). Learning-Based Pseudo-Relevance Feedback for Patent Retrieval. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_1
DOI: https://doi.org/10.1007/978-3-642-31274-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31273-1
Online ISBN: 978-3-642-31274-8
eBook Packages: Computer Science, Computer Science (R0)