
Learning-Based Pseudo-Relevance Feedback for Patent Retrieval

  • Conference paper
Multidisciplinary Information Retrieval (IRFC 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7356)


Abstract

Pseudo-relevance feedback (PRF) is an effective approach in information retrieval, but many experiments have shown that it is ineffective in patent retrieval. The reason is that the initial results in patent retrieval are of poor quality, so estimating a relevance model via PRF often introduces off-topic terms that hurt retrieval performance. We propose a learning-to-rank framework for estimating the effectiveness of a patent document when it is used for PRF. Specifically, knowledge of which feedback documents were effective for past queries is used to estimate effective feedback documents for new queries. This is achieved by introducing features that correlate with feedback document effectiveness, defined using patent-specific content. We then apply regression to predict document effectiveness from these features. We evaluated the proposed method on the CLEF-IP 2010 patent prior art search collection. Our experimental results show significantly improved retrieval accuracy over a PRF baseline that expands the query using all top-ranked documents.
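The abstract combines two steps: estimating a relevance model from feedback documents, and using regression to predict which top-ranked documents are worth using for feedback. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. It assumes an RM1-style relevance model with uniform document weights, swaps in plain ridge (L2-regularized least-squares) linear regression as the regressor, and uses hypothetical features (IPC-code overlap and query-term overlap), a hypothetical effectiveness target, and a hypothetical rule that keeps documents whose predicted effectiveness is above zero. All function names and data are illustrative.

```python
from collections import Counter


def relevance_model(feedback_docs, doc_weights=None, num_terms=10):
    """RM1-style expansion terms: P(w|R) is proportional to sum_d weight(d) * P(w|d).

    feedback_docs: list of token lists. doc_weights: one non-negative weight per
    document (e.g. normalized retrieval scores); uniform when omitted.
    Returns the num_terms highest-probability (term, probability) pairs.
    """
    if doc_weights is None:
        doc_weights = [1.0] * len(feedback_docs)
    pairs = [(tokens, w) for tokens, w in zip(feedback_docs, doc_weights) if tokens]
    total = sum(w for _, w in pairs)
    if total <= 0:
        return []  # no usable feedback documents: no expansion
    scores = Counter()
    for tokens, weight in pairs:
        for term, count in Counter(tokens).items():
            scores[term] += (weight / total) * (count / len(tokens))
    return scores.most_common(num_terms)


def fit_ridge(features, targets, alpha=1e-6):
    """Fit linear weights plus a bias by solving (X^T X + alpha*I) w = X^T y.

    Ridge regression is an assumption standing in for "regression" in the abstract.
    """
    X = [list(row) + [1.0] for row in features]  # append a bias column
    n = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    weights = [0.0] * n
    for i in range(n - 1, -1, -1):
        weights[i] = (b[i] - sum(A[i][c] * weights[c] for c in range(i + 1, n))) / A[i][i]
    return weights


def predict(weights, row):
    """Predicted effectiveness of one feedback document (last weight is the bias)."""
    return sum(w * x for w, x in zip(weights, list(row) + [1.0]))


def expand_query(candidates, weights, num_terms=10, threshold=0.0):
    """Build the relevance model only from top-ranked documents predicted to help.

    candidates: (tokens, features) pairs for the new query's top-ranked documents.
    The threshold rule is hypothetical; an empty result means "keep the original query".
    """
    kept = [tokens for tokens, feats in candidates if predict(weights, feats) > threshold]
    return relevance_model(kept, num_terms=num_terms)


# Hypothetical training data from past queries. Features per feedback document:
# [IPC-code overlap with the query patent, query-term overlap]. Target: change in
# a retrieval metric when that document alone is used for feedback.
train_X = [[1.0, 0.5], [0.0, 0.2], [0.5, 0.0], [0.0, 0.0], [1.0, 1.0]]
train_y = [0.5, -0.06, 0.15, -0.1, 0.6]
model = fit_ridge(train_X, train_y)

# Top-ranked documents for a new query: (tokens, features).
candidates = [(["valve", "pump", "valve"], [1.0, 0.3]),
              (["gear", "gear"], [0.0, 0.1])]
baseline = relevance_model([tokens for tokens, _ in candidates])  # all top-ranked docs
learned = expand_query(candidates, model)
```

With these toy numbers, the baseline that uses every top-ranked document ranks the off-topic term `gear` first, whereas the learned selection keeps only the first document, so the expansion terms are `valve` and `pump`. A stronger regressor (for example, gradient-boosted trees) could replace `fit_ridge` without changing the rest of the pipeline.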




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahdabi, P., Crestani, F. (2012). Learning-Based Pseudo-Relevance Feedback for Patent Retrieval. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_1


  • DOI: https://doi.org/10.1007/978-3-642-31274-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31273-1

  • Online ISBN: 978-3-642-31274-8

  • eBook Packages: Computer Science, Computer Science (R0)
