
Improving Access to Large Patent Corpora

  • Chapter

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 6380)

Abstract

Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the patent domain, because if a retrieval system makes some patents hard to find, then patent searchers will have a difficult time retrieving these patents. This may mean that a patent searcher could miss important and relevant patents because of the retrieval system. In this paper, we describe measures of retrievability and how they can be applied to measure the overall access to a collection given a retrieval system. We then identify three features of best-match retrieval models that are hypothesized to lead to an improvement in access to all documents in the collection: sensitivity to term frequency, length normalization and convexity. Since patent searchers tend to favor Boolean models over best-match models, hybrid retrieval models are proposed that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.
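The abstract does not state the formulas behind these measures, so the following is only a minimal sketch of one common formulation, not necessarily the one used in the chapter. It assumes a cumulative measure in which a document's retrievability is the number of queries that return it within a rank cutoff, and it assumes the Gini coefficient as the summary of how unevenly that retrievability is spread over the collection (0 means every document is equally retrievable). The function names, the query set, the document identifiers and the cutoff are all hypothetical.

```python
def retrievability(rankings, doc_ids, cutoff):
    """Count, per document, the queries that rank it within the top `cutoff`.

    rankings: dict mapping a query id to its ranked list of document ids
              (hypothetical input; any retrieval system could produce it).
    doc_ids:  every document in the collection, so that documents which are
              never retrieved still receive a score of 0.
    """
    scores = {doc_id: 0 for doc_id in doc_ids}
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative scores: 0 means perfectly equal."""
    vals = sorted(values)
    n = len(vals)
    total = sum(vals)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the scores sorted in ascending order.
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(vals, start=1))
    return weighted / (n * total)


# Toy collection of four patents and three queries (made-up data).
doc_ids = ["p1", "p2", "p3", "p4"]
rankings = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p1", "p3", "p4"],
    "q3": ["p2", "p1"],
}
scores = retrievability(rankings, doc_ids, cutoff=2)
inequality = gini(scores.values())
```

Under these assumptions, comparing the Gini coefficient of two retrieval models over the same query set gives a single-number view of which model spreads access more evenly across the collection.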




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bache, R., Azzopardi, L. (2010). Improving Access to Large Patent Corpora. In: Hameurlain, A., Küng, J., Wagner, R., Bach Pedersen, T., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems II. Lecture Notes in Computer Science, vol 6380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16175-9_4


  • DOI: https://doi.org/10.1007/978-3-642-16175-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16174-2

  • Online ISBN: 978-3-642-16175-9

  • eBook Packages: Computer Science, Computer Science (R0)
