Focus and Element Length for Book and Wikipedia Retrieval

  • Conference paper
Comparative Evaluation of Focused Retrieval (INEX 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6932))

Abstract

In this paper we describe our participation in INEX 2010 in the Ad Hoc Track and the Book Track. In the Ad Hoc Track we investigate the impact of propagated anchor text on article-level precision and the impact of an element length prior on within-document precision and recall. Using the article ranking of a document-level run for both document and focused retrieval techniques, we find that focused retrieval techniques clearly outperform document retrieval, especially for the Focused and Restricted Relevant in Context Tasks, which limit the amount of text that can be returned per topic and per article, respectively. Somewhat surprisingly, an element length prior increases within-document precision even when we restrict the amount of retrieved text to only 1000 characters per topic. The query-independent evidence of the length prior can help locate elements with a large fraction of relevant text. For the Book Track we look at the relative impact of retrieval units based on whole books, individual pages and multiple pages.
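
To make the idea of a query-independent element length prior concrete, the following is a minimal sketch and not the authors' implementation. It assumes a query-likelihood language model with linear (Jelinek-Mercer) smoothing and a prior proportional to element length raised to a power beta; the abstract does not state the exact formula, and the function name, parameter names and default values below are hypothetical.

```python
import math
from collections import Counter


def score_element(query_terms, element_tokens, collection_tf, collection_len,
                  lam=0.15, beta=1.0):
    """Log query likelihood of an element plus a log length prior.

    Assumed model (illustrative only):
        P(t | e)  = lam * tf(t, e) / |e| + (1 - lam) * cf(t) / |C|
        prior(e)  proportional to |e| ** beta
        score     = sum_t log P(t | e) + beta * log |e|
    """
    length = len(element_tokens)
    if length == 0:
        return float("-inf")  # an empty element cannot be scored
    tf = Counter(element_tokens)
    log_likelihood = 0.0
    for term in query_terms:
        p_elem = tf[term] / length
        p_coll = collection_tf.get(term, 0) / collection_len
        p = lam * p_elem + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in element and collection
        log_likelihood += math.log(p)
    # Query-independent evidence: longer elements receive a higher prior.
    return log_likelihood + beta * math.log(length)


# Toy data (hypothetical): two elements with the same term distribution
# but different lengths.
collection_tf = {"wiki": 5, "retrieval": 5, "book": 90}
collection_len = 100
short_elem = ["wiki", "retrieval"]
long_elem = short_elem * 10

s_short = score_element(["retrieval"], short_elem, collection_tf, collection_len)
s_long = score_element(["retrieval"], long_elem, collection_tf, collection_len)
```

In this toy setup both elements have the same query likelihood, so the length prior alone decides the order: with beta set to 1.0 the longer element ranks higher, and with beta set to 0.0 the two scores are equal.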




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kamps, J., Koolen, M. (2011). Focus and Element Length for Book and Wikipedia Retrieval. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_12

  • DOI: https://doi.org/10.1007/978-3-642-23577-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23576-4

  • Online ISBN: 978-3-642-23577-1

  • eBook Packages: Computer Science, Computer Science (R0)
