Wikipedia Ad Hoc Passage Retrieval and Wikipedia Document Linking

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4862)

Abstract

Ad hoc passage retrieval within the Wikipedia is examined in the context of INEX 2007. An analysis of the INEX 2006 assessments suggests that a fixed-size window of about 300 terms is consistently seen, and that retrieving such a window might be a good strategy. In runs submitted to INEX, potentially relevant documents were identified using BM25 (trained on INEX 2006 data). For each potentially relevant document, the location of every search term was identified and the center (mean) of those locations computed. A fixed-size window was then centered on this location. A method of removing outliers was also examined, in which all terms occurring more than one standard deviation from the center were considered outliers and the center was recomputed without them. Both techniques were examined with and without stemming.
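
The window placement described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of the population standard deviation, and the clamping of the window to the document boundaries are all assumptions, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def window_center(positions, remove_outliers=False):
    """Return the mean position of the search-term occurrences.

    With remove_outliers=True, occurrences more than one standard
    deviation from the mean are dropped and the mean is recomputed.
    Population standard deviation is an assumption of this sketch.
    """
    center = mean(positions)
    if remove_outliers and len(positions) > 1:
        spread = pstdev(positions)
        kept = [p for p in positions if abs(p - center) <= spread]
        if kept:
            center = mean(kept)
    return center


def fixed_window(positions, doc_length, size=300, remove_outliers=False):
    """Return (start, end) term offsets of a fixed-size window.

    The window is centered on the computed center, then shifted so it
    stays inside the document (the clamping rule is an assumption).
    """
    center = window_center(positions, remove_outliers)
    start = int(center) - size // 2
    start = max(0, min(start, max(0, doc_length - size)))
    end = min(doc_length, start + size)
    return start, end


# Three occurrences cluster near the start; one lies far away.
hits = [100, 110, 120, 900]
print(fixed_window(hits, doc_length=1000))
print(fixed_window(hits, doc_length=1000, remove_outliers=True))
```

In this toy input the distant occurrence at offset 900 pulls the plain mean well away from the cluster; with outlier removal the window settles back over the three nearby occurrences.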

For Wikipedia linking, we identified terms within the document that were over-represented and, from the top few, generated queries of different lengths. A BM25 ranking search engine was used to identify potentially relevant documents. Links from the source document to the potentially relevant documents (and back) were constructed at whole-document granularity. The best performing run used the 4 most over-represented search terms to retrieve 200 documents, and the next 4 to retrieve 50 more.
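
The query generation for linking might look roughly like the sketch below. The abstract does not say how over-representation was measured, so the score used here (a term's relative frequency in the document divided by its smoothed relative frequency in the collection) is an assumption, as are the function names and the add-one smoothing. Only the 4/200 and 4/50 split comes from the text; the retrieval step itself is left out.

```python
from collections import Counter


def over_represented_terms(doc_terms, collection_freq, collection_size, top_n=8):
    """Rank document terms by document-to-collection frequency ratio.

    collection_freq maps term -> occurrences in the whole collection;
    collection_size is the total number of term occurrences.
    The ratio and the add-one smoothing are assumptions of this sketch.
    """
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    scores = {}
    for term, count in counts.items():
        expected = (collection_freq.get(term, 0) + 1) / (collection_size + 1)
        scores[term] = (count / doc_len) / expected
    # Highest ratio first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def link_queries(ranked_terms):
    """Build (query terms, documents to retrieve) pairs.

    Follows the best run in the text: the top 4 terms retrieve 200
    documents and the next 4 retrieve 50 more.
    """
    plan = [(ranked_terms[0:4], 200), (ranked_terms[4:8], 50)]
    return [(terms, k) for terms, k in plan if terms]


doc = ["kiwi", "kiwi", "kiwi", "bird", "the", "the", "the", "the"]
freq = {"kiwi": 9, "bird": 99, "the": 9999}
ranked = over_represented_terms(doc, freq, collection_size=99999)
print(ranked)
print(link_queries(ranked))
```

Each (terms, k) pair would then be passed to a BM25 search engine, and every retrieved document would become a link target for the source document.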

Editor information

Norbert Fuhr, Jaap Kamps, Mounia Lalmas, Andrew Trotman

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jenkinson, D., Trotman, A. (2008). Wikipedia Ad Hoc Passage Retrieval and Wikipedia Document Linking. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds) Focused Access to XML Documents. INEX 2007. Lecture Notes in Computer Science, vol 4862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85902-4_36

  • DOI: https://doi.org/10.1007/978-3-540-85902-4_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85901-7

  • Online ISBN: 978-3-540-85902-4

  • eBook Packages: Computer Science, Computer Science (R0)