Identifying Novel Information Using Latent Semantic Analysis in the WiQA Task at CLEF 2006

  • Conference paper
Evaluation of Multilingual and Multi-modal Information Retrieval (CLEF 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4730)

Abstract

In our two-stage system for the English monolingual WiQA Task, snippets were first retrieved if they contained an exact match with the article title. Candidates were then passed to the Latent Semantic Analysis (LSA) component, which judged them Novel if their match with the article text fell below a threshold. In Run 1 the ten best snippets were returned, and in Run 2 the twenty best. Run 1 was superior, with an Average Yield per Topic of 2.46 and a Precision of 0.37. Compared to other groups, our performance was in the middle of the range except for Precision, where our system was the best. We attribute this to our use of exact title matches in the IR stage. In future work we will vary the approach depending on the topic type, exploit co-references in conjunction with exact matches, and make use of the elaborate hyperlink structure, which is a unique and most interesting aspect of Wikipedia.
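The two stages described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the LSA match was computed, so the raw term counts, the cosine similarity, the number of dimensions `k`, the `threshold` value, the ranking by lowest similarity, and all function names are assumptions made for the example. It also assumes a non-empty list of article sentences.

```python
import re
import numpy as np


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_candidates(title, snippets):
    """Stage 1 (IR): keep snippets containing the exact title string.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    needle = title.lower()
    return [s for s in snippets if needle in s.lower()]


def lsa_similarities(article_sentences, candidates, k=2):
    """Return, per candidate, its highest cosine similarity to any article
    sentence in a k-dimensional LSA space (truncated SVD of raw counts)."""
    docs = article_sentences + candidates
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix of raw counts (no tf-idf weighting here).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1.0

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero
    unit = doc_vecs / norms

    n = len(article_sentences)
    sims = unit[n:] @ unit[:n].T  # candidates x article sentences
    return sims.max(axis=1)


def select_novel(title, article_sentences, snippets,
                 threshold=0.5, top_n=10, k=2):
    """Stage 2: call a candidate Novel if its match with the article is
    below the threshold, then return the top_n least similar ones."""
    candidates = retrieve_candidates(title, snippets)
    if not candidates:
        return []
    sims = lsa_similarities(article_sentences, candidates, k)
    novel = [(sim, c) for sim, c in zip(sims, candidates) if sim < threshold]
    novel.sort(key=lambda pair: pair[0])
    return [c for _, c in novel[:top_n]]
```

Under these assumptions, `top_n=10` would correspond to the Run 1 setting and `top_n=20` to Run 2. A snippet that merely repeats an article sentence scores a similarity near 1 and is filtered out, while a snippet that mentions the title in an otherwise different context falls below the threshold and is kept.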

Editor information

Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, Maximilian Stempfhuber

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutcliffe, R.F.E., Steinberger, J., Kruschwitz, U., Alexandrov-Kabadjov, M., Poesio, M. (2007). Identifying Novel Information Using Latent Semantic Analysis in the WiQA Task at CLEF 2006. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_66

  • DOI: https://doi.org/10.1007/978-3-540-74999-8_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74998-1

  • Online ISBN: 978-3-540-74999-8

  • eBook Packages: Computer Science, Computer Science (R0)
