Identifying Novel Information Using Latent Semantic Analysis in the WiQA Task at CLEF 2006

Sutcliffe, Richard F. E.; Steinberger, Josef; Kruschwitz, Udo; Alexandrov-Kabadjov, Mijail; Poesio, Massimo

doi:10.1007/978-3-540-74999-8_66

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4730)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

In our two-stage system for the English monolingual WiQA Task, snippets were first retrieved if they contained an exact match with the topic title. These candidates were then passed to the Latent Semantic Analysis component, which judged a snippet Novel if its match with the article text fell below a threshold. In Run 1 the ten best snippets were returned, and in Run 2 the twenty best. Run 1 was superior, with an Average Yield per Topic of 2.46 and a Precision of 0.37. Compared to other groups, our performance was in the middle of the range, except for Precision, where our system was the best. We attribute this to our use of exact title matches in the IR stage. In future work we will vary the approach depending on the topic type, exploit co-references in conjunction with exact matches, and make use of the elaborate hyperlink structure, which is a unique and most interesting aspect of Wikipedia.
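
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`retrieve_candidates`, `lsa_novel`), the case-insensitive phrase match, the raw term-count matrix, the number of latent dimensions `k`, the threshold value, the comparison against individual article sentences, and the ranking by ascending similarity are all assumptions made for illustration.

```python
import re
import numpy as np

def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve_candidates(title, snippets):
    """Stage 1 (assumed form): keep snippets containing the title as a whole phrase.

    Case-insensitive matching is an assumption; the abstract only says "exact match".
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(title) + r"(?!\w)", re.IGNORECASE)
    return [s for s in snippets if pattern.search(s)]

def lsa_novel(article_sentences, candidates, k=2, threshold=0.5, top_n=10):
    """Stage 2 (assumed form): return candidates whose LSA match with the article
    is below `threshold`, most novel first, truncated to `top_n`."""
    if not article_sentences or not candidates:
        return []
    docs = list(article_sentences) + list(candidates)
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix of raw counts (no weighting; an assumption).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1.0

    # Truncated SVD: each document becomes a k-dimensional latent vector.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    D = (s[:k, None] * Vt[:k]).T
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    D = D / np.where(norms == 0, 1.0, norms)

    n_art = len(article_sentences)
    # Cosine similarity of every candidate with every article sentence;
    # a candidate's "match with the article" is taken as the maximum.
    match = (D[n_art:] @ D[:n_art].T).max(axis=1)

    order = np.argsort(match, kind="stable")
    return [candidates[i] for i in order if match[i] < threshold][:top_n]
```

In this sketch, `top_n=10` would correspond to Run 1 and `top_n=20` to Run 2. A stricter Stage 1 filter returns fewer candidates, which is consistent with the abstract's attribution of high Precision to exact title matching.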

Author information

Authors and Affiliations

Documents and Linguistic Technology Group, Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland
Richard F. E. Sutcliffe
Department of Computer Science and Engineering, University of West Bohemia, Univerzitni 8, 306 14 Plzen, Czech Republic
Josef Steinberger
Department of Computer Science, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
Udo Kruschwitz, Mijail Alexandrov-Kabadjov & Massimo Poesio

Editor information

Carol Peters, Paul Clough, Fredric C. Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke, Maximilian Stempfhuber

Cite this paper

Sutcliffe, R.F.E., Steinberger, J., Kruschwitz, U., Alexandrov-Kabadjov, M., Poesio, M. (2007). Identifying Novel Information Using Latent Semantic Analysis in the WiQA Task at CLEF 2006. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_66

DOI: https://doi.org/10.1007/978-3-540-74999-8_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74998-1
Online ISBN: 978-3-540-74999-8
eBook Packages: Computer Science, Computer Science (R0)