Abstract
In our two-stage system for the English monolingual WiQA Task, snippets were first retrieved only if they contained an exact match with the topic title. These candidates were then passed to the Latent Semantic Analysis (LSA) component, which judged a snippet Novel if its similarity to the article text fell below a threshold. In Run 1 the ten best snippets were returned, and in Run 2 the twenty best. Run 1 was superior, with an Average Yield per Topic of 2.46 and a Precision of 0.37. Compared with other groups, our performance was in the middle of the range, except for Precision, where our system was the best. We attribute this to our use of exact title matches in the IR stage. In future work we will vary the approach depending on the topic type, exploit co-references in conjunction with exact matches, and make use of the elaborate hyperlink structure, which is a unique and most interesting aspect of Wikipedia.
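The two stages described above can be illustrated with a minimal Python sketch. It is not the system evaluated in the runs, and several choices are assumptions because the abstract does not specify them. The tokenizer, the LSA space built only from the topic article's sentences plus the candidate snippets, the rank `k`, the threshold value of 0.5, comparing each snippet against its closest article sentence, and ranking Novel snippets by ascending similarity are all hypothetical. Stage 1 keeps snippets that contain the title verbatim. Stage 2 folds each candidate into the reduced space (d^T U_k S_k^{-1}) and keeps it as Novel when its best cosine match stays under the threshold. The parameter `top_n` stands in for the ten (Run 1) or twenty (Run 2) snippets returned.

```python
import re
import numpy as np

# Hypothetical sketch: function names, k, threshold and ranking are illustrative assumptions.

def tokenize(text):
    # Lowercased alphanumeric tokens; no stemming or stopword removal (simplification).
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_title_filter(title, snippets):
    # Stage 1: keep only snippets containing the title verbatim (case-insensitive).
    t = title.lower()
    return [s for s in snippets if t in s.lower()]

def lsa_space(docs, k):
    # Build a term-by-document count matrix and keep the top-k singular triplets.
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in tokenize(d)}))}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[vocab[w], j] += 1.0
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    k = min(k, int((S > 1e-10).sum()))  # drop zero singular values so fold-in never divides by 0
    return vocab, U[:, :k], S[:k]

def fold_in(text, vocab, Uk, Sk):
    # Project a bag-of-words vector into the reduced space: d^T U_k S_k^{-1}.
    v = np.zeros(Uk.shape[0])
    for w in tokenize(text):
        if w in vocab:
            v[vocab[w]] += 1.0
    return (v @ Uk) / Sk

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))

def novel_snippets(title, article_sentences, snippets, threshold=0.5, k=2, top_n=10):
    candidates = exact_title_filter(title, snippets)
    if not candidates:
        return []
    vocab, Uk, Sk = lsa_space(article_sentences + candidates, k)
    article_vecs = [fold_in(s, vocab, Uk, Sk) for s in article_sentences]
    scored = []
    for c in candidates:
        cv = fold_in(c, vocab, Uk, Sk)
        # "Match" with the article = similarity to its closest sentence (assumption).
        match = max((cosine(cv, a) for a in article_vecs), default=0.0)
        if match < threshold:  # Stage 2: judged Novel
            scored.append((match, c))
    scored.sort(key=lambda p: p[0])  # hypothetical ranking: least similar first
    return [c for _, c in scored[:top_n]]
```

Calling `novel_snippets(title, sentences, snippets, top_n=20)` mirrors the Run 2 setting. Lowering `threshold` makes the Novel judgment stricter, so fewer snippets survive stage 2.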
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sutcliffe, R.F.E., Steinberger, J., Kruschwitz, U., Alexandrov-Kabadjov, M., Poesio, M. (2007). Identifying Novel Information Using Latent Semantic Analysis in the WiQA Task at CLEF 2006. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_66
DOI: https://doi.org/10.1007/978-3-540-74999-8_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74998-1
Online ISBN: 978-3-540-74999-8
eBook Packages: Computer Science, Computer Science (R0)