Crosslanguage Retrieval Based on Wikipedia Statistics

Juffinger, Andreas; Kern, Roman; Granitzer, Michael

doi:10.1007/978-3-642-04447-2_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5706)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

In this paper we present the methodology, implementations and evaluation results of the crosslanguage retrieval system we developed for the Robust WSD Task at CLEF 2008. Our system is based on query preprocessing for the translation and homogenisation of queries. The preprocessing comprises two stages: first, a query translation step based on term statistics of cooccurring articles in Wikipedia; second, different disjunctive query composition techniques for searching the CLEF corpus. We apply the same preprocessing steps to the monolingual and the crosslingual task, and thereby treat both tasks fairly and in a similar way. The evaluation revealed that this uniform processing comes at nearly no cost for monolingual retrieval, while enabling crosslanguage retrieval and a meaningful comparison of our system's performance on the two tasks.
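The two preprocessing stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the statistics or the composition techniques used, so the raw cooccurrence counting, the function names `translate_term` and `compose_disjunctive_query`, the `top_k` parameter, and the toy German–English article pairs are all assumptions made for illustration.

```python
from collections import Counter

# Toy stand-in for Wikipedia article pairs connected by interlanguage links.
# Each pair is (source-language tokens, target-language tokens).
# All data here is invented for illustration.
ALIGNED_ARTICLES = [
    (["bank", "geld", "kredit"], ["bank", "money", "credit"]),
    (["bank", "fluss", "ufer"], ["river", "bank", "shore"]),
    (["geld", "münze"], ["money", "coin"]),
]


def translate_term(term, aligned_articles, top_k=2):
    """Rank target-language terms by the number of aligned articles
    whose source side contains `term` (hypothetical scoring)."""
    counts = Counter()
    for source_tokens, target_tokens in aligned_articles:
        if term in source_tokens:
            # Count each target term once per article.
            counts.update(set(target_tokens))
    # Sort by descending count, then alphabetically for deterministic output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [target for target, _ in ranked[:top_k]]


def compose_disjunctive_query(terms, aligned_articles, top_k=2):
    """Join the candidate translations of all query terms with OR."""
    candidates = []
    for term in terms:
        for candidate in translate_term(term, aligned_articles, top_k):
            if candidate not in candidates:
                candidates.append(candidate)
    return " OR ".join(candidates)


if __name__ == "__main__":
    print(translate_term("geld", ALIGNED_ARTICLES))
    print(compose_disjunctive_query(["geld", "bank"], ALIGNED_ARTICLES))
```

Raw counts favour terms that are frequent everywhere, so a practical system would likely normalise them, for example with a tf-idf style weight. Running the same two functions on a monolingual query, with both sides of each article pair in the same language, mirrors the idea of applying identical preprocessing to the monolingual and crosslingual tasks.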

Author information

Authors and Affiliations

Know-Center, Graz, Austria
Andreas Juffinger, Roman Kern & Michael Granitzer

Editor information

Editors and Affiliations

Istituto di Scienza e Tecnologie dell’Informazione, CNR, Pisa, Italy
Carol Peters
RWTH Aachen University, Aachen, Germany
Thomas Deselaers
University of Padua, Padua, Italy
Nicola Ferro
LSI-UNED, Madrid, Spain
Julio Gonzalo & Anselmo Peñas
Dublin City University, Dublin 9, Ireland
Gareth J. F. Jones
Helsinki University of Technology, Espoo, Finland
Mikko Kurimo
University of Hildesheim, Hildesheim, Germany
Thomas Mandl
Humboldt University Berlin, Germany
Vivien Petras

About this paper

Cite this paper

Juffinger, A., Kern, R., Granitzer, M. (2009). Crosslanguage Retrieval Based on Wikipedia Statistics. In: Peters, C., et al. Evaluating Systems for Multilingual and Multimodal Information Access. CLEF 2008. Lecture Notes in Computer Science, vol 5706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04447-2_19

DOI: https://doi.org/10.1007/978-3-642-04447-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04446-5
Online ISBN: 978-3-642-04447-2
eBook Packages: Computer Science (R0)
