Abstract
Our approach to cross-lingual document retrieval starts from the assumption that effective monolingual retrieval is at the core of any cross-language retrieval system. We devote particular attention to three crucial ingredients of our approach to cross-lingual retrieval. First, effective tokenization techniques are essential to cope with morphological variations common in many European languages. Second, effective combination methods allow us to combine the best of different strategies. Finally, effective translation methods for translating queries or documents turn a monolingual retrieval system into a cross-lingual retrieval system proper. The viability of our approach is shown by a series of experiments in monolingual, bilingual, and multilingual retrieval.
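The abstract does not say which combination method is used, so the following is only a minimal sketch of one common way to combine two retrieval strategies: rescale each run's scores to a common range, then take a weighted sum per document. The function names, the `weight_a` parameter, and the choice of min-max normalization are assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so different runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_runs(
    run_a: Dict[str, float],
    run_b: Dict[str, float],
    weight_a: float = 0.5,  # hypothetical interpolation weight
) -> Dict[str, float]:
    """Weighted sum of normalized scores; a missing document counts as 0."""
    norm_a = min_max_normalize(run_a)
    norm_b = min_max_normalize(run_b)
    combined = {}
    for doc in set(norm_a) | set(norm_b):
        combined[doc] = (
            weight_a * norm_a.get(doc, 0.0)
            + (1.0 - weight_a) * norm_b.get(doc, 0.0)
        )
    return combined


def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids by descending score, ties broken by id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    # Hypothetical scores from two strategies, e.g. a stemmed run and an n-gram run.
    stemmed_run = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
    ngram_run = {"d2": 3.0, "d4": 1.0}
    print(rank(combine_runs(stemmed_run, ngram_run)))
```

A document that both strategies retrieve with a reasonable score can outrank one that only a single strategy scores highly, which is the usual motivation for combining runs. The weight would normally be tuned on held-out topics.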
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamps, J., Adafre, S.F., de Rijke, M. (2005). Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini, B. (eds) Multilingual Information Access for Text, Speech and Images. CLEF 2004. Lecture Notes in Computer Science, vol 3491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11519645_12
DOI: https://doi.org/10.1007/11519645_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27420-9
Online ISBN: 978-3-540-32051-7
eBook Packages: Computer Science, Computer Science (R0)