Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval

  • Conference paper
Multilingual Information Access for Text, Speech and Images (CLEF 2004)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3491))

Abstract

Our approach to cross-lingual document retrieval starts from the assumption that effective monolingual retrieval is at the core of any cross-language retrieval system. We devote particular attention to three crucial ingredients of our approach to cross-lingual retrieval. First, effective tokenization techniques are essential to cope with morphological variations common in many European languages. Second, effective combination methods allow us to combine the best of different strategies. Finally, effective translation methods for translating queries or documents turn a monolingual retrieval system into a cross-lingual retrieval system proper. The viability of our approach is shown by a series of experiments in monolingual, bilingual, and multilingual retrieval.
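The abstract does not say which combination method is used, so the following is only an illustrative sketch of one common way to combine two retrieval strategies: min-max normalize each run's scores, then take a weighted sum per document. The names `min_max_normalize`, `combine_runs`, `ranked`, the `weight` parameter, and the example scores are all assumptions made for this illustration, not details taken from the paper.

```python
from typing import Dict, List

# A "run" maps document ids to retrieval scores (hypothetical representation).
Run = Dict[str, float]


def min_max_normalize(run: Run) -> Run:
    """Rescale scores to [0, 1] so runs from different systems are comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are equal; treat every document as equally good.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def combine_runs(run_a: Run, run_b: Run, weight: float = 0.5) -> Run:
    """Weighted sum of normalized scores; a missing document contributes 0."""
    a = min_max_normalize(run_a)
    b = min_max_normalize(run_b)
    docs = set(a) | set(b)
    return {
        doc: weight * a.get(doc, 0.0) + (1.0 - weight) * b.get(doc, 0.0)
        for doc in docs
    }


def ranked(run: Run) -> List[str]:
    """Document ids by descending score, ties broken by id for determinism."""
    return sorted(run, key=lambda doc: (-run[doc], doc))


# Hypothetical scores from two strategies, e.g. a stemmed and an n-gram index.
stemmed_run = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
ngram_run = {"d2": 2.0, "d3": 1.0, "d4": 0.0}
combined = combine_runs(stemmed_run, ngram_run, weight=0.5)
print(ranked(combined))
```

A document that both strategies rank reasonably well (here `d2`) can overtake one that only a single strategy retrieves, which is the usual motivation for combining runs.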

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kamps, J., Adafre, S.F., de Rijke, M. (2005). Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini, B. (eds) Multilingual Information Access for Text, Speech and Images. CLEF 2004. Lecture Notes in Computer Science, vol 3491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11519645_12

  • DOI: https://doi.org/10.1007/11519645_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27420-9

  • Online ISBN: 978-3-540-32051-7

  • eBook Packages: Computer Science, Computer Science (R0)
