Dictionary-Based CLIR Loses Highly Relevant Documents

  • Conference paper
Advances in Information Retrieval (ECIR 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3408)

Abstract

Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best-match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were automatically formed from the topics. Second, source-language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), using both structured and unstructured queries. The effectiveness of the translated queries was compared to that of the monolingual queries. CLIR performance was evaluated using three relevance criteria: stringent, regular, and liberal. Under the regular and liberal criteria, reasonable performance was achieved. Under the stringent criteria, performance dropped considerably compared to monolingual Finnish performance.
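The effect of the three relevance criteria can be illustrated with a minimal sketch. The abstract does not state the grading scale, the cut-offs, or the effectiveness measure, so everything here is an assumption made for illustration: a 0-3 graded scale, the threshold values in `THRESHOLDS`, the use of non-interpolated average precision, and the toy document IDs and grades.

```python
from typing import Dict, List

# Hypothetical cut-offs on an assumed 0-3 graded relevance scale:
# a document counts as relevant if its grade is >= the threshold.
THRESHOLDS: Dict[str, int] = {"liberal": 1, "regular": 2, "stringent": 3}


def average_precision(ranking: List[str], grades: Dict[str, int], min_grade: int) -> float:
    """Non-interpolated average precision after binarizing graded judgments."""
    relevant = {doc for doc, grade in grades.items() if grade >= min_grade}
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


# Toy data (invented): graded judgments for one topic and one ranked result list.
grades = {"d1": 3, "d2": 1, "d3": 2, "d4": 0, "d5": 3}
ranking = ["d2", "d3", "d1", "d4", "d5"]

scores = {
    name: average_precision(ranking, grades, threshold)
    for name, threshold in THRESHOLDS.items()
}
for name, score in scores.items():
    print(f"{name:9s} AP = {score:.3f}")
```

In this toy ranking the marginally relevant document is ranked first and the highly relevant ones come later, so the score falls as the criterion tightens. The same result list can therefore look acceptable under a liberal criterion and poor under a stringent one.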

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lehtokangas, R., Keskustalo, H., Järvelin, K. (2005). Dictionary-Based CLIR Loses Highly Relevant Documents. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_30

  • DOI: https://doi.org/10.1007/978-3-540-31865-1_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25295-5

  • Online ISBN: 978-3-540-31865-1

  • eBook Packages: Computer Science, Computer Science (R0)
