Abstract
Research on cross-language information retrieval (CLIR) has typically been restricted to settings using binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best-match retrieval environment. A text database of newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were formed automatically from the topics. Second, source-language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), and both structured and unstructured queries were formed from the translations. The effectiveness of the translated queries was compared to that of the monolingual queries. CLIR performance was evaluated under three relevance criteria: stringent, regular, and liberal. Under the regular and liberal criteria, performance was reasonable. Under the stringent criteria, however, CLIR performance dropped considerably relative to monolingual Finnish performance.
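The evaluation idea can be illustrated with a small sketch: one set of graded judgments is collapsed into three binary relevance sets by raising the threshold, and a CLIR run is then scored against a monolingual run under each criterion. The abstract does not give the grade scale, thresholds, or metric, so the following are assumptions for illustration only: a four-point scale (0 to 3), the thresholds in `CRITERIA`, non-interpolated average precision as the measure, and the toy judgments and rankings. The helper names are hypothetical.

```python
from typing import Dict, List, Set

# ASSUMED four-point scale (0 = irrelevant ... 3 = highly relevant) and
# ASSUMED thresholds; the abstract names the criteria but not their cut-offs.
CRITERIA: Dict[str, int] = {
    "liberal": 1,    # marginally relevant or better counts as relevant
    "regular": 2,    # fairly relevant or better
    "stringent": 3,  # only highly relevant documents count
}


def binarize(grades: Dict[str, int], criterion: str) -> Set[str]:
    """Return the ids of documents whose grade meets the criterion's threshold."""
    threshold = CRITERIA[criterion]
    return {doc for doc, grade in grades.items() if grade >= threshold}


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def relative_effectiveness(clir_ap: float, mono_ap: float) -> float:
    """CLIR score as a percentage of the monolingual baseline score."""
    return 100.0 * clir_ap / mono_ap if mono_ap > 0 else 0.0


# Toy data (hypothetical): graded judgments and two ranked lists for one topic.
grades = {"d1": 3, "d2": 1, "d3": 2, "d4": 0, "d5": 3}
mono_run = ["d1", "d5", "d3", "d2", "d4"]
clir_run = ["d2", "d3", "d1", "d4", "d5"]

for criterion in CRITERIA:
    rel = binarize(grades, criterion)
    mono = average_precision(mono_run, rel)
    clir = average_precision(clir_run, rel)
    print(f"{criterion:9s} mono={mono:.3f} clir={clir:.3f} "
          f"relative={relative_effectiveness(clir, mono):.1f}%")
```

In this made-up example the translated query still retrieves many marginally relevant documents early but pushes the highly relevant ones down, so its relative score shrinks as the criterion tightens. The toy numbers only show the mechanics of the comparison and are not results from the paper.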
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lehtokangas, R., Keskustalo, H., Järvelin, K. (2005). Dictionary-Based CLIR Loses Highly Relevant Documents. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_30
DOI: https://doi.org/10.1007/978-3-540-31865-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science, Computer Science (R0)