Abstract
Research on cross-language information retrieval (CLIR) has typically been restricted to settings with binary relevance assessments. In this paper, we present evaluation results for dictionary-based CLIR using graded relevance assessments in a best-match retrieval environment. A text database of newspaper articles and a related set of 35 search topics were used in the tests. First, monolingual baseline queries were automatically formed from the topics. Second, the source-language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish), forming structured target queries. The effectiveness of the translated queries was compared to that of the monolingual queries. Third, pseudo-relevance feedback was used to expand the original target queries. CLIR performance was evaluated at three relevance thresholds: stringent, regular, and liberal. At the regular and liberal thresholds, reasonable performance was achieved; at the stringent threshold, equally high performance could not be reached. At all three relevance thresholds, query expansion based on pseudo-relevance feedback successfully raised the performance of the translated queries. However, this method did not improve performance at the stringent threshold relative to the other thresholds.
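The pseudo-relevance feedback step summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the fixed feedback-document and expansion-term counts, and the raw term-frequency scoring are all assumptions made for the example.

```python
from collections import Counter

def prf_expand(query_terms, ranked_docs, top_docs=2, num_terms=3):
    """Pseudo-relevance feedback query expansion (illustrative sketch).

    Assume the top-ranked documents are relevant, score their terms by
    frequency within that feedback set, and append the best new terms
    to the original query.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:  # feedback set: top-ranked documents
        counts.update(doc)              # each doc is a list of index terms
    # Keep only terms not already present in the query.
    candidates = [t for t, _ in counts.most_common() if t not in query_terms]
    return list(query_terms) + candidates[:num_terms]

# Toy usage: expand a one-term query with terms from the top two documents.
ranked = [["bank", "loan", "credit"],
          ["bank", "credit", "rate"],
          ["river", "bank"]]
expanded = prf_expand(["bank"], ranked, top_docs=2, num_terms=2)
# → ["bank", "credit", "loan"]
```

In a real CLIR system the expansion terms would be drawn from target-language documents retrieved by the translated query, so the added terms need no translation themselves, which is one reason the technique suits cross-language settings.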
Lehtokangas, R., Keskustalo, H. & Järvelin, K. Experiments with dictionary-based CLIR using graded relevance assessments: Improving effectiveness by pseudo-relevance feedback. Inf Retrieval 9, 421–433 (2006). https://doi.org/10.1007/s10791-006-6389-1