Abstract
This study presents the basic framework and performance analysis results from the three-year development of the dictionary-based UTACLIR system. The tests expand from bilingual CLIR for three language pairs (Swedish, Finnish and German to English) to six language pairs (English to French, German, Spanish, Italian, Dutch and Finnish), and from bilingual to multilingual retrieval. In addition, transitive translation tests are reported. The development of the UTACLIR query translation system is viewed as a learning process. The contribution of the individual components and the effectiveness of compound handling, proper name matching and query structuring are analyzed. The results and the fault analysis have been valuable in the development process. Overall, the results indicate that the process is robust and can be extended to other languages. The individual effects of the different components are in general positive. However, performance also depends on the topic set, on the number of compounds and proper names in the topic and, to some extent, on the source and target language. The dictionaries used affect performance significantly.
Cite this article
Hedlund, T., Airio, E., Keskustalo, H. et al. Dictionary-Based Cross-Language Information Retrieval: Learning Experiences from CLEF 2000–2002. Information Retrieval 7, 99–119 (2004). https://doi.org/10.1023/B:INRT.0000009442.34054.55
DOI: https://doi.org/10.1023/B:INRT.0000009442.34054.55