Abstract
This experiment tests a simple, scalable, and effective approach to building a domain-specific translation lexicon using distributional statistics over parallelized bilingual corpora. A bilingual lexicon is extracted from aligned Swedish-French data and used to translate CLEF topics from Swedish to French; the resulting French queries are then used to retrieve documents from the French-language CLEF collection. For the recall-oriented “precision at 1000 documents” score, 34 of 50 queries are at or above the median, and many of the errors could be handled by string matching and cognate search. We conclude that the approach presented here is a simple and efficient component in an automatic query translation system.
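The pipeline described above (extract a lexicon from aligned text, then translate query terms with it) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain co-occurrence counts over aligned segments and cosine similarity as the distributional statistic, a simplification chosen here for illustration. The toy corpus and the function names `context_vectors`, `build_lexicon`, and `translate_query` are hypothetical.

```python
import math
from collections import defaultdict


def context_vectors(segments):
    """Map each word to a sparse vector {segment_id: count}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for seg_id, text in enumerate(segments):
        for word in text.lower().split():
            vectors[word][seg_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_lexicon(source_segments, target_segments):
    """Pair each source word with the target word whose distribution
    over the aligned segments is most similar."""
    src = context_vectors(source_segments)
    tgt = context_vectors(target_segments)
    return {
        s_word: max(tgt, key=lambda t_word: cosine(s_vec, tgt[t_word]))
        for s_word, s_vec in src.items()
    }


def translate_query(query, lexicon):
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(lexicon.get(w, w) for w in query.lower().split())


# Hypothetical toy corpus: segment i in Swedish is aligned with segment i in French.
swedish = ["fiske i europa", "handel i europa", "fiske och handel", "fiske i sverige"]
french = ["pêche en europe", "commerce en europe", "pêche et commerce", "pêche en suède"]

lexicon = build_lexicon(swedish, french)
print(translate_query("handel och fiske", lexicon))
```

Because words absent from the lexicon pass through untranslated, such terms reach the retrieval step in their source-language form, which is the kind of case where string matching and cognate search could help.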
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karlgren, J., Sahlgren, M., Järvinen, T., Cöster, R. (2005). Dynamic Lexica for Query Translation. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini, B. (eds) Multilingual Information Access for Text, Speech and Images. CLEF 2004. Lecture Notes in Computer Science, vol 3491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11519645_15
DOI: https://doi.org/10.1007/11519645_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27420-9
Online ISBN: 978-3-540-32051-7
eBook Packages: Computer Science, Computer Science (R0)