Abstract
Word Translation Disambiguation is the task of selecting the best translation(s) for a source word in a given context from a set of translation candidates. Most approaches to this problem rely on large word-aligned parallel corpora, which are scarce and expensive to build. In contrast, the method presented in this paper requires only large monolingual corpora: it builds vector space models that encode the sentence-level contexts of translation candidates as feature vectors in a high-dimensional word space. Experimental evaluation shows that the models contribute positively to overall quality in German-English translation.
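The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the whitespace tokenisation, the raw-count bag-of-words vectors, the cosine ranking, and the function names (`build_models`, `rank_candidates`) are all assumptions made for illustration. Each translation candidate gets a context vector summed over the monolingual target-language sentences that contain it, and candidates are then ranked by similarity to the words expected in the translated sentence.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased, whitespace-tokenised bag of words (a simplifying assumption)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_models(corpus, candidates):
    """Sum the sentence-level contexts of each candidate into one vector."""
    models = {c: Counter() for c in candidates}
    for sentence in corpus:
        bag = bag_of_words(sentence)
        for c in candidates:
            if c in bag:
                context = bag.copy()
                del context[c]  # the candidate itself is not part of its context
                models[c].update(context)
    return models


def rank_candidates(models, context_words):
    """Order candidates by similarity of their model to the given context."""
    query = Counter(w.lower() for w in context_words)
    return sorted(models, key=lambda c: cosine(models[c], query), reverse=True)


# Hypothetical monolingual English corpus; German "Bank" can mean "bank" or "bench".
corpus = [
    "she deposited the money at the bank",
    "the bank approved the loan",
    "he sat on the bench in the park",
    "the old bench stood under a tree",
]
models = build_models(corpus, ["bank", "bench"])
print(rank_candidates(models, ["money", "loan", "account"]))
print(rank_candidates(models, ["park", "tree"]))
```

Only target-language text is used here, which mirrors the claim that no parallel corpus is needed. A realistic system would presumably add lemmatisation, term weighting, and far larger corpora.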
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lynum, A., Marsi, E., Bungum, L., Gambäck, B. (2012). Disambiguating Word Translations with Target Language Models. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_46
DOI: https://doi.org/10.1007/978-3-642-32790-2_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science, Computer Science (R0)