Abstract
This paper outlines a strategy to build new bilingual dictionaries from existing resources. The method comprises two main tasks. First, a new set of bilingual correspondences is generated from two available bilingual dictionaries. Second, the generated correspondences are validated against a bilingual lexicon automatically extracted from non-parallel, comparable corpora. The quality of the entries in the derived dictionary is very high, similar to that of hand-crafted dictionaries. We report a case study in which a new, non-noisy English-Galician dictionary with about 12,000 correct bilingual correspondences was automatically generated.
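The two tasks can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the two source dictionaries share an intermediary (pivot) language, that candidates are generated by composing the dictionaries through that pivot, and that validation is a simple membership check against the corpus-derived lexicon. The function names, the choice of Spanish as pivot, and the toy entries are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def generate_candidates(src_to_pivot, pivot_to_tgt):
    """Compose two bilingual dictionaries through a shared pivot language.

    Every target word reachable from a source word via any pivot word
    becomes a candidate translation, so polysemous pivot words add noise.
    """
    candidates = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            for tgt_word in pivot_to_tgt.get(pivot_word, ()):
                candidates[src_word].add(tgt_word)
    return candidates


def validate(candidates, corpus_lexicon):
    """Keep only candidate pairs that also occur in the corpus-derived lexicon.

    Assumption: validation is plain set intersection; the paper may use a
    different criterion.
    """
    validated = {}
    for src_word, tgt_words in candidates.items():
        kept = tgt_words & corpus_lexicon.get(src_word, set())
        if kept:
            validated[src_word] = kept
    return validated


# Toy data, invented for illustration (not taken from the paper).
en_es = {"bank": {"banco", "orilla"}, "dog": {"perro"}}
es_gl = {"banco": {"banco", "banqueta"}, "orilla": {"beira"}, "perro": {"can"}}
corpus_lexicon = {"bank": {"banco", "beira"}, "dog": {"can", "cadelo"}}

candidates = generate_candidates(en_es, es_gl)
dictionary = validate(candidates, corpus_lexicon)
```

In this toy run the pivot step proposes "banqueta" for "bank" because the Spanish word "banco" is ambiguous. The validation step discards that pair, since the corpus-derived lexicon does not contain it.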
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gamallo Otero, P., Pichel Campos, J.R. (2010). Automatic Generation of Bilingual Dictionaries Using Intermediary Languages and Comparable Corpora. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_40
DOI: https://doi.org/10.1007/978-3-642-12116-6_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)