End-to-end statistical machine translation with zero or small parallel texts

Published online by Cambridge University Press:  15 June 2016

ANN IRVINE
Johns Hopkins University
e-mail: annirvine@gmail.com

CHRIS CALLISON-BURCH
University of Pennsylvania
e-mail: ccb@cis.upenn.edu

Abstract

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various signals of translation equivalence (like contextual similarity, temporal similarity, orthographic similarity and topic similarity). Our discriminative model produces higher accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features on a phrase-based SMT system. These monolingually estimated features enhance low resource SMT systems in addition to allowing end-to-end machine translation without parallel corpora.
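
As a rough illustration of how several monolingual signals of translation equivalence might be combined discriminatively, the sketch below trains a small logistic regression over per-pair similarity scores and uses it to rank candidate translations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy word pairs, and the contextual, temporal and topic scores are all invented for illustration, and only the orthographic signal (one minus normalized edit distance) is actually computed from the strings.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def orthographic_similarity(src, tgt):
    """1 - normalized edit distance; 1.0 means identical strings."""
    longest = max(len(src), len(tgt))
    return 1.0 - edit_distance(src, tgt) / longest if longest else 1.0


def make_features(src, tgt, contextual, temporal, topic):
    """Feature vector for one candidate pair.

    The contextual, temporal and topic scores are assumed to be precomputed
    elsewhere (hypothetical inputs); the final 1.0 is a bias term.
    """
    return [contextual, temporal, orthographic_similarity(src, tgt), topic, 1.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, feats):
    """Model's probability that the pair is a translation."""
    return sigmoid(sum(w * f for w, f in zip(weights, feats)))


def train(examples, epochs=200, lr=0.5):
    """Logistic regression fit by plain stochastic gradient ascent."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for feats, label in examples:
            error = label - predict(weights, feats)
            weights = [w + lr * error * f for w, f in zip(weights, feats)]
    return weights


def rank_candidates(weights, candidates):
    """Sort candidate target words by model score, best first."""
    return sorted(candidates,
                  key=lambda word: predict(weights, candidates[word]),
                  reverse=True)


# Toy supervision: (features, label). All similarity scores are invented.
TRAIN = [
    (make_features("democracia", "democracy", 0.9, 0.8, 0.9), 1),
    (make_features("perro", "dog", 0.7, 0.6, 0.8), 1),
    (make_features("democracia", "dog", 0.1, 0.2, 0.1), 0),
    (make_features("perro", "policy", 0.2, 0.1, 0.3), 0),
]

# Hypothetical candidate translations for the unseen source word "gobierno".
CANDIDATES = {
    "government": make_features("gobierno", "government", 0.8, 0.7, 0.8),
    "banana": make_features("gobierno", "banana", 0.1, 0.1, 0.2),
}

if __name__ == "__main__":
    model = train(TRAIN)
    print(rank_candidates(model, CANDIDATES))
```

In a sketch like this, the same per-pair scores could in principle also be exposed as additional feature values on phrase table entries, which is the kind of reuse the abstract describes.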

Type: Articles
Copyright © Cambridge University Press 2016

Footnotes

This material is based on research sponsored by DARPA under contract HR0011-09-1-0044 and by the Johns Hopkins University Human Language Technology Center of Excellence. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA or the U.S. Government. We would like to thank David Yarowsky for his tremendous support, and for his inspiring work on – and continued ideas about – learning translations from monolingual texts. We would like to thank Alex Klementiev for his substantial contributions to this research and his comments on a draft of this article. We would like to thank Manaal Faruqui and Sneha Jha for providing the reference translations for the two Hindi paragraphs. Thank you to the two anonymous reviewers who provided valuable feedback on the first draft of this manuscript.
