Cunei: open-source machine translation with relevance-based models of each translation instance

Phillips, Aaron B.

doi:10.1007/s10590-011-9109-6

Published: 11 September 2011

Volume 25, pages 161–177, (2011)

Machine Translation

Aaron B. Phillips¹

Abstract

The Cunei machine translation platform is an open-source system for data-driven machine translation. Our platform is a synthesis of the traditional example-based MT (EBMT) and statistical MT (SMT) paradigms. What makes Cunei unique is that it measures the relevance of each translation instance with a distance function. This distance function, represented as a log-linear model, operates over one translation instance at a time and enables us to score the translation instance relative to the specified input and/or the current target hypothesis. We describe how Cunei scores features individually for each translation instance and how it efficiently performs parameter tuning over the entire feature space. We also compare Cunei with three other open-source MT systems (Moses, CMU-EBMT, and Marclator). In our experiments involving Korean–English and Czech–English translation, Cunei clearly outperforms the traditional EBMT and SMT systems.
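
To make the idea of a log-linear score over a single translation instance concrete, the following is a minimal sketch of the general form only. It is not taken from the Cunei code base: the function name `instance_score`, the feature names, and all numeric values are hypothetical and chosen purely for illustration.

```python
import math
from typing import Dict


def instance_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return exp(sum_k weights[k] * features[k]) for one translation instance.

    Features with no matching weight contribute nothing to the sum.
    """
    log_score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return math.exp(log_score)


# Hypothetical feature values for two instances of the same source phrase.
# These names and numbers are assumptions, not features defined by the article.
weights = {"alignment_quality": 1.0, "context_match": 0.5}
close_match = {"alignment_quality": -0.2, "context_match": -0.1}
distant_match = {"alignment_quality": -0.9, "context_match": -2.0}

# The instance whose features sit closer to the input gets the higher score.
print(instance_score(close_match, weights) > instance_score(distant_match, weights))
```

Because each instance carries its own feature values, two instances of the same phrase pair can receive different scores; how such scores are combined and tuned is described in the full article rather than in this sketch.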

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, USA
Aaron B. Phillips

Corresponding author

Correspondence to Aaron B. Phillips.

About this article

Cite this article

Phillips, A.B. Cunei: open-source machine translation with relevance-based models of each translation instance. Machine Translation 25, 161–177 (2011). https://doi.org/10.1007/s10590-011-9109-6

Received: 14 October 2010
Accepted: 19 August 2011
Published: 11 September 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s10590-011-9109-6
