Cunei: open-source machine translation with relevance-based models of each translation instance

Machine Translation

Abstract

The Cunei machine translation platform is an open-source system for data-driven machine translation. Our platform is a synthesis of the traditional example-based MT (EBMT) and statistical MT (SMT) paradigms. What makes Cunei unique is that it measures the relevance of each translation instance with a distance function. This distance function, represented as a log-linear model, operates over one translation instance at a time and enables us to score the translation instance relative to the specified input and/or the current target hypothesis. We describe how Cunei scores features individually for each translation instance and how it efficiently performs parameter tuning over the entire feature space. We also compare Cunei with three other open-source MT systems (Moses, CMU-EBMT, and Marclator). In our experiments involving Korean–English and Czech–English translation, Cunei clearly outperforms the traditional EBMT and SMT systems.
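
To make the idea of a per-instance log-linear score concrete, the following is a minimal sketch in Python, not Cunei's actual implementation. The feature names (`context_mismatch`, `alignment_cost`), the weights, the example phrases, and the choice to sum instance scores per target phrase are all assumptions introduced for illustration; the abstract states only that a log-linear distance function is evaluated for one translation instance at a time.

```python
import math
from dataclasses import dataclass


@dataclass
class TranslationInstance:
    """One corpus occurrence of a source phrase with its aligned target phrase."""
    source: str
    target: str
    features: dict  # hypothetical feature name -> value for this instance


def instance_score(instance, weights):
    """Log-linear score of a single instance: exp(sum_i weight_i * feature_i)."""
    total = sum(weights.get(name, 0.0) * value
                for name, value in instance.features.items())
    return math.exp(total)


def score_translations(instances, weights):
    """Aggregate instance scores per target phrase.

    Summing is an assumption of this sketch: instances that are close to the
    input (small penalties) contribute more than distant ones.
    """
    totals = {}
    for inst in instances:
        totals[inst.target] = totals.get(inst.target, 0.0) + instance_score(inst, weights)
    return totals


# Hypothetical weights: negative, so larger penalties lower the score.
weights = {"context_mismatch": -0.5, "alignment_cost": -1.0}

instances = [
    TranslationInstance("maison", "house", {"context_mismatch": 0.0, "alignment_cost": 0.0}),
    TranslationInstance("maison", "house", {"context_mismatch": 2.0, "alignment_cost": 0.0}),
    TranslationInstance("maison", "home", {"context_mismatch": 1.0, "alignment_cost": 1.0}),
]

totals = score_translations(instances, weights)
best = max(totals, key=totals.get)
print(best, totals)
```

Because every instance carries its own feature values, two occurrences of the same phrase pair can contribute differently to the final score, which is the contrast with a model that stores a single set of features per phrase pair.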

Author information

Corresponding author

Correspondence to Aaron B. Phillips.

About this article

Cite this article

Phillips, A.B. Cunei: open-source machine translation with relevance-based models of each translation instance. Machine Translation 25, 161–177 (2011). https://doi.org/10.1007/s10590-011-9109-6

  • DOI: https://doi.org/10.1007/s10590-011-9109-6
