The CMU-EBMT machine translation system

Machine Translation

Abstract

This paper presents an in-depth description of the features of the open-source CMU-EBMT example-based machine translation system. CMU-EBMT is a complete end-to-end system including lexicon induction, word and phrase alignment, corpus indexing and lookup, language model, decoder, and parameter tuning components. While it does not require them, it can take advantage of external alignment information and other annotations provided by GIZA++ and other systems. To illustrate a recent addition to CMU-EBMT, experiments are presented which show an improvement of 0.16 BLEU points (0.9% relative) on a cross-validated small-data English–Haitian translation task when using a new set of fine-grained log-linear feature values representing language model match lengths in addition to language model probabilities.
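As a rough illustration of the idea in the last sentence, and not CMU-EBMT's actual code, the following Python sketch shows how counts of language model match lengths could enter a log-linear score alongside the language model log-probability. The feature names (`lm_logprob`, `lm_match_1`, and so on), the cap `max_order`, and the weights are all assumptions invented for this example; in the real system the weights would presumably be set by the parameter tuning component.

```python
from collections import Counter


def match_length_features(match_lengths, max_order=5):
    """Count, for each n-gram length, how many target words were scored
    with a language model match of that length.

    `match_lengths` holds one integer per target word (hypothetical input;
    a real decoder would obtain these from its language model lookups).
    Lengths above `max_order` are clamped to `max_order`.
    """
    counts = Counter(min(n, max_order) for n in match_lengths)
    return {
        f"lm_match_{n}": float(counts.get(n, 0))
        for n in range(1, max_order + 1)
    }


def hypothesis_features(lm_logprob, match_lengths, max_order=5):
    """Combine the usual LM log-probability with the match-length counts."""
    features = {"lm_logprob": lm_logprob}
    features.update(match_length_features(match_lengths, max_order))
    return features


def log_linear_score(features, weights):
    """Standard log-linear model score: the weighted sum of feature values.
    Features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Illustrative values only: a five-word hypothesis and made-up weights.
example_features = hypothesis_features(lm_logprob=-12.5, match_lengths=[1, 2, 3, 3, 1])
example_weights = {"lm_logprob": 1.0, "lm_match_1": -0.2, "lm_match_3": 0.1}
example_score = log_linear_score(example_features, example_weights)
```

Keeping one feature per match length lets tuning reward long matches and penalize short ones independently, which a single language model probability cannot express.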


Correspondence to Ralf D. Brown.

Cite this article

Brown, R.D. The CMU-EBMT machine translation system. Machine Translation 25, 179–195 (2011). https://doi.org/10.1007/s10590-011-9095-8

  • DOI: https://doi.org/10.1007/s10590-011-9095-8
