The CMU-EBMT machine translation system

Brown, Ralf D.

doi:10.1007/s10590-011-9095-8

Published: 27 July 2011

Volume 25, pages 179–195, (2011)

Machine Translation

Abstract

This paper presents an in-depth description of the features of the open-source CMU-EBMT example-based machine translation system. CMU-EBMT is a complete end-to-end system including lexicon induction, word and phrase alignment, corpus indexing and lookup, language model, decoder, and parameter tuning components. While it does not require them, it can take advantage of external alignment information and other annotations provided by GIZA++ and other systems. To illustrate a recent addition to CMU-EBMT, experiments are presented which show an improvement of 0.16 BLEU points (0.9% relative) on a cross-validated small-data English–Haitian translation task when using a new set of fine-grained log-linear feature values representing language model match lengths in addition to language model probabilities.
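The idea of adding language model match lengths as log-linear features alongside the language model probability can be illustrated with a minimal sketch. The abstract does not define the features precisely, so everything here is an assumption made for illustration: the function names, the feature names (`lm_logprob`, `lm_match_N`), the choice to count, for each target word, the longest n-gram ending at that word that the language model knows, and the toy weights. It is not the CMU-EBMT implementation.

```python
from typing import Dict, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def match_lengths(hyp: Sequence[str], known_ngrams: Set[NGram], max_order: int) -> List[int]:
    """For each word, return the length of the longest n-gram ending at that
    word that appears in known_ngrams (0 if even the unigram is unknown)."""
    lengths = []
    for i in range(len(hyp)):
        best = 0
        for n in range(1, min(max_order, i + 1) + 1):
            if tuple(hyp[i - n + 1 : i + 1]) in known_ngrams:
                best = n
        lengths.append(best)
    return lengths


def features(hyp: Sequence[str], known_ngrams: Set[NGram], max_order: int,
             lm_logprob: float) -> Dict[str, float]:
    """Build a feature vector: the usual LM log-probability plus one count
    feature per match length (hypothetical feature layout)."""
    feats = {"lm_logprob": lm_logprob}
    for n in range(max_order + 1):
        feats[f"lm_match_{n}"] = 0.0
    for length in match_lengths(hyp, known_ngrams, max_order):
        feats[f"lm_match_{length}"] += 1.0
    return feats


def loglinear_score(feats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear model score: weighted sum of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


# Toy example: the LM knows "the", "cat", and "the cat", but not "sat".
hyp = "the cat sat".split()
known = {("the",), ("cat",), ("the", "cat")}
feats = features(hyp, known, max_order=3, lm_logprob=-4.0)
weights = {"lm_logprob": 1.0, "lm_match_2": 0.5, "lm_match_0": -1.0}
score = loglinear_score(feats, weights)
```

In a setup like this, a tuner can reward hypotheses whose words are covered by long language model matches and penalize unknown words separately from the probability itself, which is one plausible reading of "fine-grained" feature values.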

Author information

Authors and Affiliations

Carnegie Mellon University Language Technologies Institute, Pittsburgh, PA, 15213, USA
Ralf D. Brown

Corresponding author

Correspondence to Ralf D. Brown.

About this article

Cite this article

Brown, R.D. The CMU-EBMT machine translation system. Machine Translation 25, 179–195 (2011). https://doi.org/10.1007/s10590-011-9095-8

Received: 15 October 2010
Accepted: 14 July 2011
Published: 27 July 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s10590-011-9095-8
