Measuring machine translation quality as semantic equivalence: A metric based on entailment features

Padó, Sebastian; Cer, Daniel; Galley, Michel; Jurafsky, Dan; Manning, Christopher D.

doi:10.1007/s10590-009-9060-y

Published: 08 November 2009

Volume 23, pages 181–193 (2009)

Machine Translation

Abstract

Current evaluation metrics for machine translation have increasing difficulty in distinguishing good from merely fair translations. We believe the main problem to be their inability to properly capture meaning: a good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that assesses the quality of MT output through its semantic equivalence to the reference translation, based on a rich set of match and mismatch features motivated by textual entailment. We first evaluate this metric against a combination metric of four state-of-the-art scores. Our metric predicts human judgments better than the combination metric, and combining the entailment and traditional features yields further improvements. We then demonstrate that the entailment metric can also be used as a learning criterion in minimum error rate training (MERT) to improve parameter estimation in MT system training. A manual evaluation of the resulting translations indicates that the new model obtains a significant improvement in translation quality.

Author information

Authors and Affiliations

Stuttgart University, Stuttgart, Germany
Sebastian Padó
Stanford University, Stanford, USA
Daniel Cer, Michel Galley, Dan Jurafsky & Christopher D. Manning

Corresponding author

Correspondence to Sebastian Padó.

About this article

Cite this article

Padó, S., Cer, D., Galley, M. et al. Measuring machine translation quality as semantic equivalence: A metric based on entailment features. Machine Translation 23, 181–193 (2009). https://doi.org/10.1007/s10590-009-9060-y

Received: 10 May 2009
Accepted: 15 October 2009
Published: 08 November 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s10590-009-9060-y
