Abstract
In Minimum Error Rate Training (MERT), Bleu is often used as the error function, despite the fact that it has been shown to have a lower correlation with human judgment than other metrics such as Meteor and Ter. In this paper, we present empirical results showing that parameters tuned on Bleu can lead to sub-optimal Bleu scores under certain data conditions. Such scores can be improved significantly by tuning on a different metric, e.g. Meteor: on the WMT08 English–French data, this yields a gain of 0.0082 Bleu, a 3.38% relative improvement. We analyze the influence of the number of references and the choice of metric on the result of MERT, and experiment on different data sets. We show the problems of tuning on a metric that is not designed for the single-reference scenario and point out some possible solutions.
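The core idea the abstract relies on can be illustrated with a minimal sketch: MERT searches for the feature weights that maximize a chosen evaluation metric over n-best translation lists, so swapping the metric (e.g. Bleu for Meteor) changes which weights are selected. The n-best lists, feature values, and the toy unigram-precision metric below are all hypothetical stand-ins, and the exhaustive grid search stands in for Och's line-search procedure; this is not the authors' implementation.

```python
from collections import Counter

# Hypothetical n-best lists: for each source sentence, candidate
# translations paired with feature scores (e.g. LM and TM scores).
nbest = [
    [("the cat sat", [0.9, 0.2]), ("a cat sat", [0.5, 0.8])],
    [("dogs bark loud", [0.4, 0.9]), ("dog barks loudly", [0.8, 0.3])],
]
references = ["the cat sat", "dog barks loudly"]

def unigram_precision(hyp, ref):
    """Toy stand-in for an error function such as Bleu or Meteor."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum(min(c, r[w]) for w, c in h.items())
    return overlap / max(1, sum(h.values()))

def corpus_score(weights, metric):
    """Pick each sentence's best candidate under the weighted model,
    then score the resulting outputs with the given metric."""
    total = 0.0
    for cands, ref in zip(nbest, references):
        best = max(cands, key=lambda c: sum(w * f for w, f in zip(weights, c[1])))
        total += metric(best[0], ref)
    return total / len(references)

# Grid search over weight vectors, standing in for MERT's line search:
# the tuned weights depend on which metric serves as the error function.
grid = [(w1, 1.0 - w1) for w1 in [i / 10 for i in range(11)]]
best_weights = max(grid, key=lambda w: corpus_score(w, unigram_precision))
print(best_weights, corpus_score(best_weights, unigram_precision))
```

Replacing `unigram_precision` with a different metric can move the optimum to a different weight vector, which is the effect the paper studies empirically.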
References
Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Ann Arbor, MI, pp 65–72
Callison-Burch C, Osborne M, Koehn P (2006) Re-evaluating the role of Bleu in machine translation research. In: EACL-2006, Proceedings of the 11th conference of the european chapter of the association for computational linguistics. Trento, Italy, pp 249–256
Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: Proceedings of the third workshop on statistical machine translation. Columbus, OH, pp 70–106
Cer D, Jurafsky D, Manning C (2008) Regularization and search for minimum error rate training. In: Proceedings of the third workshop on statistical machine translation. Columbus, OH, pp 26–34
Chiang D, DeNeefe S, Chan YS, Ng HT (2008) Decomposability of translation metrics for improved evaluation and efficient algorithms. In: Proceedings of the 2008 conference on empirical methods in natural language processing. Honolulu, HI, pp 610–619
Dyer C, Setiawan H, Marton Y, Resnik P (2009) The University of Maryland statistical machine translation system for the fourth workshop on machine translation. In: Proceedings of the fourth workshop on statistical machine translation. Athens, Greece, pp 145–149
He Y, Way A (2009) Improving the objective function in minimum error rate training. In: Proceedings of the twelfth machine translation summit. Ottawa, ON, Canada, pp 238–245
Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing (EMNLP-2004). Barcelona, Spain, pp 388–395
Lambert P, Giménez J, Costa-jussà MR, Amigó E, Banchs RE, Màrquez L, Fonollosa JAR (2006) Machine Translation system development based on human likeness. In: Proceedings of the IEEE/ACL workshop on spoken language technology. Palm Beach, Aruba, pp 246–249
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8): 707–710
Macherey W, Och F, Thayer I, Uszkoreit J (2008) Lattice-based minimum error rate training for statistical machine translation. In: Proceedings of the 2008 conference on empirical methods in natural language processing. Honolulu, HI, pp 725–734
Moore RC, Quirk C (2008) Random restarts in minimum error rate training for statistical machine translation. In: Proceedings of the 22nd international conference on computational linguistics (Coling 2008). Manchester, UK, pp 585–592
Och FJ (2003) Minimum error rate training in statistical machine translation. In: 41st annual meeting of the association for computational linguistics. Sapporo, Japan, pp 160–167
Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: 40th annual meeting of the association for computational linguistics. Philadelphia, PA, pp 295–302
Owczarzak K, van Genabith J, Way A (2007) Labelled dependencies in machine translation evaluation. In: Proceedings of the second workshop on statistical machine translation. Prague, Czech Republic, pp 104–111
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: 40th annual meeting of the association for computational linguistics. Philadelphia, PA, pp 311–318
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006, Proceedings of the 7th conference of the association for machine translation in the Americas, visions for the future of machine translation. Cambridge, MA, pp 223–231
He, Y., Way, A. Metric and reference factors in minimum error rate training. Machine Translation 24, 27–38 (2010). https://doi.org/10.1007/s10590-010-9072-7