Metric and reference factors in minimum error rate training

Machine Translation

Abstract

In Minimum Error Rate Training (MERT), Bleu is often used as the error function, despite evidence that it correlates less well with human judgment than metrics such as Meteor and Ter. In this paper, we present empirical results showing that parameters tuned on Bleu may lead to sub-optimal Bleu scores under certain data conditions. Such scores can be improved significantly by tuning on an entirely different metric, e.g. Meteor, by 0.0082 Bleu or 3.38% relative improvement on the WMT08 English–French data. We analyze the influence of the number of references and the choice of metric on the outcome of MERT, and experiment on different data sets. We show the problems of tuning on a metric that is not designed for the single-reference scenario and point out some possible solutions.
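
For readers unfamiliar with the procedure, the sketch below illustrates the idea the abstract relies on: MERT selects decoder weights that minimise a corpus-level error function computed over n-best lists, and that error function (Bleu, Meteor, Ter, ...) is a pluggable component. This is a minimal, hypothetical illustration only: the data, feature names and the toy unigram-precision error function are invented for the example, and a brute-force grid search stands in for the efficient line-search procedure of Och (2003).

```python
# Minimal sketch of the MERT idea: pick weights that minimise a corpus-level
# error over n-best lists. The metric here is a toy stand-in (1 - unigram
# precision), not Bleu or Meteor; all names and data are hypothetical.

from itertools import product


def toy_error(hypothesis, references):
    """Toy error: 1 - unigram precision against the best-matching reference."""
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 1.0
    best = 0.0
    for ref in references:
        ref_tokens = ref.split()
        matches = sum(1 for t in hyp_tokens if t in ref_tokens)
        best = max(best, matches / len(hyp_tokens))
    return 1.0 - best


def mert_grid_search(nbest_lists, weight_grid, error_fn):
    """Pick the weight vector whose 1-best selections minimise corpus error."""
    best_weights, best_err = None, float("inf")
    for weights in weight_grid:
        total = 0.0
        for entry in nbest_lists:
            # Rescore the n-best list with the candidate weights ...
            selected = max(
                entry["hyps"],
                key=lambda h: sum(w * f for w, f in zip(weights, h["features"])),
            )
            # ... and accumulate the error of the selected hypothesis.
            total += error_fn(selected["text"], entry["refs"])
        if total < best_err:
            best_weights, best_err = weights, total
    return best_weights, best_err


if __name__ == "__main__":
    # Hypothetical n-best list: two features (e.g. LM and TM scores) per hypothesis.
    nbest = [
        {
            "refs": ["the cat sat on the mat"],
            "hyps": [
                {"text": "the cat sat on the mat", "features": [0.2, 0.9]},
                {"text": "a cat is on a mat", "features": [0.8, 0.3]},
            ],
        },
    ]
    grid = list(product([0.0, 0.5, 1.0], repeat=2))
    weights, err = mert_grid_search(nbest, grid, toy_error)
    print("best weights:", weights, "corpus error:", round(err, 3))
```

Swapping `toy_error` for a different metric changes which weight vector wins, which is the effect the paper studies when tuning on Bleu versus Meteor.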


References

  • Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Ann Arbor, MI, pp 65–72

  • Callison-Burch C, Osborne M, Koehn P (2006) Re-evaluating the role of Bleu in machine translation research. In: EACL-2006, Proceedings of the 11th conference of the European chapter of the association for computational linguistics. Trento, Italy, pp 249–256

  • Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: Proceedings of the third workshop on statistical machine translation. Columbus, OH, pp 70–106

  • Cer D, Jurafsky D, Manning C (2008) Regularization and search for minimum error rate training. In: Proceedings of the third workshop on statistical machine translation. Columbus, OH, pp 26–34

  • Chiang D, DeNeefe S, Chan YS, Ng HT (2008) Decomposability of translation metrics for improved evaluation and efficient algorithms. In: Proceedings of the 2008 conference on empirical methods in natural language processing. Honolulu, HI, pp 610–619

  • Dyer C, Setiawan H, Marton Y, Resnik P (2009) The University of Maryland statistical machine translation system for the fourth workshop on machine translation. In: Proceedings of the fourth workshop on statistical machine translation. Athens, Greece, pp 145–149

  • He Y, Way A (2009) Improving the objective function in minimum error rate training. In: Proceedings of the twelfth machine translation summit. Ottawa, ON, Canada, pp 238–245

  • Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing (EMNLP-2004). Barcelona, Spain, pp 388–395

  • Lambert P, Giménez J, Costa-jussà MR, Amigó E, Banchs RE, Màrquez L, Fonollosa JAR (2006) Machine Translation system development based on human likeness. In: Proceedings of the IEEE/ACL workshop on spoken language technology. Palm Beach, Aruba, pp 246–249

  • Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8): 707–710

  • Macherey W, Och F, Thayer I, Uszkoreit J (2008) Lattice-based minimum error rate training for statistical machine translation. In: Proceedings of the 2008 conference on empirical methods in natural language processing. Honolulu, HI, pp 725–734

  • Moore RC, Quirk C (2008) Random restarts in minimum error rate training for statistical machine translation. In: Proceedings of the 22nd international conference on computational linguistics (Coling 2008). Manchester, UK, pp 585–592

  • Och FJ (2003) Minimum error rate training in statistical machine translation. In: 41st annual meeting of the association for computational linguistics. Sapporo, Japan, pp 160–167

  • Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: 40th annual meeting of the association for computational linguistics. Philadelphia, PA, pp 295–302

  • Owczarzak K, van Genabith J, Way A (2007) Labelled dependencies in machine translation evaluation. In: Proceedings of the second workshop on statistical machine translation. Prague, Czech Republic, pp 104–111

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: 40th annual meeting of the association for computational linguistics. Philadelphia, PA, pp 311–318

  • Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006, Proceedings of the 7th conference of the association for machine translation in the Americas, visions for the future of machine translation. Cambridge, MA, pp 223–231

Author information

Correspondence to Yifan He.

Cite this article

He, Y., Way, A. Metric and reference factors in minimum error rate training. Machine Translation 24, 27–38 (2010). https://doi.org/10.1007/s10590-010-9072-7
