Abstract
We present two evaluation measures for machine translation (MT), defined as error rates extended by block moves. In contrast to TER, these measures are constrained in a way that allows an exact calculation in polynomial time. We then investigate three methods for estimating the standard error of error rates and compare them to bootstrap estimates. Finally, we assess the correlation of our proposed measures with human judgment using data from the National Institute of Standards and Technology (NIST) 2008 MetricsMATR workshop.
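The abstract does not spell out the three estimation methods, so the following is only a minimal sketch of the bootstrap baseline they are compared against. It assumes that the error rate is computed at corpus level as total errors divided by total reference length, and that per-segment error counts and reference lengths are available. The function names, the parameters `num_resamples` and `seed`, and the sample data are hypothetical and not taken from the paper.

```python
import random


def error_rate(errors, ref_lengths):
    """Corpus-level error rate: total errors divided by total reference length."""
    return sum(errors) / sum(ref_lengths)


def bootstrap_standard_error(errors, ref_lengths, num_resamples=1000, seed=0):
    """Bootstrap estimate of the standard error of the corpus-level error rate.

    Segments are resampled with replacement; the error rate is recomputed on
    each resampled corpus, and the sample standard deviation of those rates
    is returned. Assumes all reference lengths are positive.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    n = len(errors)
    rates = []
    for _ in range(num_resamples):
        # Draw n segment indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        rates.append(error_rate([errors[i] for i in idx],
                                [ref_lengths[i] for i in idx]))
    mean = sum(rates) / num_resamples
    variance = sum((r - mean) ** 2 for r in rates) / (num_resamples - 1)
    return variance ** 0.5


# Hypothetical per-segment data: edit operations and reference lengths.
segment_errors = [3, 0, 5, 2, 7, 1]
segment_ref_lengths = [12, 8, 20, 10, 25, 9]
rate = error_rate(segment_errors, segment_ref_lengths)
standard_error = bootstrap_standard_error(segment_errors, segment_ref_lengths)
```

Resampling whole segments keeps each error count paired with its reference length, which matters because the corpus-level rate is a ratio of sums rather than a mean of per-segment rates.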
Cite this article
Leusch, G., Ney, H. Edit distances with block movements and error rate confidence estimates. Machine Translation 23, 129–140 (2009). https://doi.org/10.1007/s10590-009-9063-8
DOI: https://doi.org/10.1007/s10590-009-9063-8