Abstract
This paper evaluates the performance of our recently proposed automatic machine translation evaluation metric, MaxSim, and examines the impact of translation fluency on the metric. MaxSim calculates a similarity score between an English system translation and its reference sentence by comparing information items, such as n-grams, across the sentence pair. Unlike most metrics, which perform binary matching, MaxSim also computes similarity scores between items. It models the items as nodes in a bipartite graph whose edges are weighted by those scores, and selects a maximum weight matching. Our experiments show that MaxSim is competitive with state-of-the-art metrics on benchmark datasets.
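The matching step described above can be illustrated with a small sketch. It is not MaxSim itself. The items here are single words, and `item_similarity` is a hypothetical scoring rule: an exact match scores 1.0, and a shared four-character prefix scores 0.5 as a crude stand-in for partial matches. The final F-measure that combines precision and recall, including its `alpha` parameter, is also an assumption. For clarity, the matching is found by exhaustive search over assignments, which takes exponential time. In practice, an exact polynomial-time algorithm such as the Hungarian method would be the sensible choice.

```python
from itertools import permutations


def item_similarity(sys_item: str, ref_item: str) -> float:
    """Hypothetical item similarity (an assumption, not MaxSim's actual scoring)."""
    if sys_item == ref_item:
        return 1.0
    # Crude partial credit: a shared 4-character prefix stands in for stem matching.
    if len(sys_item) >= 4 and sys_item[:4] == ref_item[:4]:
        return 0.5
    return 0.0


def max_weight_matching(weights: list[list[float]]) -> tuple[float, list[tuple[int, int]]]:
    """Exhaustive maximum weight bipartite matching (exponential; small inputs only).

    weights[i][j] is the weight of the edge between left node i and right node j.
    Each node appears in at most one returned (i, j) pair; zero-weight pairs are dropped.
    """
    n_left = len(weights)
    n_right = len(weights[0]) if weights else 0
    if n_left > n_right:
        # Match on the transpose so the smaller side is on the left, then swap indices back.
        total, pairs = max_weight_matching([list(col) for col in zip(*weights)])
        return total, sorted((i, j) for j, i in pairs)
    best_total, best_pairs = 0.0, []
    # Try every injective assignment of left nodes to right nodes.
    for cols in permutations(range(n_right), n_left):
        pairs = [(i, j) for i, j in enumerate(cols) if weights[i][j] > 0]
        total = sum(weights[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


def matching_f_score(system: str, reference: str, alpha: float = 0.5) -> float:
    """Sentence score from the matching: weighted harmonic mean of precision and recall (assumed)."""
    sys_items, ref_items = system.split(), reference.split()
    if not sys_items or not ref_items:
        return 0.0
    weights = [[item_similarity(s, r) for r in ref_items] for s in sys_items]
    total, _ = max_weight_matching(weights)
    precision = total / len(sys_items)
    recall = total / len(ref_items)
    if precision == 0 or recall == 0:
        return 0.0
    return (precision * recall) / (alpha * recall + (1 - alpha) * precision)


# A case where greedily taking the best edge (0.75) first is suboptimal.
print(max_weight_matching([[0.75, 0.5], [0.5, 0.0]]))  # (1.0, [(0, 1), (1, 0)])

print(round(matching_f_score("the officials are meeting today",
                             "the official is meeting today"), 3))  # 0.7
```

The first example shows why a matching is computed instead of a greedy pairing. Taking the single best edge (0.75) first leaves only a zero-weight edge for the remaining nodes. The maximum weight matching instead pairs the nodes crosswise, for a total of 1.0. In the sentence example, "officials" earns partial credit against "official" under the hypothetical prefix rule, which a binary matcher would score as zero.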
About this article
Cite this article
Chan, Y.S., Ng, H.T. MaxSim: performance and effects of translation fluency. Machine Translation 23, 157–168 (2009). https://doi.org/10.1007/s10590-009-9058-5
DOI: https://doi.org/10.1007/s10590-009-9058-5