Abstract
Within the regression SVM framework, this paper proposes a linguistically motivated feature engineering strategy for developing an MT evaluation metric that correlates better with human assessments. In contrast to the current practice of “greedily” combining all available features, six features are proposed according to human intuitions about translation quality. The contribution of the linguistic features is then examined and analyzed via a hill-climbing strategy. Experiments indicate that, compared with either the SVM-ranking model or previous attempts that use exhaustive sets of linguistic features, the regression SVM model with six linguistically informed features generalizes better across different datasets, and that augmenting these linguistic features with suitable non-linguistic metrics yields additional improvements.
Additional information
Supported by the National Natural Science Foundation of China under Grant Nos. 60773066 and 60736014, the National High Technology Development 863 Program of China under Grant No. 2006AA010108, and the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant No. HIT.NSFIR.20009070.
This fact partially reflects how difficult it is, even for researchers, to obtain rich linguistic information.
About this article
Cite this article
Yang, MY., Sun, SQ., Zhu, JG. et al. Improvement of Machine Translation Evaluation by Simple Linguistically Motivated Features. J. Comput. Sci. Technol. 26, 57–67 (2011). https://doi.org/10.1007/s11390-011-9415-8
DOI: https://doi.org/10.1007/s11390-011-9415-8