Abstract
This paper addresses diagnostic evaluation of machine translation (MT) systems for Indian languages, English-to-Hindi MT in particular, by assessing how MT systems perform on relevant linguistic phenomena (checkpoints). We use the diagnostic evaluation tool DELiC4MT to analyze MT performance on part-of-speech (PoS) categories such as nouns and verbs. The tool currently supports only word-level checkpoints, which may be less informative about translation quality than phrase-level checkpoints and checkpoints that target named entities (NEs), inflections, word order, and similar phenomena. We therefore propose phrase-level checkpoints and NEs as additional checkpoints for DELiC4MT. We further use Hjerson to evaluate word order and inflections, both of which are relevant when Hindi is the target language. The Hjerson experiments produce overall (document-level) error counts and error rates for five error classes: inflectional errors, reordering errors, missing words, extra words, and lexical errors. Both approaches were tested on five English-to-Hindi MT systems.
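To make the document-level figures concrete, the following is a minimal sketch of how per-class error counts can be turned into error rates. It is not Hjerson's implementation: the function name `error_rates`, the label strings, and the use of a single word total as the normalizer are assumptions made for illustration, and Hjerson's own normalization may differ.

```python
from collections import Counter

# The five error classes named in the abstract (label strings are hypothetical).
ERROR_CLASSES = ("inflection", "reordering", "missing", "extra", "lexical")


def error_rates(word_labels, total_words):
    """Aggregate per-word error labels into document-level counts and rates.

    word_labels: iterable of labels, one per word; any label that is not in
                 ERROR_CLASSES (e.g. "ok") is treated as a correct word.
    total_words: normalizer for the rates (assumed here to be one word total
                 for the whole document).
    Returns (counts, rates), where rates are percentages of total_words.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    tally = Counter(label for label in word_labels if label in ERROR_CLASSES)
    counts = {cls: tally[cls] for cls in ERROR_CLASSES}
    rates = {cls: 100.0 * counts[cls] / total_words for cls in ERROR_CLASSES}
    return counts, rates


# Toy document of ten words with four labelled errors.
labels = ["inflection", "ok", "lexical", "lexical", "ok",
          "reordering", "ok", "ok", "ok", "ok"]
counts, rates = error_rates(labels, total_words=10)
for cls in ERROR_CLASSES:
    print(f"{cls:<11} count={counts[cls]} rate={rates[cls]:.1f}%")
```

Reporting counts next to rates keeps systems with different output lengths comparable while still showing how many words each error class actually affects.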
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balyan, R., Naskar, S.K., Toral, A., Chatterjee, N. (2013). A Diagnostic Evaluation Approach for English to Hindi MT Using Linguistic Checkpoints and Error Rates. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37256-8_24
DOI: https://doi.org/10.1007/978-3-642-37256-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37255-1
Online ISBN: 978-3-642-37256-8
eBook Packages: Computer Science, Computer Science (R0)