Abstract
The design and implementation of automatic evaluation methods is integral to scientific research, as it accelerates the development cycle. This is no less true for machine translation (MT) systems. However, no global and systematic scheme exists for evaluating the performance of an MT system. Existing evaluation metrics such as BLEU, METEOR, and TER, although used extensively in the literature, have faced considerable criticism from users. Moreover, the performance of these metrics often varies with the pair of languages under consideration. This observation is equally pertinent for translations involving languages of the Indian subcontinent. This study aims at developing an evaluation metric for English-to-Hindi MT outputs. As part of this process, a set of probable errors has been identified both manually and automatically. Linear regression has been used to compute a weight/penalty for each error, taking human evaluations into consideration, and a sentence score is computed as the weighted sum of the errors. A set of 126 models has been built using different single classifiers and ensembles of classifiers in order to find the most suitable model for allocating an appropriate weight/penalty to each error. The outputs of the models have been compared with state-of-the-art evaluation metrics. The models developed for manually identified errors correlate well with manual evaluation scores, whereas the models for automatically identified errors have low correlation with the manual scores. This indicates the need for further improvement and for more sophisticated linguistic tools for automatic identification and extraction of errors. Although automatic machine translation tools are being developed for many different language pairs, there is no generalized scheme for designing meaningful metrics for their evaluation.
The proposed scheme should help in developing such metrics for different language pairs in the coming days.
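The core idea in the abstract — fit per-error weights/penalties by linear regression against human evaluations, then score a sentence as the weighted sum of its error counts — can be sketched as follows. This is a minimal illustration with made-up error categories and numbers; the paper's actual error inventory, data, and 126 candidate models differ.

```python
import numpy as np

# Hypothetical error categories (the paper's actual inventory differs):
# per-sentence counts of, e.g., word-order, inflection, and lexical errors.
error_counts = np.array([
    [2, 1, 0],
    [0, 3, 1],
    [1, 0, 2],
    [4, 2, 1],
    [0, 1, 0],
], dtype=float)

# Human evaluation scores for the same sentences (illustrative values).
human_scores = np.array([3.5, 2.8, 3.1, 1.6, 4.4])

# Fit an intercept b and weights w so that b + counts @ w ~ human score.
# The learned weights act as per-error penalties.
X = np.hstack([np.ones((error_counts.shape[0], 1)), error_counts])
coeffs, *_ = np.linalg.lstsq(X, human_scores, rcond=None)
intercept, weights = coeffs[0], coeffs[1:]

def sentence_score(counts):
    """Metric score for one sentence: weighted sum of its error counts."""
    return intercept + counts @ weights

print(sentence_score(np.array([1.0, 1.0, 1.0])))
```

The same fitted weights can then be applied to score unseen MT outputs; in the paper, different regression learners and ensembles play the role of this single least-squares fit.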
Notes
The translator as accessed in September 2017. All translations mentioned in the paper were produced in September 2017.
If the subject is a pronoun, the gender will be that of the noun the pronoun is referring to.
http://sivareddy.in/downloads (The tagger has been developed by IIIT Hyderabad).
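The paper's models (and the baseline metrics they are compared against) are judged by how well their sentence scores correlate with human evaluations. A minimal Pearson-correlation check, with purely illustrative numbers rather than the paper's data, can be sketched as:

```python
import numpy as np

# Illustrative metric scores and human judgments (not the paper's data).
metric_scores = np.array([0.62, 0.41, 0.55, 0.23, 0.78])
human_scores = np.array([3.1, 2.4, 3.0, 1.5, 4.2])

# Pearson correlation: the criterion used to compare the proposed models
# with state-of-the-art metrics such as BLEU, METEOR, and TER.
r = np.corrcoef(metric_scores, human_scores)[0, 1]
print(round(r, 3))
```

A value of r close to 1 indicates that the metric ranks outputs much as human judges do; the abstract reports high correlation for the manually identified errors and low correlation for the automatically identified ones.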
References
Balyan, R., & Chatterjee, N. (2015). Translating noun compounds using semantic relations. Computer Speech & Language, 32(1), 91–108.
Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization at the 43rd ACL, Ann Arbor, Michigan.
Comrie, B. (1989). Language universals and linguistic typology. Chicago: The University of Chicago Press.
Bharati, A. & Kulkarni, A. (2005). English from Hindi viewpoint: A Paaninian perspective. In Platinum Jubilee conference of Linguistic Society of India, held at CALTS, University of Hyderabad, Hyderabad.
Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (1996b). Stacked regressions. Machine Learning, 24(1), 49–64.
Chatterjee, N., & Balyan, R. (2011). Context resolution of verb particle constructions for English to Hindi translation. In Proceedings of the 25th Pacific Asia conference on language, information and computation (PACLIC 25), Singapore (pp. 140–149).
Chatterjee, N., Johnson, A., & Krishna, M. (2007). Some improvements over the BLEU metric for measuring translation quality for Hindi. In Proceedings of ICCTA 2007, IEEE Computer Society (pp. 485–490).
Dave, S., Parikh, J., & Bhattacharya, P. (2001). Interlingua-based English–Hindi machine translation and language divergence. Machine Translation, 16, 251.
Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of HLT 2002, human language technology conference, San Diego, California (pp. 138–145).
Dorr, B. (1993). Machine translation: A view from the Lexicon. Cambridge, MA: The MIT Press.
Dorr, B. (1994). Classification of machine translation divergences and a proposed solution. Computational Linguistics, 20(4), 597–633.
Farrús, M., Costa-jussà, M. R., Mariño, J. B., & Fonollosa, J. A. R. (2010). Linguistic-based evaluation criteria to identify statistical machine translation errors. In Proceedings of EAMT, Saint Raphael, France (pp. 52–57).
Freund, Y., & Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of the thirteenth international conference on machine learning, Bari, Italy (pp. 148–156).
Guenther, W. C. (1964). Analysis of variance. Upper Saddle River: Prentice-Hall.
Gupta, D., & Chatterjee, N. (2001). Study of divergence for example-based English–Hindi machine translation. In STRANS 2001, IIT Kanpur (pp. 132–139).
Gupta, D., & Chatterjee, N. (2003). Identification of divergence for English-to-Hindi EBMT. In MT Summit IX, New Orleans, LA, 2003 (pp. 141–148).
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10(8), 707–710.
Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia (pp. 311–318).
Popović, M. (2011). Hjerson: An open source tool for automatic error classification of machine translation output. The Prague Bulletin of Mathematical Linguistics, 96, 59–67.
Popović, M., & Ney, H. (2007). Word error rates: Decomposition over POS classes and applications for error analysis. In Proceedings of the 2nd ACL 07 workshop on statistical machine translation (WMT 07), Prague, Czech Republic (pp. 48–55).
Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
Sinha, R. M. K., & Thakur, A. (2005a). Translation divergence in English–Hindi MT. In EAMT 10th annual conference, Budapest, Hungary, May 2005 (pp. 245–254).
Sinha, R. M. K., & Thakur, A. (2005b). Divergence patterns in machine translation between Hindi and English. In MT Summit X, Phuket, Thailand, September 2005 (pp. 346–353).
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of association for machine translation in the Americas—AMTA 2006. Cambridge, MA (pp. 223–231).
Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13(2), 689–705.
Vilar, D., Xu, J., D'Haro, L. F., & Ney, H. (2006). Error analysis of statistical machine translation output. In Proceedings of the 5th international conference on language resources and evaluation (LREC 06), Genoa (pp. 697–702).
Wolpert, D. (1992). Stacked generalization. Neural Networks, 5(2), 241–260.
Additional information
Renu Balyan: Work done while at IIT Delhi.
Cite this article
Balyan, R., Chatterjee, N. Factor-based evaluation for English to Hindi MT outputs. Lang Resources & Evaluation 52, 969–996 (2018). https://doi.org/10.1007/s10579-018-9426-y