Abstract
Machine translation helps overcome language barriers and eases interaction among people from different linguistic backgrounds. Although corpus-based approaches (statistical and neural) offer reasonable translation accuracy when large corpora are available, their robustness lies in their ability to adapt to low-resource languages, for which large corpora are unavailable. In this paper, the predictive performance of these two approaches is explored in the context of Mizo, a low-resource Indian language. Translations predicted by the two approaches are compared and analyzed on a number of grounds to infer their strengths and weaknesses, particularly in low-resource scenarios.

Acknowledgements
The authors would like to thank the Department of Computer Science and Engineering, National Institute of Technology Mizoram, for providing the support and infrastructure required to carry out this work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Pathak, A., Pakray, P. & Bentham, J. English–Mizo Machine Translation using neural and statistical approaches. Neural Comput & Applic 31, 7615–7631 (2019). https://doi.org/10.1007/s00521-018-3601-3
DOI: https://doi.org/10.1007/s00521-018-3601-3