research-article

Low Resource Neural Machine Translation: Assamese to/from Other Indo-Aryan (Indic) Languages

Authors:
Rupjyoti Baruah
Indian Institute of Technology, BHU, Varanasi, Uttar Pradesh, India

Rajesh Kumar Mundotiya
Indian Institute of Technology, BHU, Varanasi, Uttar Pradesh, India

Anil Kumar Singh
Indian Institute of Technology, BHU, Varanasi, Uttar Pradesh, India

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1, Article No. 19, pp. 1–32. https://doi.org/10.1145/3469721

Published: 16 November 2021

Abstract

Machine translation (MT) systems have been built with many different techniques to bridge language barriers. These techniques fall broadly into approaches such as Statistical Machine Translation (SMT) and Neural Machine Translation (NMT). End-to-end NMT systems significantly outperform SMT in translation quality on many language pairs, especially those with adequate parallel corpora. We report comparative experiments on baseline MT systems between Assamese and other Indo-Aryan languages (in both translation directions) using traditional Phrase-Based SMT as well as more successful NMT architectures, namely a basic sequence-to-sequence (seq2seq) model with attention, Transformer, and finetuned Transformer. Since this is the first such work involving Assamese, the results are evaluated using the most prominent standard automatic metric, BLEU (BiLingual Evaluation Understudy), as well as other well-known metrics, to explore the performance of the different baseline MT systems. The evaluation scores of the SMT and NMT models are compared across bi-directional language pairs involving Assamese and other Indo-Aryan languages (Bangla, Gujarati, Hindi, Marathi, Odia, Sinhalese, and Urdu). The highest BLEU scores obtained are for Assamese to Sinhalese with SMT (35.63) and for Assamese to Bangla with the NMT systems (seq2seq: 50.92, Transformer: 50.01, finetuned Transformer: 50.19). We also try to relate the results to language characteristics, distances, family trees, domains, data sizes, and sentence lengths. We find that the domain is the most important factor affecting the results for the given data domains and sizes. We compare our results with the only existing MT system for Assamese (Bing Translator) and also with pairs involving Hindi.
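Because the abstract reports its headline results as BLEU scores, a small worked sketch of how such a score is computed may help in reading them. The code below is an illustrative, unsmoothed corpus-level BLEU with a single reference per sentence. It is not the authors' evaluation script, and the function names `ngrams` and `corpus_bleu` and the toy English tokens are assumptions made for this example only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Both arguments are lists of token lists. No smoothing is applied, so
    the score is 0 whenever some n-gram order has no match at all.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Toy usage with made-up English tokens (not data from the article).
hyp = ["the cat sat on the mat".split()]
ref = ["the cat sat on a mat".split()]
print(round(corpus_bleu(hyp, ref), 2))
```

Scores from this sketch will generally differ from published figures, since real evaluations depend on tokenization, smoothing, and the number of references used.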

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
      2. Machine translation

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 1
January 2022
442 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3494068
Editor:
Imed Zitouni
Google, USA
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 November 2021
- Accepted: 1 June 2021
- Revised: 1 May 2021
- Received: 1 July 2020
Author Tags
Machine translation
SMT
NMT
low resource
Assamese
Indo-Aryan
sequence-to-sequence
Transformer
finetuned Transformer
Qualifiers
- research-article
- Refereed