
Implementation of Neural Machine Translation for Nahuatl as a Web Platform: A Focus on Text Translation


Abstract

There are few online platforms related to Natural Language Processing for Nahuatl, a low-resource language, and no machine translation services at all. Nahuatl has, however, been the subject of academic work on machine translation, from Statistical Machine Translation (SMT) to Neural Machine Translation (NMT), specifically Recurrent Neural Networks (RNNs). This research aims to create a platform that addresses this gap with text translation, voice, and Text-To-Speech features. In particular, the present paper reports several advances in text translation through a comparative analysis of two attention architectures, transformers and RNNs, using several models that combine these architectures, two parallel corpora, and two tokenization techniques. Additionally, the development of the platform and an iOS client application is described. A new, larger corpus of over 35,000 sentence pairs was built to improve on the state of the art; a deliberate cleaning of this corpus reduced the religious bias present in the source texts. Model performance is evaluated with BLEU (expressed as a percentage) to allow a direct comparison with previous work on Nahuatl machine translation. The results outperform that work, with a best score of 66.45 using transformers, compared to 34.78 for RNNs and 14.28 for SMT, confirming that transformers combined with sub-word tokenization are so far the best combination for Nahuatl machine translation. Moreover, emergent behaviors were observed in the transformers: a subtle pleonasm heard only in rural areas where Mexican Spanish is spoken arose from the model, linking its origin to Nahuatl, and the model also learned to convert numbers from base 10 to base 20. Finally, some out-of-corpus translations were presented to a Nahuatl speaker, for whom the model demonstrated good performance and retention of information given its size. This research is intended as a framework for how a polysynthetic language can be processed for translation with languages such as Spanish, English, or Russian. This research work was carried out at the “Tecnológico Nacional de México” (TecNM), campus “Instituto Tecnológico de Apizaco” (ITA).
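
To make the abstract's methodology concrete, the sketch below illustrates the two techniques it names, sub-word tokenization [14, 35] and corpus-level BLEU scoring [12], together with the base-10 to base-20 decomposition the transformer reportedly learned. This is a minimal illustration under stated assumptions, not the authors' code: the corpus file name, vocabulary size, and sample sentences are hypothetical, and sacrebleu is one common BLEU implementation, not necessarily the one used in the paper.

```python
import sentencepiece as spm  # sub-word tokenizer described in refs [14, 35]
import sacrebleu             # one common implementation of BLEU [12]

# Train a unigram sub-word model on the Nahuatl side of a parallel corpus.
# "nahuatl.txt" and vocab_size=8000 are illustrative, not the paper's settings.
spm.SentencePieceTrainer.train(
    input="nahuatl.txt", model_prefix="nah",
    vocab_size=8000, model_type="unigram",
)
sp = spm.SentencePieceProcessor(model_file="nah.model")

# Sub-word units let a model cope with Nahuatl's polysynthetic morphology.
print(sp.encode("nimitztlazohtla", out_type=str))  # e.g. ['▁ni', 'mitz', 'tlazohtla']

# Corpus-level BLEU, the metric behind the 66.45 / 34.78 / 14.28 comparison.
hypotheses = ["in cihuatl quitta in calli"]    # hypothetical system output
references = [["in cihuatl quitta in calli"]]  # one reference stream
print(sacrebleu.corpus_bleu(hypotheses, references).score)  # 100.0 for an exact match

# The vigesimal (base-20) decomposition the model is reported to have learned;
# Nahuatl numerals count in twenties.
def to_base20(n: int) -> list[int]:
    digits = []
    while n:
        n, r = divmod(n, 20)
        digits.append(r)
    return digits[::-1] or [0]

print(to_base20(1995))  # [4, 19, 15], i.e. 4*400 + 19*20 + 15
```

Sub-word vocabularies of this kind are what let a single model reuse morpheme-level units across the long agglutinated words that make whole-word tokenization impractical for a polysynthetic language.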


REFERENCES

  1. Bello García, S.K., Sánchez Lucero, E., Pedroza Méndez, B.E., Hernández Hernández, J.C., Bonilla Huerta, E., and Ramírez Cruz, J.F., Towards the implementation of an attention-based neural machine translation with artificial pronunciation for Nahuatl as a mobile application, Proc. 8th Int. Conf. in Software Engineering Research and Innovation (CONISOFT), Nov. 2020, pp. 235–244. https://doi.org/10.1109/CONISOFT50191.2020.00041

  2. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al., Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144, 2016.

  3. Vasquez, S. and Lewis, M., MelNet: a generative model for audio in the frequency domain, Jun. 2019. arXiv:1906.01083.

  4. Bahdanau, D., Cho, K., and Bengio, Y., Neural machine translation by jointly learning to align and translate, Sept. 2014. arXiv:1409.0473.

  5. Du, H., Tian, X., Xie, L., and Li, H., Factorized WaveNet for voice conversion with limited data, Speech Commun., 2021, vol. 130, pp. 45–54.

  6. Mager, M., Mager, E., Medina-Urrea, A., Meza, I., and Kann, K., Lost in translation: analysis of information loss during machine translation between polysynthetic and fusional languages, July 2018. arXiv:1807.00286.

  7. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I., Attention is all you need, Proc. 31st Conf. on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, 2017.

  8. Nolazco-Flores, J.A., Salgado-Garza, L.R., and Peña-Díaz, M., Speaker dependent ASRs for Huastec and Western-Huastec Náhuatl languages, in Pattern Recognition and Image Analysis. IbPRIA 2005, Springer, 2005, vol. 3523.

  9. Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, 2016, pp. 171–176. http://www.deeplearningbook.org.

  10. Mager, M. and Meza, I., Hacia la traducción automática de las lenguas indígenas de México, Proc. Digital Humanities Conf., Mexico, Nov. 2018.

  11. LeCun, Y., Bengio, Y., and Hinton, G., Deep learning, Nature, 2015, vol. 521, no. 7553, pp. 436–444. https://doi.org/10.1038/nature14539

  12. Papineni, K., Roukos, S., Ward, T., and Zhu, W.J., Bleu: a method for automatic evaluation of machine translation, Proc. 40th Annu. Meeting of the Association for Computational Linguistics, Philadelphia, July 2002, pp. 311–318. https://doi.org/10.3115/1073083.1073135

  13. Choi, H., Cho, K., and Bengio, Y., Fine-grained attention mechanism for neural machine translation, Neurocomputing, 2018, vol. 284, pp. 171–176. http://arxiv.org/abs/1803.11407.

  14. Kudo, T. and Richardson, J., SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing, Proc. Conf. on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, 2018. http://arxiv.org/abs/1808.06226.

  15. Iturrioz Leza, J.L. and Gómez López, P., Gramática Wixarika 1, Lincom Europa, 2006.

  16. Gastaldo, P., Delić, V., Perić, Z., Sečujski, M., Jakovljević, N., Nikolić, J., Mišković, D., Simić, N., Suzić, S., and Delić, T., Speech technology progress based on new machine learning paradigm, Comput. Intellig. Neurosci., 2019, vol. 2019, p. 4368036. https://doi.org/10.1155/2019/4368036

  17. Good, C., How much faster would it be to render Toy Story in 2011 compared to how long it took in 1995?, 2017. https://www.quora.com/How-much-faster-would-it-be-to-render-Toy-Story-in-2011-compared-to-how-long-it-took-in-1995.

  18. Jehovah’s Witnesses, Various articles and magazines, 2021. https://www.jw.org/nch.

  19. Jehovah’s Witnesses, How is our literature written and translated?, 2021. https://www.jw.org/en/library/books/jehovahswill/literature-written-and-translated/.

  20. Carolina, E., Cerbón, V., and Gutierrez-Vasques, X., Recopilación de un corpus paralelo electrónico para una lengua minoritaria: el caso del español-náhuatl, Proc. Primer Congreso Int. el Patrimonio Cultural y las Nuevas Tecnologías, INAH 2015, Mexico, Jan. 2015.

  21. Apple Inc., Converting trained models to Core ML, 2021. https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml.

  22. Howard, J., Biography. https://www.usfca.edu/faculty/jeremy-howard.

  23. Thomas, R., Biography. https://www.usfca.edu/faculty/rachel-thomas.

  24. Howard, J. and Gugger, S., Fastai: a layered API for deep learning, Information, 2020, vol. 11, no. 2, p. 108. arXiv:2002.04688. https://www.mdpi.com/2078-2489/11/2/108.

  25. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., and Amodei, D., Language models are few-shot learners, 2020. arXiv:2005.14165.

  26. Zhang, S., Frey, B., and Bansal, M., ChrEn: Cherokee-English machine translation for endangered language revitalization, Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020, pp. 577–595. arXiv:2010.04791 [cs.CL].

  27. SIL International (Summer Institute of Linguistics), ISO 639 Code Tables, 2021. https://iso639-3.sil.org/code_tables/639/data/n?title=&field_iso639cd_st_mmbrshp_639_1_tid=All&name_3=nahuatl&field_iso639_element_scope_tid=All&field_iso639_language_type_tid=All&items_per_page=200.

  28. Google, Language support, 2021. https://cloud.google.com/translate/docs/languages.

  29. Devlin, J., Chang, M., Lee, K., and Toutanova, K., BERT: pre-training of deep bidirectional transformers for language understanding, Proc. NAACL-HLT 2019, Minneapolis, June 2–7, 2019, pp. 4171–4186. http://arxiv.org/abs/1810.04805.

  30. Sutskever, I., Vinyals, O., and Le, Q.V., Sequence to sequence learning with neural networks, Proc. 28th Conf. on Neural Information Processing Systems NIPS 2014, Montreal, 2014. http://arxiv.org/abs/1409.3215.

  31. Cho, K., van Merrienboer, B., Gülçehre, Ç., Bougares, F., Schwenk, H., and Bengio, Y., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), Doha, 2014. http://arxiv.org/abs/1406.1078.

  32. Luong, M., Pham, H., and Manning, C.D., Effective approaches to attention-based neural machine translation, Proc. Conf. on Empirical Methods in Natural Language Processing, Lisbon, 2015. http://arxiv.org/abs/1508.04025.

  33. Instituto de Transparencia, Acceso a la Información Pública y Protección de Datos Personales del Estado de México y Municipios, Guías de información pública, Nov. 2012. https://www.infoem.org.mx/es/contenido/conocenos/publicaciones.

  34. Apple Inc., SwiftUI, Feb. 2021. https://developer.apple.com/xcode/swiftui/.

  35. Kudo, T., Subword regularization: improving neural network translation models with multiple subword candidates, Proc. 56th Annu. Meeting of the Association for Computational Linguistics, Melbourne, 2018. arXiv:1804.10959.

  36. Microsoft Bing Translator, Feb. 2021. https://www.bing.com/translator/.

  37. Dolores, J.C.R., Traducción automática náhuatl-español: variables que influyen en la calidad de la traducción, Master’s Thesis, Univ. Nacional Autónoma de México, 2019. http://132.248.9.195/ptd2019/septiembre/0795765/Index.html.

  38. Halili, E.H., Apache JMeter: a Practical Beginner’s Guide to Automated Testing and Performance Measurement for Your Websites, Packt Publ. Ltd., 2008.

  39. Zhang, S., Frey, B., and Bansal, M., ChrEnTranslate: Cherokee-English machine translation demo with quality estimation and corrective feedback, 2021. arXiv:2107.14800.

  40. Ghukasyan, T., Yeshilbashyan, Y., and Avetisyan, K., Subwords-only alternatives to fastText for morphologically rich languages, Program. Comput. Software, 2021, vol. 47, pp. 56–66.

ACKNOWLEDGMENTS

To my family, who helped me through my master’s degree, and to my thesis advisor, Eduardo Sánchez Lucero, who inspired me to develop a tool that can help maintain and spread Nahuatl. This work is supported by CONACYT.

Author information

Corresponding authors

Correspondence to S. Khalil Bello García, E. Sánchez Lucero, E. Bonilla Huerta, J. Crispín Hernández Hernández, J. Federico Ramírez Cruz or B. Estela Pedroza Méndez.

Additional information

This paper is an extension of work originally presented in the Proceedings of the 8th International Conference in Software Engineering Research and Innovation (CONISOFT 2020), Chetumal, Mexico [1].

Cite this article

García, S.K., Lucero, E.S., Huerta, E.B. et al. Implementation of Neural Machine Translation for Nahuatl as a Web Platform: A Focus on Text Translation. Program Comput Soft 47, 778–792 (2021). https://doi.org/10.1134/S0361768821080168
