Abstract
Machine translation, the task of automatically translating a sentence from one language into another, remains an open research problem, especially for low-resource languages. The recent introduction of attention mechanisms in Natural Language Processing (NLP), together with the wider availability of word-segmentation and web scraping techniques and the lack of a proper online translation tool for Nahuatl, motivated this work, which aims to produce such a tool. A suitable corpus was first gathered through careful web scraping, doubling the number of parallel phrases available in the state of the art. Several vocabulary files were then produced with two word-segmentation tools in order to extract morphemes and break down the agglutination characteristic of Nahuatl. A comparative analysis of Recurrent Neural Networks (RNNs) and Transformers, combining two segmentation techniques and two different corpora, improved on the state of the art for Nahuatl by more than four times in BLEU score (66.45), with a second validation performed using a fuzzy similarity library. These experiments confirmed the hypothesis that doubling the corpus size and using Transformers with sub-word segmentation is the best approach currently achievable for Spanish-to-Nahuatl translation with the available tools, outperforming many times over both Statistical Machine Translation (SMT) and RNNs without attention. The work also includes the deployment of an application that serves as a platform for the language.
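To make the two evaluation signals mentioned in the abstract more concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes whitespace tokenization, a single reference per sentence, and no smoothing for BLEU, and it uses the standard-library difflib as a stand-in for a fuzzy similarity library such as fuzzywuzzy. The function names `sentence_bleu` and `fuzzy_ratio` and the placeholder token strings are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified single-reference BLEU on a 0-100 scale (assumption: no smoothing)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matched / total))
    # The brevity penalty lowers the score of hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


def fuzzy_ratio(a, b):
    """Character-level similarity on a 0-100 scale (difflib stand-in, an assumption)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


if __name__ == "__main__":
    # Placeholder tokens, not real Spanish or Nahuatl sentences.
    reference = "a b c d e f"
    hypothesis = "a b c d e x"
    print(sentence_bleu(reference, hypothesis))
    print(fuzzy_ratio(reference, hypothesis))
```

BLEU rewards exact n-gram overlap, whereas a character-level ratio still gives partial credit when a word differs only in an affix, which may be one reason a second, fuzzy check is useful for an agglutinative language. Reported scores such as the 66.45 above are normally computed at corpus level with an established toolkit rather than with a sketch like this one.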
Notes
1. GitHub repository https://github.com/seatgeek/fuzzywuzzy.
2. iTlajtol repo https://github.com/i-khalil-s/iTlajtol.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bello García, S.K., Sánchez Lucero, E., Bonilla Huerta, E., Hernández Hernández, J.C., Ramírez Cruz, J.F., Pedroza Méndez, B.E. (2021). Nahuatl Neural Machine Translation Using Attention Based Architectures: A Comparative Analysis for RNNs and Transformers as a Mobile Application Service. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_10
DOI: https://doi.org/10.1007/978-3-030-89820-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89819-9
Online ISBN: 978-3-030-89820-5
eBook Packages: Computer Science (R0)