
Nahuatl Neural Machine Translation Using Attention Based Architectures: A Comparative Analysis for RNNs and Transformers as a Mobile Application Service

Conference paper in Advances in Soft Computing (MICAI 2021)

Abstract

Machine Translation is the task of having a computer automatically translate a sentence into a target language; it remains an open research problem, especially for low-resource languages. The recent introduction of attention mechanisms in Natural Language Processing (NLP), together with the broader availability of word-segmentation and web-scraping techniques, and the lack of a proper online translation tool for Nahuatl, motivated this work to produce such a tool. A careful search for suitable corpora via web scraping doubled the state of the art in parallel phrases; from these, several vocabulary files were produced with two word-segmentation tools in order to extract morphemes and break down the agglutination that characterizes Nahuatl. Through a comparative analysis of Recurrent Neural Networks (RNNs) and Transformers, combined with the two segmentation techniques and two different corpora, the state of the art for Nahuatl is improved by more than four times, reaching a BLEU score of 66.45, with a second validation performed using a fuzzy-similarity library. These experiments confirm the hypothesis that doubling the corpus size and pairing Transformers with sub-word segmentation is currently the best approach for Spanish-to-Nahuatl translation, consistently outperforming Statistical Machine Translation (SMT) and attention-free RNNs. The work concludes with the deployment of a mobile application that serves as a platform for the language.
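To make the abstract's pipeline concrete, the sketch below trains a unigram sub-word model on a toy Nahuatl sample and scores a translation with corpus-level BLEU. SentencePiece and sacreBLEU are assumed stand-ins for the segmentation and scoring tools (the abstract does not name them), and every sentence, parameter, and score in the snippet is illustrative rather than taken from the experiments.

```python
# A minimal sketch, not the paper's pipeline: unigram sub-word
# segmentation (SentencePiece) plus corpus-level BLEU (sacreBLEU).
# Toy sentences, vocab size, and outputs are illustrative only.
import io

import sacrebleu
import sentencepiece as spm

# Tiny stand-in for the scraped Nahuatl side of a parallel corpus.
nahuatl_sentences = [
    "nimitztlazohtla",            # "I love you"
    "niquinmacaz in tlaxcalli",   # "I will give them the tortillas"
    "titlahtoa nahuatlahtolli",   # "you speak the Nahuatl language"
]

# Train a unigram model in memory; agglutinated words are broken
# into morpheme-like pieces, which shrinks the open vocabulary.
model = io.BytesIO()
spm.SentencePieceTrainer.train(
    sentence_iterator=iter(nahuatl_sentences),
    model_writer=model,
    vocab_size=40,            # tiny, to match the toy corpus
    model_type="unigram",
    hard_vocab_limit=False,   # tolerate the minuscule corpus
)
sp = spm.SentencePieceProcessor(model_proto=model.getvalue())
print(sp.encode("nimitztlazohtla", out_type=str))
# e.g. ['▁ni', 'mitz', 'tlazohtla']; exact pieces depend on training

# Corpus-level BLEU between model outputs and gold references,
# reported on the same 0-100 scale as the paper's 66.45.
hypotheses = ["niquinmacaz in tlaxcalli"]
references = [["niquinmacaz in tlaxcalli"]]  # one reference stream
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```

In the actual comparison, the hypotheses would come from the RNN and Transformer models and the references from held-out parallel data; repeating this scoring across both architectures and both segmenters is what supports the reported improvement.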


Notes

  1. GitHub repository: https://github.com/seatgeek/fuzzywuzzy (a validation sketch using this library follows these notes).

  2. iTlajtol repository: https://github.com/i-khalil-s/iTlajtol.
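As note 1 indicates, the fuzzywuzzy library provides the second, fuzzy-similarity validation mentioned in the abstract. Below is a minimal sketch; the example strings and the 90-point acceptance threshold are illustrative assumptions, since the paper page does not spell out the exact procedure.

```python
# A minimal sketch of a fuzzy-similarity check with fuzzywuzzy
# (note 1). The near-miss example and the 90-point threshold are
# illustrative; the paper does not publish its cutoff here.
from fuzzywuzzy import fuzz

reference = "nimitztlazohtla"    # gold Nahuatl translation
hypothesis = "nimitztlazohtlaz"  # model output with a stray suffix

# Levenshtein-based similarity on a 0-100 scale; token_sort_ratio
# would additionally ignore word order for multi-word outputs.
score = fuzz.ratio(reference, hypothesis)
print(score)  # about 97 for this near-miss
if score >= 90:
    print("counted as a close translation")
```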


Author information

Correspondence to Sergio Khalil Bello García or Eduardo Sánchez Lucero.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Bello García, S.K., Sánchez Lucero, E., Bonilla Huerta, E., Hernández Hernández, J.C., Ramírez Cruz, J.F., Pedroza Méndez, B.E. (2021). Nahuatl Neural Machine Translation Using Attention Based Architectures: A Comparative Analysis for RNNs and Transformers as a Mobile Application Service. In: Batyrshin, I., Gelbukh, A., Sidorov, G. (eds) Advances in Soft Computing. MICAI 2021. Lecture Notes in Computer Science, vol. 13068. Springer, Cham. https://doi.org/10.1007/978-3-030-89820-5_10


  • DOI: https://doi.org/10.1007/978-3-030-89820-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89819-9

  • Online ISBN: 978-3-030-89820-5

  • eBook Packages: Computer Science (R0)
