ABSTRACT
Preserving and understanding indigenous languages is crucial, given their substantial contribution to the cultural and linguistic heritage of their communities. Despite this value, these languages are threatened with extinction because the number of native speakers is dwindling and oral traditions predominate over written forms. This study aims to contribute to their conservation by developing a Spanish-indigenous language translator. The research uses neural machine translation and investigates three approaches: a translation model based on the transformer architecture, fine-tuning a Finnish translator, and fine-tuning a multilingual translator. The results are promising and competitive with the limited existing research in this field.
Index Terms
- Preserving Heritage: Developing a Translation Tool for Indigenous Dialects