DOI: 10.1145/3616855.3637828 (WSDM conference proceedings)

Preserving Heritage: Developing a Translation Tool for Indigenous Dialects

Published: 04 March 2024

ABSTRACT

Preserving and understanding indigenous languages is crucial, given their substantial contribution to the cultural and linguistic heritage of communities. Despite their value, these languages are threatened with extinction because the number of native speakers is dwindling and oral traditions predominate over written forms. This study aims to contribute to the conservation of these languages by developing a Spanish-indigenous language translator. The research employs neural machine translation and investigates three distinct approaches: a translation model based on transformers, fine-tuning a Finnish translator, and fine-tuning a multilingual translator. The results obtained with these approaches are promising and competitive with the limited existing research in this field.
