New Method for Automatic Recognition of Mexican Indigenous Languages: Comparative Performance of Classifiers

Huerta-Hernández, Luis David; Ramírez-Pacheco, Julio Cesar; Toral-Cruz, Homero; Aloufi, Khalid S.; de la Rosa Aguilar, Oscar Alonso; León-Borges, José Antonio

doi:10.1007/s42979-023-01985-w

Original Research
Published: 28 August 2023

Volume 4, article number 649, (2023)

SN Computer Science

Abstract

This work proposes a method for the automatic recognition of the Mexican indigenous languages (MILs): Mayan, Mixtec, Zapotec, Mixe, Nahuatl, Tarahumara, Mazahua, Tseltal, Chichimeco and Huichol. The long-term average spectrum (LTAS) is used as the feature for language recognition. The performance of several classifiers is also compared: multi-layer perceptron, sequential minimal optimization, naive Bayes, Simple Logistic and Logistic Model Tree. To reduce the dimensionality of the speech feature vector, the LTAS sequences extracted from the audio recordings are first passed through BestFirst filters. In our experiments, high MIL recognition performance was achieved using a simplified speech coding scheme whose feature vectors contain a small number of values. Our method is notable for its simplicity and efficiency, since it eliminates untested languages from the speech process. Different classifiers and tunings of their parameters were tried, which increased the accuracy of MIL recognition.
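As an illustration of the LTAS feature named in the abstract, the following is a minimal sketch that averages the power spectrum of overlapping frames and reports it in decibels. It is not the authors' implementation. The function name `ltas_db`, the frame length, the hop size, the Hann window and the synthetic test tone are all assumptions chosen for the example.

```python
import cmath
import math


def ltas_db(signal, frame_len=64, hop=32):
    """Mean power per frequency bin over all frames, in dB (hypothetical helper)."""
    n_bins = frame_len // 2 + 1
    # Hann window (an assumption) reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    power_sum = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            power_sum[k] += abs(coeff) ** 2
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    # Average over frames, then convert to decibels; the floor avoids log(0).
    return [10 * math.log10(power_sum[k] / n_frames + 1e-12)
            for k in range(n_bins)]


# Illustrative input only: a 1 kHz tone sampled at 8 kHz (not data from the paper).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
ltas = ltas_db(tone)
peak_bin = max(range(len(ltas)), key=lambda k: ltas[k])
peak_hz = peak_bin * sample_rate / 64
```

With a 64-sample frame at 8 kHz the bins are 125 Hz apart, so the tone's energy peaks in the bin centred on 1000 Hz. The resulting list of per-bin levels is the kind of fixed-length vector that a feature-selection step and a classifier could then consume.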

Data availability

All relevant data from the paper is available and can be requested from the corresponding author.

Acknowledgements

This paper is an expanded version of the research presented at the 3rd Geographic Information Systems Latin-American (GIS-LATAM) International Conference Series in October 2021.

Author information

Authors and Affiliations

Universidad del Istmo, 70110, Ixtepec, Oaxaca, Mexico
Luis David Huerta-Hernández & Oscar Alonso de la Rosa Aguilar
College of Computer Science and Engineering, Taibah University, Medina, 41477, Saudi Arabia
Khalid S. Aloufi
Universidad Autónoma del Estado de Quintana Roo, 77519, Chetumal, Quintana Roo, Mexico
Julio Cesar Ramírez-Pacheco, Homero Toral-Cruz & José Antonio León-Borges

Corresponding author

Correspondence to José Antonio León-Borges.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest. The study was not supported by any funding.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Telematics, IA and Security” guest edited by Felix Mata, Roberto Zagal Flores and Jose Antonio Leon-Borges.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huerta-Hernández, L.D., Ramírez-Pacheco, J.C., Toral-Cruz, H. et al. New Method for Automatic Recognition of Mexican Indigenous Languages: Comparative Performance of Classifiers. SN COMPUT. SCI. 4, 649 (2023). https://doi.org/10.1007/s42979-023-01985-w

Received: 18 August 2022
Accepted: 01 June 2023
Published: 28 August 2023
DOI: https://doi.org/10.1007/s42979-023-01985-w
