DOI: 10.1145/3639233.3639345
research-article

Deploying a Speech Recognition Model for Under-Resourced Languages: A Case Study on Dioula Wake Words 1, 2, 3, and 4

Published: 05 March 2024

Abstract

Speech recognition technology could deliver valuable information and services to the 12.5 million Dioula speakers, especially those who cannot read or write. However, the people who could benefit most often lack access to this technology because few datasets exist for resource-poor languages. This paper investigates the effectiveness of data augmentation in training a model to recognize the wake words 1, 2, 3, and 4 in Dioula. The study makes two major contributions: the release of a labeled Dioula corpus for the wake words 1, 2, 3, and 4, comprising 1.4 hours of audio, and the training of a speech recognition model for these four words with data augmentation, which improved accuracy from 51% to 96%. The confusion matrices also illustrate the model’s improved predictive capacity: after data augmentation, an average of 1762 out of 1817 instances (about 97%) of the number "1" were correctly recognized. Loss fell from 205% to 14% after data augmentation was applied. These results highlight the pivotal role of data augmentation in improving the model’s performance and mitigating overfitting, and they show the promise of this technique for addressing data scarcity in underrepresented speech contexts. Training a speech recognition model to detect specific wake words, such as "1," "2," "3," and "4" in Dioula, can be highly valuable in constructing interactive voice response systems, thereby fostering greater inclusivity and accessibility for underserved communities.
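The abstract does not say which augmentation transforms were used, so the following is only a minimal sketch of what waveform-level data augmentation for a small wake-word corpus can look like. It assumes three common transforms (random time shift, additive Gaussian noise, and gain scaling); the function names, parameter ranges, and the synthetic test tone are all hypothetical and are not taken from the paper.

```python
import math
import random


def time_shift(samples, shift):
    """Shift the waveform by `shift` samples, padding with silence (0.0)."""
    n = len(samples)
    if shift >= 0:
        return ([0.0] * shift + samples)[:n]
    return samples[-shift:] + [0.0] * min(-shift, n)


def add_noise(samples, noise_std, rng):
    """Add zero-mean Gaussian noise with standard deviation `noise_std`."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def scale_gain(samples, gain):
    """Multiply by `gain` and clip to the [-1.0, 1.0] amplitude range."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def augment(samples, rng, copies=4, max_shift=160, noise_std=0.005):
    """Return `copies` randomly perturbed variants of one recording.

    The default ranges are illustrative assumptions, not values from the paper.
    """
    variants = []
    for _ in range(copies):
        out = time_shift(samples, rng.randint(-max_shift, max_shift))
        out = add_noise(out, noise_std, rng)
        out = scale_gain(out, rng.uniform(0.8, 1.2))
        variants.append(out)
    return variants


# Hypothetical stand-in for one recorded wake word: 0.1 s of a 440 Hz tone
# sampled at 16 kHz, with amplitude 0.5.
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1600)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented = augment(tone, rng, copies=4)
```

Each original clip yields several same-length variants that keep the original label, which is how augmentation enlarges a small training set without new recordings.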



Published In

NLPIR '23: Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval
December 2023
336 pages
ISBN:9798400709227
DOI:10.1145/3639233
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Dioula language
  2. under-resourced languages
  3. user interface
  4. voice recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

NLPIR 2023
