Multilingual Training Set Selection for ASR in Under-Resourced Malian Languages

van der Westhuizen, Ewald; Padhi, Trideba; Niesler, Thomas

doi:10.1007/978-3-030-87802-3_67

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

We present the first speech recognition systems for two severely under-resourced Malian languages, Bambara and Maasina Fulfulde. These systems will be used by the United Nations as part of a monitoring system to inform and support humanitarian programmes in rural Africa. We have compiled datasets in Bambara and Maasina Fulfulde, but since these are very small, we take advantage of six similarly under-resourced datasets in other languages for multilingual training. We focus specifically on the best composition of the pool of speech data used for multilingual training. We find that, although maximising the training pool by including all six additional languages improves speech recognition in both target languages, substantially better performance can be achieved by a more judicious choice. Our experiments show that the addition of just one language provides the best performance. For Bambara, this additional language is Maasina Fulfulde, and its introduction leads to a relative word error rate reduction of 6.7%, as opposed to the 2.4% relative reduction achieved when pooling all six additional languages. For Maasina Fulfulde, the best performance was achieved when adding only Luganda, leading to a relative word error rate reduction of 9.4%, as opposed to the 3.9% relative reduction achieved when pooling all six languages. We conclude that careful selection of the out-of-language data is worthwhile for multilingual training even in highly under-resourced settings, and that the general assumption that more data is better does not always hold.
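
The relative word error rate reductions quoted in the abstract follow the usual definition: the drop in WER divided by the baseline WER. The sketch below illustrates only that arithmetic. The function name and the WER values are hypothetical, chosen so that the result matches the 6.7% figure; the abstract does not report absolute WERs, so these numbers are not the paper's results.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), NOT taken from the paper:
# a monolingual baseline and a system trained with one added language.
baseline = 50.0
with_one_added_language = 46.65

reduction = relative_wer_reduction(baseline, with_one_added_language)
print(f"relative WER reduction: {reduction:.1f}%")
```

Because the measure is relative, the same percentage can correspond to very different absolute improvements depending on the baseline, so the absolute WERs are needed to compare systems.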

Notes

1. https://www.unglobalpulse.org/project/making-ugandan-community-radio-machine-readable-using-speech-recognition-technology/ and https://www.unglobalpulse.org/document/using-machine-learning-to-analyse-radio-content-in-uganda/
2. OpenSLR Tunisian Modern Standard Arabic corpus, accessed 2021-02-21 at http://www.openslr.org/46/

Acknowledgments

We thank United Nations Global Pulse for their collaboration and for supporting this research. We also gratefully acknowledge the support of NVIDIA Corporation through the donation of GPU equipment used during the course of this research, and the support of the Council for Scientific and Industrial Research (CSIR), Department of Science and Technology, South Africa, for providing access to the Lengau CHPC cluster on which we conducted our experiments. We also gratefully acknowledge the support of Telkom South Africa.

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, Stellenbosch University, Stellenbosch, South Africa
Ewald van der Westhuizen, Trideba Padhi & Thomas Niesler

Corresponding author

Correspondence to Ewald van der Westhuizen.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

van der Westhuizen, E., Padhi, T., Niesler, T. (2021). Multilingual Training Set Selection for ASR in Under-Resourced Malian Languages. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_67

DOI: https://doi.org/10.1007/978-3-030-87802-3_67
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
