
Development of a diacritic-aware large vocabulary automatic speech recognition for Hausa language

Published in: International Journal of Speech Technology

Abstract

Research on speech recognition for African languages is limited by the scarcity of digital resources for training and adaptation, despite its broad usefulness. Hausa, spoken by almost fifty million people in West and Central Africa, is an example of a language that has not been thoroughly studied. Hausa employs diacritics, marks placed on alphabetical characters to convey additional information; removing them increases the number of homographs, making similar words difficult to distinguish. This paper presents a study of speech recognition for Hausa, focusing specifically on diacritized words. The study uses the state-of-the-art wav2vec 2.0 and Whisper deep learning architectures to transcribe audio signals into the corresponding Hausa text. According to the results obtained, the Whisper-large model performed best, achieving a word error rate of 4.23%, a considerable improvement of 43.9% over the existing state-of-the-art model for Hausa speech recognition. The Whisper-large model also achieved a diacritic coverage of 92%, a precision of 98.87%, and a diacritic error rate of 2.1%.
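As an illustrative sketch (not code from the paper): the homograph problem the abstract describes can be reproduced by stripping combining marks via Unicode normalization, and the word error rate metric it reports is the word-level Levenshtein distance normalized by reference length. The tone-marked Hausa strings in the comments are hypothetical forms chosen only to show the collision.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks (tone/length diacritics) from text.
    Hooked letters such as ɓ, ɗ, ƙ are distinct base letters in the
    Hausa alphabet, not diacritics, so they are left untouched."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if unicodedata.category(c) != "Mn")

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # rolling one-row dynamic-programming table of edit distances
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,        # deletion
                      d[j - 1] + 1,    # insertion
                      prev + (r != h)) # substitution / match
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)

# Two hypothetical tone/length-marked spellings collapse to one
# undiacritized homograph once the marks are stripped:
assert strip_diacritics("gàrī") == strip_diacritics("gārì") == "gari"
```

Once diacritics are stripped, a recognizer can no longer be scored on them, which is why the paper reports diacritic coverage and diacritic error rate alongside the plain WER.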


Figures 1–5 (images available in the full article)


Data availability

The dataset used for this research is the Mozilla Common Voice dataset, version 15.0, for the Hausa language. It contains 10,106 audio files totalling 13 h of speech (4 h validated) with corresponding transcripts, publicly available at https://commonvoice.mozilla.org/en/datasets.
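Common Voice archives ship each split as a tab-separated file. A minimal loader, as a sketch assuming the standard `path` and `sentence` columns and a hypothetical local extraction directory, might look like:

```python
import csv

def load_common_voice_split(tsv_path: str) -> list[tuple[str, str]]:
    """Read (audio filename, transcript) pairs from a Common Voice
    split file such as validated.tsv (tab-separated, with a header row)."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [(row["path"], row["sentence"]) for row in reader]

# usage (hypothetical path after extracting the downloaded archive):
# pairs = load_common_voice_split("cv-corpus-15.0/ha/validated.tsv")
```

Pairing only the validated split's clips with their transcripts mirrors the 4 validated hours used for evaluation.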


Author information

Corresponding author

Correspondence to Susmitha Vekkot.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Abubakar, A.M., Gupta, D. & Vekkot, S. Development of a diacritic-aware large vocabulary automatic speech recognition for Hausa language. Int J Speech Technol 27, 687–700 (2024). https://doi.org/10.1007/s10772-024-10111-x
