Abstract
Automatic speech recognition systems are crucial in domains such as communication, healthcare, security, and education. However, existing systems tend to favor dominant languages such as English, French, Arabic, and major Asian languages, leaving under-resourced languages without the attention they deserve. In this context, our work focuses on the Amazigh language, which is widely spoken in North Africa. Our primary objective is to develop an automatic speech recognition system for isolated words, with a particular focus on the Tarifit dialect spoken in the Rif region of northern Morocco. For the dataset, we recorded 30 isolated words from 80 diverse speakers, yielding 2,400 audio files. The collected corpus is characterized by its quantity, quality, and variety, and it serves as a valuable resource for further research and development, supporting the advancement of speech recognition technology for underrepresented languages. For the recognition system, we adopted a recent approach in the field: a combination of convolutional neural networks and long short-term memory networks (CNN-LSTM). We evaluated two architectural variants, the 1D CNN-LSTM and the 2D CNN-LSTM. The experimental results demonstrate an accuracy of over 96% in recognizing spoken words using the 2D CNN-LSTM architecture.
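To make the two architectures concrete, the sketch below contrasts the input representations they typically consume. The corpus size (30 words × 80 speakers = 2,400 utterances) is taken from the abstract; the front-end parameters (sample rate, MFCC count, mel bins, frame count) are illustrative assumptions, not values reported by the paper.

```python
import numpy as np

# Corpus layout from the abstract: 30 isolated Tarifit words,
# 80 speakers, one recording per word per speaker.
N_WORDS, N_SPEAKERS = 30, 80
n_utterances = N_WORDS * N_SPEAKERS  # 2400 audio files

# Assumed front-end parameters (hypothetical, for illustration only):
SR = 16_000       # sample rate in Hz
N_MFCC = 13       # MFCC coefficients per frame
N_FRAMES = 98     # frames in a ~1 s utterance with a 10 ms hop
N_MELS = 40       # mel-spectrogram frequency bins

# 1D CNN-LSTM: the input is a sequence of per-frame feature vectors,
# shape (time frames, coefficients); Conv1D slides along the time axis
# before the LSTM models the frame sequence.
x_1d = np.zeros((N_FRAMES, N_MFCC))

# 2D CNN-LSTM: the input is a spectrogram treated as a single-channel
# image, shape (frequency bins, time frames, channels); Conv2D slides
# over both frequency and time before the LSTM stage.
x_2d = np.zeros((N_MELS, N_FRAMES, 1))

print(n_utterances)               # 2400
print(x_1d.shape, x_2d.shape)     # (98, 13) (40, 98, 1)
```

The key design difference is dimensionality of the convolution: the 1D variant learns filters over time only, while the 2D variant also learns local frequency patterns, which is one plausible reason the 2D CNN-LSTM reached the higher reported accuracy.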
Data availability
The data used in this study are not publicly available due to restrictions imposed by the LSA Laboratory. However, the data can be made available by the corresponding author upon reasonable request.
Funding
No funding was received for conducting this study. The authors declare they have no financial interests.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Daouad, M., Allah, F.A. & Dadi, E.W. An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture. Int J Speech Technol 26, 775–787 (2023). https://doi.org/10.1007/s10772-023-10054-9