Abstract
In this study, we introduce an experimental framework for Moroccan dialect speech recognition under various additive noise conditions using the open-source tool PocketSphinx. We curated a corpus of the ten most commonly used greetings in the Moroccan dialect, extracted from telephone conversations. The corpus was recorded with 60 speakers (30 male and 30 female), each of whom uttered each expression three times in natural and noisy conditions. Features were extracted as Mel-frequency cepstral coefficients (MFCC), and monophone-based acoustic models were built with hidden Markov models (HMM). While automatic speech recognition systems perform well in noise-free conditions, their accuracy degrades noticeably in the presence of noise.
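As an illustration of what an additive noise condition means in practice, the following minimal sketch mixes a noise signal into a clean signal at a chosen signal-to-noise ratio. It is not the authors' procedure: the function names `mix_at_snr` and `measured_snr_db`, the SNR values, the 8 kHz sample rate, and the synthetic tone and white noise are all assumptions made for the example, since the abstract does not state noise types or levels.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose gain g so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a noisy signal relative to the clean speech it contains."""
    speech_power = sum(s * s for s in speech)
    residual_power = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(speech_power / residual_power)


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 8000  # assumed telephone-band rate, not stated in the source
    # Synthetic stand-ins: a 200 Hz tone as "speech", Gaussian white noise as "noise".
    speech = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for snr in (20, 10, 0):  # hypothetical test conditions
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 2))
```

Scaling the noise rather than the speech keeps the clean utterance unchanged across conditions, so any drop in recognition accuracy can be attributed to the noise level alone.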





Data availability
The data supporting the findings of this study are available upon reasonable request. Researchers interested in accessing the data can contact the corresponding author, Abdelkbir Ouisaadane, for further information.
Funding
This study was conducted without external financial support; the authors received no funding from external agencies, organizations, or sponsors.
Ethics declarations
Conflict of interest
On behalf of all authors, I state that there is no conflict of interests as defined by Springer.
Ethical approval
Ethical approval for this study is not applicable, as it does not involve human subjects, animal subjects, or any sensitive personal data.
About this article
Cite this article
Ouisaadane, A., Safi, S. & Frikel, M. An experiment of Moroccan dialect speech recognition in noisy environments using PocketSphinx. Int J Speech Technol 27, 329–339 (2024). https://doi.org/10.1007/s10772-024-10103-x
DOI: https://doi.org/10.1007/s10772-024-10103-x