Abstract
In this paper, we present a set of experiments that aim to improve the recognition of spoken digits for under-resourced dialects of the Maghreb region using a hybrid system. Integrating a dialect identification module into an Automatic Speech Recognition (ASR) system has proved effective in previous work. To enable the ASR system to recognize digits spoken in different dialects, we trained our hybrid system on the Moroccan Berber Dialect (MBD), the Moroccan Arabic Dialect (MAD), and the Algerian Arabic Dialect (AAD), in addition to Modern Standard Arabic (MSA). We investigated five machine-learning classifiers and two deep-learning models: the first is based on a Convolutional Neural Network (CNN), and the second uses two pre-trained residual deep neural networks (ResNet50 and ResNet101). The findings show that the CNN model outperforms the other proposed methods and consequently improves the performance of the spoken digit recognition system by 20% for both the Algerian and Moroccan dialects.
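The hybrid idea summarized above, a dialect identification step placed in front of digit recognition, can be illustrated with a minimal sketch. It assumes that the identified dialect selects one recognizer per variety and that unknown labels fall back to MSA; both choices are assumptions made for illustration, not the architecture reported in the paper. The class `HybridDigitRecognizer`, the toy identifier, and the toy recognizers are hypothetical stand-ins for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]  # placeholder for acoustic features of one utterance


@dataclass
class HybridDigitRecognizer:
    """Hypothetical wrapper: dialect identification followed by digit recognition."""

    identify_dialect: Callable[[Features], str]        # stand-in for the dialect-ID model
    recognizers: Dict[str, Callable[[Features], str]]  # stand-ins for per-dialect ASR
    fallback: str = "MSA"                              # assumption: unknown labels use MSA

    def recognize(self, features: Features) -> Tuple[str, str]:
        dialect = self.identify_dialect(features)
        if dialect not in self.recognizers:
            dialect = self.fallback
        return dialect, self.recognizers[dialect](features)


# Toy stand-ins so the sketch runs end to end; real models would replace these.
def toy_identifier(features: Features) -> str:
    return "MAD" if features[0] > 0 else "AAD"


toy_recognizers = {
    "MAD": lambda f: "digit-from-MAD-model",
    "AAD": lambda f: "digit-from-AAD-model",
    "MBD": lambda f: "digit-from-MBD-model",
    "MSA": lambda f: "digit-from-MSA-model",
}

system = HybridDigitRecognizer(toy_identifier, toy_recognizers)
print(system.recognize([0.7, 0.1]))
```

Keeping the identifier and the recognizers as plain callables means any of the classifiers compared in the study could be plugged in without changing the routing logic.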
Notes
Kabyl is an Algerian Berber dialect.
mix-sys-1, mix-sys-2, and mix-sys-3: acoustic and language models have been built using a mixture of the (MAD and AAD), (MAD, AAD, and MBD), and (MAD, AAD, MBD, and MSA) corpora, respectively.
Cite this article
Lounnas, K., Abbas, M., Lichouri, M. et al. Enhancement of spoken digits recognition for under-resourced languages: case of Algerian and Moroccan dialects. Int J Speech Technol 25, 443–455 (2022). https://doi.org/10.1007/s10772-022-09971-y
DOI: https://doi.org/10.1007/s10772-022-09971-y