
Enhancement of spoken digits recognition for under-resourced languages: case of Algerian and Moroccan dialects

International Journal of Speech Technology

Abstract

In this paper, we present a set of experiments aimed at improving the recognition of spoken digits in under-resourced dialects of the Maghreb region using a hybrid system. Integrating a Dialect Identification module into an Automatic Speech Recognition (ASR) system has proven effective in previous work. To enable the ASR system to recognize digits spoken in different dialects, we trained our hybrid system on the Moroccan Berber Dialect (MBD), the Moroccan Arabic Dialect (MAD), and the Algerian Arabic Dialect (AAD), in addition to Modern Standard Arabic (MSA). We investigated five machine-learning classifiers and two deep-learning approaches: the first is based on a Convolutional Neural Network (CNN), and the second uses two pre-trained residual networks (ResNet50 and ResNet101). The findings show that the CNN model outperforms the other proposed methods and consequently improves the performance of the spoken digit recognition system by 20% for both the Algerian and Moroccan dialects.
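
The hybrid idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Dialect Identification module is wired into the ASR system, so the routing below (identify the dialect first, then pick a dialect-specific digit recognizer, with a mixed-corpus recognizer as fallback) is only one plausible reading. All names (`HybridDigitRecognizer`, `toy_dialect_id`, `make_toy_recognizer`) are hypothetical, and the stand-in models are toy functions, not trained classifiers.

```python
from typing import Callable, Dict, Sequence, Tuple

# Assumption: an utterance is represented as a plain sequence of samples.
Signal = Sequence[float]
DialectIdentifier = Callable[[Signal], str]
DigitRecognizer = Callable[[Signal], int]


class HybridDigitRecognizer:
    """Route an utterance to a recognizer chosen by the predicted dialect."""

    def __init__(
        self,
        identify: DialectIdentifier,
        recognizers: Dict[str, DigitRecognizer],
        fallback: DigitRecognizer,
    ) -> None:
        self.identify = identify
        self.recognizers = recognizers
        self.fallback = fallback

    def recognize(self, signal: Signal) -> Tuple[str, int]:
        # Step 1: dialect identification (e.g. a CNN in a real system).
        dialect = self.identify(signal)
        # Step 2: use the dialect-specific model, or the mixed model
        # when no dedicated recognizer exists for that dialect.
        recognizer = self.recognizers.get(dialect, self.fallback)
        return dialect, recognizer(signal)


def toy_dialect_id(signal: Signal) -> str:
    """Hypothetical placeholder: picks a label from the signal length."""
    return ("AAD", "MAD", "MBD")[len(signal) % 3]


def make_toy_recognizer(offset: int) -> DigitRecognizer:
    """Hypothetical placeholder: maps a signal to a digit in 0..9."""
    return lambda signal: (len(signal) + offset) % 10


system = HybridDigitRecognizer(
    identify=toy_dialect_id,
    recognizers={"AAD": make_toy_recognizer(0), "MAD": make_toy_recognizer(1)},
    fallback=make_toy_recognizer(2),  # stands in for a mixed-corpus model
)

if __name__ == "__main__":
    for utterance in ([0.0] * 3, [0.0] * 4, [0.0] * 2):
        print(system.recognize(utterance))
```

In a real system the two placeholders would be replaced by a trained dialect classifier and trained acoustic and language models; the routing logic itself would stay the same.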

Notes

  1. Kabyl is an Algerian Berber dialect.

  2. http://www.fon.hum.uva.nl/praat/.

  3. https://www.audacityteam.org.

  4. https://github.com/tyiannak/pyAudioAnalysis.

  5. https://librosa.org/doc/latest/index.html.

  6. https://github.com/mtobeiyf/audio-classification.

  7. mix-sys-1, mix-sys-2, and mix-sys-3: the acoustic and language models were built using a mixture of the (MAD and AAD), (MAD, AAD, and MBD), and (MAD, AAD, MBD, and MSA) corpora, respectively.

Author information

Corresponding author

Correspondence to Khaled Lounnas.

About this article

Cite this article

Lounnas, K., Abbas, M., Lichouri, M. et al. Enhancement of spoken digits recognition for under-resourced languages: case of Algerian and Moroccan dialects. Int J Speech Technol 25, 443–455 (2022). https://doi.org/10.1007/s10772-022-09971-y

  • DOI: https://doi.org/10.1007/s10772-022-09971-y
