The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system

  • Research
  • Published in: International Journal of Speech Technology

Abstract

Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines the performance of several deep neural network architectures, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (Bi-LSTM) models, for Amazigh speech recognition when combined with different feature extraction techniques: Mel-Frequency Cepstral Coefficients (MFCC), Spectrograms, and Mel-Spectrograms. The results show that the Bi-LSTM with Spectrograms achieved the best performance in our Amazigh ASR study, with a maximum accuracy of 85%. We also show that each feature type offers specific advantages depending on the particular neural network architecture employed.
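The full text is paywalled, but the three feature types compared in the abstract follow standard definitions, so they can be illustrated independently. The sketch below, using only numpy, derives a power spectrogram via a windowed STFT, maps it through a triangular mel filterbank to a mel-spectrogram, and takes a DCT-II of the log-mel energies to obtain MFCCs. All parameters (16 kHz sample rate, 512-point FFT, 160-sample hop, 40 mel bands, 13 coefficients) are common defaults assumed for illustration, not the paper's settings, and a synthetic sine wave stands in for an Amazigh utterance.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=160):
    """Power spectrogram: framed, Hann-windowed FFT magnitudes squared."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (frames, n_fft//2 + 1)

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular filters mapping linear FFT bins to mel bands."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                 # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)          # 1 s synthetic stand-in signal

spec = stft_power(audio)                       # spectrogram feature
mel_spec = spec @ mel_filterbank().T           # mel-spectrogram feature
log_mel = np.log(mel_spec + 1e-10)

# MFCCs: DCT-II of the log-mel energies, keeping the first 13 coefficients
n_mels, n_mfcc = 40, 13
dct = np.cos(np.pi / n_mels
             * (np.arange(n_mels) + 0.5)[None, :]
             * np.arange(n_mfcc)[:, None])     # (13, 40) DCT-II basis
mfcc = log_mel @ dct.T                         # (frames, 13)
```

Each of the three arrays (`spec`, `mel_spec`, `mfcc`) is a time-by-feature matrix that could be fed to a CNN, LSTM, or Bi-LSTM front end; the point of the sketch is only to show how the three representations are nested (MFCC is derived from the mel-spectrogram, which is derived from the spectrogram), which is why they interact differently with different architectures.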


Figures 1–8 appear in the full article.


Data availability

No datasets were generated or analysed during the current study.


Funding

Not applicable.

Author information


Contributions

Meryam Telmem: ABCDEF. Naouar Laaidi: ABCDEF. Hassan Satori: ABCDEF. All authors reviewed the manuscript.

Corresponding author

Correspondence to Meryam Telmem.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Telmem, M., Laaidi, N. & Satori, H. The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system. Int J Speech Technol 28, 299–312 (2025). https://doi.org/10.1007/s10772-025-10183-3

