Abstract
Feature extraction is an essential phase in the development of Automatic Speech Recognition (ASR) systems. This study examines the performance of different deep neural network architectures, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and bidirectional LSTM (Bi-LSTM) models, for Amazigh speech recognition. We apply several feature extraction techniques, specifically Mel-Frequency Cepstral Coefficients (MFCC), spectrograms, and Mel-spectrograms, and measure their effect on each architecture. The results show that the Bi-LSTM with spectrograms achieved the best performance in our Amazigh ASR study, with a maximum accuracy of 85%. We also show that each feature type offers specific advantages depending on the particular neural network architecture employed.
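For readers who want a concrete starting point, the sketch below shows one common way to compute the three feature types compared in this study. It uses the librosa library, and the parameter choices (16 kHz sampling rate, 13 MFCCs, 40 mel bands) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact pipeline) of the three feature
# types compared in the paper. librosa and all parameter values here are
# assumptions for illustration.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000, n_mfcc=13, n_mels=40):
    """Return MFCC, linear spectrogram, and mel-spectrogram for one utterance."""
    y, sr = librosa.load(wav_path, sr=sr)

    # MFCC: cepstral coefficients summarizing the spectral envelope.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    # Spectrogram: magnitude of the short-time Fourier transform, in dB.
    stft = np.abs(librosa.stft(y))
    spectrogram = librosa.amplitude_to_db(stft, ref=np.max)

    # Mel-spectrogram: STFT energies warped onto the perceptual mel scale.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_spectrogram = librosa.power_to_db(mel, ref=np.max)

    return mfcc, spectrogram, mel_spectrogram
```

Each call returns a 2-D time-frequency array that can then be fed to a CNN, LSTM, or Bi-LSTM classifier as in the comparison above.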








Data availability
No datasets were generated or analysed during the current study.
Funding
Not applicable.
Author information
Contributions
Meryam Telmem: ABCDEF. Naouar Laaidi: ABCDEF. Hassan Satori: ABCDEF. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Telmem, M., Laaidi, N. & Satori, H. The impact of MFCC, spectrogram, and Mel-Spectrogram on deep learning models for Amazigh speech recognition system. Int J Speech Technol 28, 299–312 (2025). https://doi.org/10.1007/s10772-025-10183-3