Speech emotion recognition using MFCC-based entropy feature

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

The prime objective of speech emotion recognition (SER) is to accurately recognize the emotion conveyed by a speech signal, which is a challenging task. SER has many applications, including medicine, online marketing, strengthening human–computer interaction (HCI), and online education, and it has therefore been a topic of interest for researchers for the last three decades, who have used different methodologies to improve the classification accuracy of emotions. In this study, we aim to improve emotion classification accuracy using mel-frequency cepstral coefficient (MFCC)-based entropy features. First, we extracted the MFCC coefficient matrix from every utterance in the EMO-DB, RAVDESS, and SAVEE datasets, and then we calculated the proposed features from each utterance's MFCC coefficient matrix: the statistical mean (\(\mathrm{{MFCC}}_{\mathrm{{mean}}}\)), MFCC-based approximate entropy (\(\mathrm{{MFCC}}_{\mathrm{{AE}}}\)), and MFCC-based spectral entropy (\(\mathrm{{MFCC}}_{\mathrm{{SE}}}\)). The performance of the proposed features is assessed using a DNN classifier. We achieved classification accuracies of 87.48%, 75.9%, and 79.64% using the combination of \(\mathrm{{MFCC}}_{\mathrm{{mean}}}\) and \(\mathrm{{MFCC}}_{\mathrm{{SE}}}\) features, and 85.61%, 77.54%, and 76.26% using the combination of \(\mathrm{{MFCC}}_{\mathrm{{mean}}}\), \(\mathrm{{MFCC}}_{\mathrm{{AE}}}\), and \(\mathrm{{MFCC}}_{\mathrm{{SE}}}\) features, for the EMO-DB, RAVDESS, and SAVEE datasets, respectively.
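As a rough illustration of the feature pipeline the abstract describes, the sketch below computes the three proposed features with librosa and NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: the choice of toolkit, the number of coefficients (n_mfcc=13), the ApEn parameters (m=2, r = 0.2·std), computing each entropy per coefficient along the time axis, and the file name "utterance.wav" are all illustrative assumptions rather than values from the paper.

```python
import numpy as np
import librosa

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy (ApEn) of a 1-D sequence x.
    m: embedding dimension; tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)

    def phi(m):
        # All length-m templates; count matches under the Chebyshev
        # distance, including self-matches, as in Pincus's definition.
        templates = np.array([x[i:i + m] for i in range(N - m + 1)])
        counts = np.array([
            np.sum(np.max(np.abs(templates - t), axis=1) <= r)
            for t in templates
        ]) / (N - m + 1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum of x."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / np.sum(psd)
    p = p[p > 0]  # drop zero bins to avoid log(0)
    return -np.sum(p * np.log2(p))

# Load one utterance and extract its MFCC coefficient matrix.
y, sr = librosa.load("utterance.wav", sr=None)      # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# One value per coefficient, computed over the frame (time) axis.
mfcc_mean = mfcc.mean(axis=1)                               # MFCC_mean
mfcc_ae = np.array([approximate_entropy(c) for c in mfcc])  # MFCC_AE
mfcc_se = np.array([spectral_entropy(c) for c in mfcc])     # MFCC_SE

# Best-performing combination on EMO-DB per the abstract.
feature_vector = np.concatenate([mfcc_mean, mfcc_se])
```

The per-utterance feature vectors would then be fed to a DNN classifier; the abstract does not specify the network's architecture, so that step is omitted here.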


Availability of data and materials

The data that support the findings of this study are openly available.

Funding

Not applicable.

Author information

Contributions

SPM conducted the research and wrote the paper. PW and SD participated in the writing and preparation of the paper. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Siba Prasad Mishra.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mishra, S.P., Warule, P. & Deb, S. Speech emotion recognition using MFCC-based entropy feature. SIViP 18, 153–161 (2024). https://doi.org/10.1007/s11760-023-02716-7
