Abstract
The primary objective of speech emotion recognition (SER) is to accurately recognize the emotion conveyed by a speech signal, which is a challenging task. SER has many applications, including medicine, online marketing, strengthening human–computer interaction (HCI), online education, and more. Hence, it has been a topic of interest for many researchers over the last three decades, who have used different methodologies to improve the classification accuracy of emotions. In this study, we improve emotion classification accuracy using mel-frequency cepstral coefficient (MFCC)-based entropy features. First, we extracted the MFCC coefficient matrix from every utterance in the EMO-DB, RAVDESS, and SAVEE datasets, and then we calculated the proposed features, namely the statistical mean (\(\mathrm{{MFCC}}_{\mathrm{{mean}}}\)), MFCC-based approximate entropy (\(\mathrm{{MFCC}}_{\mathrm{{AE}}}\)), and MFCC-based spectral entropy (\(\mathrm{{MFCC}}_{\mathrm{{SE}}}\)), from the MFCC coefficient matrix of each utterance. The performance of the proposed features is assessed using a DNN classifier. We achieved classification accuracies of 87.48%, 75.9%, and 79.64% using the combination of \(\mathrm{{MFCC}}_{\mathrm{{mean}}}\) and \(\mathrm{{MFCC}}_{\mathrm{{SE}}}\) features, and classification accuracies of 85.61%, 77.54%, and 76.26% using the combination of \(\mathrm{{MFCC}}_{\mathrm{{mean}}}\), \(\mathrm{{MFCC}}_{\mathrm{{AE}}}\), and \(\mathrm{{MFCC}}_{\mathrm{{SE}}}\) features, for the EMO-DB, RAVDESS, and SAVEE datasets, respectively.
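The feature-extraction pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the MFCC matrix for one utterance is already available as a NumPy array of shape (frames, coefficients), takes the spectral entropy as the Shannon entropy of each coefficient trajectory's normalized power spectrum, and uses Pincus' standard approximate-entropy (ApEn) algorithm per coefficient; the exact parameter choices (embedding dimension `m`, tolerance `r`) are assumptions.

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized power spectrum of x."""
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p / p.sum()
    p = p[p > 0]                       # drop zero bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def approximate_entropy(x, m=2, r=None):
    """Pincus' ApEn of sequence x: lower values mean more regularity.
    m is the embedding dimension; r defaults to 0.2 * std(x) (a common choice)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()
    def phi(mm):
        emb = np.lib.stride_tricks.sliding_window_view(x, mm)
        # Chebyshev distance between all pairs of embedded vectors
        d = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
        c = (d <= r).mean(axis=1)      # self-matches included, as in ApEn
        return np.log(c).mean()
    return float(phi(m) - phi(m + 1))

def mfcc_entropy_features(mfcc):
    """mfcc: (n_frames, n_coeffs) -> (mean, SE, AE), each of length n_coeffs."""
    f_mean = mfcc.mean(axis=0)
    f_se = np.array([spectral_entropy(c) for c in mfcc.T])
    f_ae = np.array([approximate_entropy(c) for c in mfcc.T])
    return f_mean, f_se, f_ae

# Stand-in for one utterance's 13-coefficient MFCC matrix
rng = np.random.default_rng(0)
demo_mfcc = rng.normal(size=(120, 13))
f_mean, f_se, f_ae = mfcc_entropy_features(demo_mfcc)
```

In this sketch the three per-coefficient vectors would be concatenated per utterance and fed to the DNN classifier; a highly regular signal (e.g. a sinusoid) yields a lower ApEn than noise, which is what makes the feature discriminative for speech dynamics.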
Availability of data and materials
The data that support the findings of this study are openly available.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
SPM conducted research and wrote the paper. PW and SD participated in the writing and preparation of the paper. All authors reviewed the results and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mishra, S.P., Warule, P. & Deb, S. Speech emotion recognition using MFCC-based entropy feature. SIViP 18, 153–161 (2024). https://doi.org/10.1007/s11760-023-02716-7