Speech emotion recognition using MFCC-based entropy feature

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

The prime objective of speech emotion recognition (SER) is to accurately recognize the emotion conveyed by a speech signal, which is a challenging task. SER has many applications, including medicine, online marketing, strengthening human–computer interaction (HCI), and online education, and it has therefore been a topic of interest for researchers for the last three decades, who have used different methodologies to improve the classification accuracy of emotions. In this study, we aim to improve emotion classification accuracy using mel-frequency cepstral coefficient (MFCC)-based entropy features. First, we extracted the MFCC coefficient matrix from every utterance in the EMO-DB, RAVDESS, and SAVEE datasets, and then we calculated the proposed features from each utterance's MFCC coefficient matrix: the statistical mean (\(\mathrm{{MFCC}}_{\mathrm{{mean}}}\)), MFCC-based approximate entropy (\(\mathrm{{MFCC}}_{\mathrm{{AE}}}\)), and MFCC-based spectral entropy (\(\mathrm{{MFCC}}_{\mathrm{{SE}}}\)). The performance of the proposed features is assessed using a DNN classifier. We achieved classification accuracies of 87.48%, 75.9%, and 79.64% using the combination of \(\mathrm{{MFCC}}_{\mathrm{{mean}}}\) and \(\mathrm{{MFCC}}_{\mathrm{{SE}}}\) features, and 85.61%, 77.54%, and 76.26% using the combination of \(\mathrm{{MFCC}}_{\mathrm{{mean}}}\), \(\mathrm{{MFCC}}_{\mathrm{{AE}}}\), and \(\mathrm{{MFCC}}_{\mathrm{{SE}}}\) features, for the EMO-DB, RAVDESS, and SAVEE datasets, respectively.
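As a rough illustration of the feature pipeline the abstract describes, the sketch below computes the three proposed features with librosa and NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: the choice of toolkit, the number of coefficients (n_mfcc=13), the ApEn parameters (m=2, r = 0.2·std), computing each entropy per coefficient along the time axis, and the file name "utterance.wav" are all illustrative assumptions rather than values from the paper.

```python
import numpy as np
import librosa

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy (ApEn) of a 1-D sequence x.
    m: embedding dimension; tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)

    def phi(m):
        # All length-m templates; count matches under the Chebyshev
        # distance, including self-matches, as in Pincus's definition.
        templates = np.array([x[i:i + m] for i in range(N - m + 1)])
        counts = np.array([
            np.sum(np.max(np.abs(templates - t), axis=1) <= r)
            for t in templates
        ]) / (N - m + 1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum of x."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / np.sum(psd)
    p = p[p > 0]  # drop zero bins to avoid log(0)
    return -np.sum(p * np.log2(p))

# Load one utterance and extract its MFCC coefficient matrix.
y, sr = librosa.load("utterance.wav", sr=None)      # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# One value per coefficient, computed over the frame (time) axis.
mfcc_mean = mfcc.mean(axis=1)                               # MFCC_mean
mfcc_ae = np.array([approximate_entropy(c) for c in mfcc])  # MFCC_AE
mfcc_se = np.array([spectral_entropy(c) for c in mfcc])     # MFCC_SE

# Best-performing combination on EMO-DB per the abstract.
feature_vector = np.concatenate([mfcc_mean, mfcc_se])
```

The per-utterance feature vectors would then be fed to a DNN classifier; the abstract does not specify the network's architecture, so that step is omitted here.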


Availability of data and materials

The data that support the findings of this study are openly available.

Funding

Not applicable.

Author information

Contributions

SPM conducted the research and wrote the paper. PW and SD participated in the writing and preparation of the paper. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Siba Prasad Mishra.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mishra, S.P., Warule, P. & Deb, S. Speech emotion recognition using MFCC-based entropy feature. SIViP 18, 153–161 (2024). https://doi.org/10.1007/s11760-023-02716-7
