Abstract
Emotion plays a vital role in every living being. Understanding emotion is a complex task, but a reliable solution could help address many problems and even save lives. Emotion is reflected not only in gestures but also in how a person works and performs. Consequently, speech emotion recognition (SER) has been an active research topic for the last three decades. In this study, we used three features, the mel-frequency cepstral coefficients (MFCC), the spectrogram, and the mel-spectrogram, as one-dimensional input vectors to a convolutional neural network (CNN) and a deep neural network (DNN) for speech emotion classification. We evaluated SER accuracy using the features individually and in combination with the CNN and DNN classifiers. For both classifiers, the combined features outperformed the individual features. With the combined features, the DNN classifier achieved accuracies of 76.60%, 87.10%, 79.79%, and 100%, and the CNN classifier achieved 75%, 84.11%, 78.13%, and 100% on the RAVDESS, EMO-DB, SAVEE, and TESS datasets, respectively. We then applied the proposed feature- and classifier-level fusion method using CNN and DNN, which improved emotion classification performance to 80.42%, 87.48%, and 80.99% on the RAVDESS, EMO-DB, and SAVEE datasets, respectively. Compared with other methods, the proposed feature- and classifier-level fusion approach outperformed the state-of-the-art results.
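The pipeline described above can be illustrated with a minimal numpy-only sketch. All settings here (frame length, hop size, the use of a plain magnitude spectrogram in place of MFCC/mel-spectrogram extraction, and probability averaging as the classifier-level fusion rule) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def spectral_feature_1d(signal, frame_len=256, hop=128, n_keep=64):
    """Summarize one utterance as a fixed-length 1-D vector (a stand-in
    for the paper's MFCC / spectrogram / mel-spectrogram features)."""
    frames = np.array([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return mags.mean(axis=0)[:n_keep]           # average over time frames

rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)          # 1 s of synthetic audio @ 16 kHz

# Feature-level fusion: concatenate the per-feature 1-D vectors into a
# single input vector for the CNN/DNN (the same vector is repeated here
# as a placeholder for the three distinct features).
feat = spectral_feature_1d(utterance)
fused_input = np.concatenate([feat, feat, feat])

# Classifier-level fusion: one simple scheme is averaging the class
# probabilities of the two models (the paper's exact rule may differ).
p_cnn = np.array([0.6, 0.3, 0.1])               # hypothetical CNN softmax output
p_dnn = np.array([0.5, 0.4, 0.1])               # hypothetical DNN softmax output
p_fused = (p_cnn + p_dnn) / 2
predicted_class = int(np.argmax(p_fused))

print(fused_input.shape, predicted_class)       # (192,) 0
```

The key idea is that each feature stream is reduced to a fixed-length one-dimensional vector so that streams of different time-frequency resolutions can be concatenated before classification, while the classifier-level fusion combines the two models' decisions after classification.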
Availability of data and materials
The data that support the findings of this study are openly available.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mishra, S.P., Warule, P. & Deb, S. Speech emotion classification using feature-level and classifier-level fusion. Evolving Systems 15, 541–554 (2024). https://doi.org/10.1007/s12530-023-09550-9