Abstract
With the advancement of machine learning, identifying emotional states from speech has a profound impact on artificial intelligence. Proper feature selection plays a vital role in such emotion recognition. Therefore, a feature fusion technique is proposed in this study to obtain high prediction accuracy by prioritizing the extraction of individual features. Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), energy, Zero Crossing Rate (ZCR) and pitch are extracted, and four different models are constructed to examine the impact of the feature fusion technique on four standard machine learning classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Decision Tree (D-Tree) and K-Nearest Neighbour (KNN). Applying the feature fusion technique to the proposed classifiers yields satisfactory recognition rates: 96.90% on the Bengali (Indian regional language) SUST Bangla Emotional Speech Corpus (SUBESCO), 99.82% on the Toronto Emotional Speech Set (TESS) (English), 95% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) (English) and 95.33% on the Berlin Database of Emotional Speech (EMO-DB) (German). The presented model indicates that proper fusion of features has a positive impact on emotion detection systems by increasing their accuracy and applicability.
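The fusion approach described above concatenates several acoustic features into a single vector before classification. The following is a minimal, illustrative sketch of that idea (not the authors' implementation): it computes three of the named features (ZCR, short-time energy, and a naive autocorrelation pitch estimate) with NumPy and fuses their frame-level averages into one feature vector; all function names, frame sizes, and the synthetic test tone are our own assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    # fraction of adjacent sample pairs whose signs differ
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def short_time_energy(frame):
    # mean squared amplitude of the frame
    return np.sum(frame ** 2) / len(frame)

def pitch_autocorr(frame, sr, fmin=50, fmax=500):
    # naive pitch estimate: strongest autocorrelation lag in the
    # plausible speech range [fmin, fmax] (illustrative only)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

def fuse_features(frames, sr):
    # frame-level features are averaged over the utterance, then
    # concatenated into a single fused feature vector
    zcr = np.mean([zero_crossing_rate(f) for f in frames])
    energy = np.mean([short_time_energy(f) for f in frames])
    pitch = np.mean([pitch_autocorr(f, sr) for f in frames])
    return np.array([zcr, energy, pitch])

# synthetic 220 Hz tone standing in for a speech signal,
# split into non-overlapping 1024-sample frames
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t)
frames = [signal[i:i + 1024] for i in range(0, sr - 1024, 1024)]
fused = fuse_features(frames, sr)
```

In a full system, MFCC and LPC vectors would be appended to `fused` in the same way, and the resulting vectors passed to an SVM, LDA, D-Tree, or KNN classifier.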
Data Availability
The SUST Bangla Emotional Speech Corpus (SUBESCO) dataset is available at: https://www.kaggle.com/datasets/sushmit0109/subescobangla-speech-emotion-dataset. The Toronto Emotional Speech Set (TESS) dataset is available at: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset is available at: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotionalspeech-audio. The Berlin Database of Emotional Speech (EMO-DB) dataset is available at: https://www.kaggle.com/datasets/piyushagni5/berlin-database-of-emotional-speech-emodb.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the preparation and submission of this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Paul, B., Bera, S., Dey, T. et al. Machine learning approach of speech emotions recognition using feature fusion technique. Multimed Tools Appl 83, 8663–8688 (2024). https://doi.org/10.1007/s11042-023-16036-y