Abstract
With the advancement of machine learning, identifying emotional states from speech has a profound impact on artificial intelligence. Proper feature selection plays a vital role in such emotion recognition. Therefore, a feature fusion technique is proposed in this study to obtain high prediction accuracy by prioritizing the extraction of individual features. Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), energy, Zero Crossing Rate (ZCR) and pitch are extracted, and four different models are constructed to examine the impact of the feature fusion technique on four standard machine learning classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Decision Tree (D-Tree) and K-Nearest Neighbour (KNN). Applying the feature fusion technique to the proposed classifiers yields satisfactory recognition rates: 96.90% on the Bengali (Indian regional language) SUST Bangla Emotional Speech Corpus (SUBESCO), 99.82% on the Toronto Emotional Speech Set (TESS) (English), 95% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) (English) and 95.33% on the Berlin Database of Emotional Speech (EMO-DB) (German). The presented model indicates that proper fusion of features has a positive impact on emotion detection systems by increasing their accuracy and applicability.
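The fusion approach described above concatenates several acoustic features into a single vector before classification. The following is a minimal, illustrative sketch of that idea (not the authors' implementation): it computes three of the named features (ZCR, short-time energy, and a naive autocorrelation pitch estimate) with NumPy and fuses their frame-level averages into one feature vector; all function names, frame sizes, and the synthetic test tone are our own assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    # fraction of adjacent sample pairs whose signs differ
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def short_time_energy(frame):
    # mean squared amplitude of the frame
    return np.sum(frame ** 2) / len(frame)

def pitch_autocorr(frame, sr, fmin=50, fmax=500):
    # naive pitch estimate: strongest autocorrelation lag in the
    # plausible speech range [fmin, fmax] (illustrative only)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

def fuse_features(frames, sr):
    # frame-level features are averaged over the utterance, then
    # concatenated into a single fused feature vector
    zcr = np.mean([zero_crossing_rate(f) for f in frames])
    energy = np.mean([short_time_energy(f) for f in frames])
    pitch = np.mean([pitch_autocorr(f, sr) for f in frames])
    return np.array([zcr, energy, pitch])

# synthetic 220 Hz tone standing in for a speech signal,
# split into non-overlapping 1024-sample frames
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t)
frames = [signal[i:i + 1024] for i in range(0, sr - 1024, 1024)]
fused = fuse_features(frames, sr)
```

In a full system, MFCC and LPC vectors would be appended to `fused` in the same way, and the resulting vectors passed to an SVM, LDA, D-Tree, or KNN classifier.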
Data Availability
The SUST Bangla Emotional Speech Corpus (SUBESCO) dataset is available at: https://www.kaggle.com/datasets/sushmit0109/subescobangla-speech-emotion-dataset. The Toronto Emotional Speech Set (TESS) dataset is available at: https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset is available at: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotionalspeech-audio. The Berlin Database of Emotional Speech (EMO-DB) dataset is available at: https://www.kaggle.com/datasets/piyushagni5/berlin-database-of-emotional-speech-emodb.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the preparation and submission of this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Paul, B., Bera, S., Dey, T. et al. Machine learning approach of speech emotions recognition using feature fusion technique. Multimed Tools Appl 83, 8663–8688 (2024). https://doi.org/10.1007/s11042-023-16036-y