Machine learning approach of speech emotions recognition using feature fusion technique

Multimedia Tools and Applications

Abstract

With the advancement of machine learning, identifying emotional states from speech is expected to have a profound impact on artificial intelligence. Proper feature selection plays a vital role in such emotion recognition. This study therefore proposes a feature fusion technique that aims for high prediction accuracy by prioritizing the extraction of individual features. Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), energy, Zero Crossing Rate (ZCR) and pitch are extracted, and four different models are constructed to examine the impact of feature fusion on four standard machine learning classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Decision Tree (D-Tree) and K-Nearest Neighbour (KNN). Applying feature fusion with these classifiers gives recognition rates of 96.90% on the Bengali (an Indian regional language) dataset SUST Bangla Emotional Speech Corpus (SUBESCO), 99.82% on the Toronto Emotional Speech Set (TESS) (English), 95% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) (English) and 95.33% on the Berlin Database of Emotional Speech (EMO-DB) (German). The results indicate that the proper fusion of features has a positive impact on emotion detection systems by increasing their accuracy and applicability.
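
The abstract does not spell out how the features are fused, so the following is only a minimal sketch under stated assumptions: fusion is taken to mean concatenating per-feature statistics into a single vector, which is then passed to a classifier. Only energy, ZCR and an autocorrelation-based pitch estimate are shown; MFCC and LPC are omitted because they need considerably more signal processing. The sine tones, the "low"/"high" labels, the sampling rate and every function name are hypothetical stand-ins for illustration, not the authors' data or implementation.

```python
import math
from collections import Counter

SR = 8000  # assumed sampling rate in Hz (not taken from the paper)

def make_tone(freq_hz, n=1600):
    """Synthetic sine tone standing in for a speech clip (illustration only)."""
    return [math.sin(2 * math.pi * freq_hz * i / SR) for i in range(n)]

def split_frames(signal, frame_len=400, hop=200):
    """Cut the signal into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def autocorr_pitch(frame, fmin=50.0, fmax=500.0):
    """Pitch estimate in Hz from the lag with the largest autocorrelation."""
    lags = range(int(SR / fmax), min(int(SR / fmin), len(frame) - 1) + 1)

    def acf(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    return SR / max(lags, key=acf)

def fused_vector(signal):
    """Feature fusion by concatenation: one frame-averaged value per feature."""
    frames = split_frames(signal)

    def mean(values):
        return sum(values) / len(values)

    return [mean([short_time_energy(f) for f in frames]),
            mean([zero_crossing_rate(f) for f in frames]),
            mean([autocorr_pitch(f) for f in frames])]

def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set: the labels are placeholders, not emotion classes.
# The features are left unscaled here, so pitch (in Hz) dominates the distance.
train = [(fused_vector(make_tone(f)), "low") for f in (100, 120, 140)]
train += [(fused_vector(make_tone(f)), "high") for f in (300, 320, 340)]

print(knn_predict(train, fused_vector(make_tone(130))))
```

In a realistic pipeline each dimension of the fused vector would usually be standardized before a distance-based classifier such as KNN is applied, and the same fused vectors could be fed to SVM, LDA or a decision tree for comparison.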

Data Availability

The SUST Bangla Emotional Speech Corpus (SUBESCO) dataset is available at https://www.kaggle.com/datasets/sushmit0109/subescobangla-speech-emotion-dataset. The Toronto Emotional Speech Set (TESS) dataset is available at https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset is available at https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotionalspeech-audio. The Berlin Database of Emotional Speech (EMO-DB) dataset is available at https://www.kaggle.com/datasets/piyushagni5/berlin-database-of-emotional-speech-emodb.

Author information

Corresponding author

Correspondence to Somnath Bera.

Ethics declarations

Conflict of interest

There is no conflict of interest among the authors regarding the preparation and submission of the manuscript.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Paul, B., Bera, S., Dey, T. et al. Machine learning approach of speech emotions recognition using feature fusion technique. Multimed Tools Appl 83, 8663–8688 (2024). https://doi.org/10.1007/s11042-023-16036-y

  • DOI: https://doi.org/10.1007/s11042-023-16036-y
