A hybrid deep feature selection framework for emotion recognition from human speeches

  • 1226: Deep-Patterns Emotion Recognition in the Wild
Multimedia Tools and Applications

Abstract

Speech Emotion Recognition (SER) is an active area of signal processing research that aims to identify emotional states from audio speech signals. Applications of SER range from psychological diagnosis to human-computer interaction, so a robust framework is needed for accurate classification. To this end, we propose a two-stage hybrid deep feature selection (HDFS) framework that combines deep learning with automated feature engineering for emotion recognition from human speech, and that performs well in terms of both accuracy and computational efficiency. Our pipeline extracts self-learned features from mel-spectrograms of raw audio signals using a customized Wide-ResNet-50-2 deep learning model. The dimensionality of these features is then reduced by a hybrid deep feature selection algorithm comprising a fuzzy entropy and similarity-based feature ranking method followed by the Whale Optimization Algorithm, a popular meta-heuristic optimization algorithm in the literature. A k-nearest neighbor classifier then classifies the optimized feature subset into the respective emotion classes. The proposed pipeline is evaluated on three publicly available SER datasets using a 5-fold cross-validation scheme, where it is found to outperform several existing state-of-the-art works in the literature by significant margins, thus supporting the superiority and reliability of the proposed approach. The source code of the proposed method is available at: https://github.com/soumitri2001/Wrapper-Filter-Speech-Emotion-Recognition.
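The two-stage selection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that is in the repository linked above), and it rests on several assumptions: synthetic random vectors stand in for the Wide-ResNet-50-2 features, the filter stage is a simplified fuzzy entropy ranking in which each sample is compared only with the mean vector of its own class, and the wrapper stage is a thresholded continuous variant of the Whale Optimization Algorithm with scalar coefficients. The function names and all numeric settings (k = 5, 8 whales, 15 iterations, alpha = 0.99, six retained features) are hypothetical and chosen only for illustration.

```python
import numpy as np

def fuzzy_entropy_rank(X, y, p=1.0, eps=1e-9):
    """Return feature indices sorted by ascending fuzzy entropy (simplified filter)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xs = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # scale features to [0, 1]
    entropy = np.zeros(X.shape[1])
    for c in np.unique(y):
        rows = Xs[y == c]
        ideal = rows.mean(axis=0)                     # class mean as "ideal" vector
        sim = (1.0 - np.abs(rows**p - ideal**p)) ** (1.0 / p)
        sim = np.clip(sim, eps, 1.0 - eps)
        # De Luca-Termini fuzzy entropy, summed over the samples of class c
        entropy += -(sim * np.log(sim) + (1 - sim) * np.log(1 - sim)).sum(axis=0)
    return np.argsort(entropy)

def knn_cv_accuracy(X, y, k=5, folds=5, seed=0):
    """Accuracy of a k-nearest-neighbor classifier under k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        dist = ((X[test][:, None, :] - X[train][None, :, :]) ** 2).sum(axis=-1)
        nearest = train[np.argsort(dist, axis=1)[:, :k]]
        pred = np.array([np.bincount(v).argmax() for v in y[nearest]])  # majority vote
        correct += (pred == y[test]).sum()
    return correct / len(y)

def binary_woa(X, y, n_whales=8, iters=15, alpha=0.99, b=1.0, seed=0):
    """Thresholded WOA: positions live in [0, 1]; a feature is kept if > 0.5."""
    rng = np.random.default_rng(seed)

    def cost(position):
        mask = position > 0.5
        if not mask.any():
            return 1.0                                # empty subset: worst cost
        err = 1.0 - knn_cv_accuracy(X[:, mask], y)
        return alpha * err + (1 - alpha) * mask.mean()

    pos = rng.random((n_whales, X.shape[1]))
    costs = np.array([cost(w) for w in pos])
    best, best_cost = pos[costs.argmin()].copy(), costs.min()
    for t in range(iters):
        a = 2.0 * (1 - t / iters)                     # decreases linearly from 2 to 0
        for i in range(n_whales):
            A, C = 2 * a * rng.random() - a, 2 * rng.random()
            if rng.random() < 0.5:
                # encircle the best whale when |A| < 1, else explore near a random one
                ref = best if abs(A) < 1 else pos[rng.integers(n_whales)]
                new = ref - A * np.abs(C * ref - pos[i])
            else:
                l = rng.uniform(-1, 1)                # spiral update around the best
                new = np.abs(best - pos[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            pos[i] = np.clip(new, 0.0, 1.0)
            c = cost(pos[i])
            if c < best_cost:
                best, best_cost = pos[i].copy(), c
    return best > 0.5, best_cost

# Synthetic stand-in for deep features: 12 dimensions, the first 3 informative.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 12))
X[:, :3] = rng.normal(scale=0.5, size=(100, 3)) + 4.0 * y[:, None]

rank = fuzzy_entropy_rank(X, y)          # stage 1: filter ranking
top = rank[:6]                           # keep the six best-ranked features
mask, best_cost = binary_woa(X[:, top], y)   # stage 2: wrapper search
selected = top[mask]
print("selected features:", sorted(selected.tolist()))
print("5-fold k-NN accuracy:", knn_cv_accuracy(X[:, selected], y))
```

The cost function weights classification error against the fraction of features kept, a common choice in wrapper-based feature selection. Whether the paper uses this exact weighting is not stated in the abstract.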

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Author information

Corresponding author

Correspondence to Pawan Kumar Singh.

Ethics declarations

Conflict of Interests

All the authors declare that there are no conflicts of interest in the publication of this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Marik, A., Chattopadhyay, S. & Singh, P.K. A hybrid deep feature selection framework for emotion recognition from human speeches. Multimed Tools Appl 82, 11461–11487 (2023). https://doi.org/10.1007/s11042-022-14052-y

  • DOI: https://doi.org/10.1007/s11042-022-14052-y
