Abstract
Automatic speech emotion recognition now has numerous applications, and feature selection is one of the key steps in such systems. Because it is not known which acoustic features of a person's speech convey emotion, considerable effort has been devoted to introducing a wide range of acoustic features. However, employing all of these features lowers the learning efficiency of classifiers, so a subset must be selected; moreover, when several speakers are involved, the selected features must also be speaker-independent. The present paper therefore attempts to select features that are both related to the emotion of speech and speaker-independent. To this end, it proposes a multi-task approach that selects suitable speaker-independent features for each pair of emotion classes. The selected features are fed to a binary classifier for that pair, and the outputs of the binary classifiers are then combined to produce the multi-class decision. Simulation results reveal that the proposed approach outperforms competing methods in both detection accuracy and runtime.
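As a rough illustration of the pairwise decomposition described in the abstract (a minimal sketch, not the authors' implementation), the Python code below builds one binary task per pair of emotion classes, selects features separately for each pair, and combines the binary predictions by majority vote. The univariate SelectKBest scorer and the linear SVM are placeholders for the paper's multi-task, speaker-independent selection and its classifier; the parameter k=100, the function names, and the use of scikit-learn are assumptions for illustration only.

# Pairwise (one-vs-one) feature selection and classification: a sketch.
# SelectKBest is a univariate stand-in for the paper's multi-task,
# speaker-independent selection step.
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def train_pairwise(X, y, k=100):
    """Fit one feature selector and one binary SVM per class pair."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        m = (y == a) | (y == b)                    # samples of this pair only
        sel = SelectKBest(f_classif, k=min(k, X.shape[1])).fit(X[m], y[m])
        clf = SVC(kernel="linear").fit(sel.transform(X[m]), y[m])
        models[(a, b)] = (sel, clf)
    return models

def predict_pairwise(models, X):
    """One-vs-one decoding: majority vote over all pairwise predictions."""
    preds = np.stack([clf.predict(sel.transform(X))
                      for sel, clf in models.values()], axis=1)
    return np.array([Counter(row).most_common(1)[0][0] for row in preds])

Replacing SelectKBest with a selector trained jointly across speaker-specific tasks, as the paper's multi-task formulation suggests, is what would make the per-pair features speaker-independent; majority voting is one of several standard ways to decode pairwise outputs into a multi-class decision.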
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kalhor, E., Bakhtiari, B. Speaker independent feature selection for speech emotion recognition: A multi-task approach. Multimed Tools Appl 80, 8127–8146 (2021). https://doi.org/10.1007/s11042-020-10119-w