Ensemble softmax regression model for speech emotion recognition

Sun, Yaxin; Wen, Guihua

doi:10.1007/s11042-016-3487-y

Published: 02 April 2016

Volume 76, pages 8305–8328 (2017)

Multimedia Tools and Applications

Yaxin Sun¹ &
Guihua Wen²

Abstract

Automatic emotion recognition from speech signals is an important research area. Many speech emotion recognition methods have been proposed, among which ensemble learning is an effective approach. However, these methods still face problems such as the curse of dimensionality and the difficulty of ensuring diversity among the base classifiers. To overcome these problems, this paper proposes an ensemble Softmax regression model for speech emotion recognition (ESSER). It applies feature extraction methods based on very different principles to generate the subspaces for the base classifiers, so that their diversity can be ensured. Furthermore, a feature selection method that selects features according to the global structure of the data is used to reduce the dimension of the subspaces, which further increases the diversity of the base classifiers and overcomes the curse of dimensionality. Once diversity is ensured, the performance of the ensemble classifier depends heavily on the ability of the base classifier, so it is reasonable for ESSER to use Softmax as the base classifier, as Softmax has shown its superiority in speech emotion recognition. The experiments validate the proposed approach in terms of speech emotion recognition performance.
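
The general idea in the abstract (one Softmax base classifier per feature subspace, combined into an ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy data, the hand-picked index lists in `subspaces` (standing in for different feature extraction methods followed by feature selection), the probability-averaging combination rule, and every function name below are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    """Class probabilities for one sample; the last entry of each row is the bias."""
    scores = [sum(w * v for w, v in zip(row, x)) + row[-1] for row in weights]
    return softmax(scores)


def train_softmax(X, y, n_classes, lr=0.1, epochs=200):
    """Fit softmax regression by batch gradient descent on the cross-entropy loss."""
    n_features = len(X[0])
    weights = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(x):
                    grads[k][j] += err * v
                grads[k][-1] += err  # bias gradient
        for k in range(n_classes):
            for j in range(n_features + 1):
                weights[k][j] -= lr * grads[k][j] / len(X)
    return weights


def take(x, idx):
    """Project a sample onto the feature indices of one subspace."""
    return [x[i] for i in idx]


def train_ensemble(X, y, n_classes, subspaces):
    """Train one softmax base classifier per feature subspace."""
    return [(idx, train_softmax([take(x, idx) for x in X], y, n_classes))
            for idx in subspaces]


def ensemble_predict(models, x, n_classes):
    """Average the base classifiers' probabilities (assumed rule) and pick the best class."""
    avg = [0.0] * n_classes
    for idx, weights in models:
        for k, p in enumerate(predict_proba(weights, take(x, idx))):
            avg[k] += p / len(models)
    return max(range(n_classes), key=lambda k: avg[k])


# Hypothetical toy data: two classes, four features, well-separated clusters.
rng = random.Random(0)
X, y = [], []
for label, center in enumerate([0.0, 4.0]):
    for _ in range(20):
        X.append([rng.gauss(center, 0.3) for _ in range(4)])
        y.append(label)

subspaces = [[0, 1], [2, 3]]  # assumed stand-ins for two different feature sets
models = train_ensemble(X, y, n_classes=2, subspaces=subspaces)
print(ensemble_predict(models, [0.1, -0.2, 0.0, 0.3], n_classes=2))
print(ensemble_predict(models, [3.8, 4.1, 4.2, 3.9], n_classes=2))
```

In this sketch, diversity comes only from giving each base classifier a disjoint set of features. The abstract describes a richer scheme in which the subspaces come from feature extraction methods with different principles and are then reduced by a feature selection step.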

Acknowledgments

This work was supported by the China National Science Foundation under Grants 60973083 and 61273363, and by the State Key Laboratory of Brain and Cognitive Science under Grant 08B12.

Author information

Authors and Affiliations

Jiaxing University, Jiaxing, 314001, China
Yaxin Sun
South China University of Technology, Guangzhou, 510006, China
Guihua Wen

Corresponding author

Correspondence to Yaxin Sun.

About this article

Cite this article

Sun, Y., Wen, G. Ensemble softmax regression model for speech emotion recognition. Multimed Tools Appl 76, 8305–8328 (2017). https://doi.org/10.1007/s11042-016-3487-y

Received: 06 December 2015
Revised: 10 March 2016
Accepted: 21 March 2016
Published: 02 April 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11042-016-3487-y
