Abstract
Because contextual information strongly influences a speaker's emotional state, a key problem is how to use emotion-related context in feature learning. Existing speech emotion recognition algorithms achieve relatively high recognition rates, but they do not transfer well to real-life speech emotion recognition systems. To address these issues, this paper proposes a novel speech emotion recognition algorithm based on an improved stacked kernel sparse deep model. The model combines the auto-encoder, the denoising auto-encoder, and the sparse auto-encoder to improve Chinese speech emotion recognition. The first layer of the structure uses a denoising auto-encoder to learn hidden features whose dimension is larger than that of the input features, and the second layer employs a sparse auto-encoder to learn sparse features. Finally, a wavelet-kernel sparse SVM classifier is applied to classify the features. The proposed algorithm is evaluated on a testing dataset that contains spontaneous, non-prototypical, and long-term speech emotion data. The experimental results show that the proposed algorithm outperforms existing state-of-the-art algorithms in speech emotion recognition.
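The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weights are random and untrained, the layer sizes (8, 16, 6), the masking-noise corruption, the KL-divergence sparsity penalty, and the Morlet-type form of the wavelet kernel are all assumptions chosen for illustration, since the abstract does not specify them. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each input with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def encode(x, weights, biases):
    """One encoder layer: h = sigmoid(W x + b)."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def kl_sparsity(mean_activations, rho=0.05):
    """KL-divergence sparsity penalty (assumed form) against target rate rho."""
    return sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in mean_activations
    )


def wavelet_kernel(x, y, a=1.0):
    """Morlet-type translation-invariant wavelet kernel (assumed form)."""
    k = 1.0
    for xi, yi in zip(x, y):
        d = (xi - yi) / a
        k *= math.cos(1.75 * d) * math.exp(-0.5 * d * d)
    return k


def random_layer(n_out, n_in, rng):
    """Untrained stand-in weights; a real model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
x = [rng.random() for _ in range(8)]       # stand-in for an acoustic feature vector
w1, b1 = random_layer(16, 8, rng)          # layer 1: hidden size larger than input
w2, b2 = random_layer(6, 16, rng)          # layer 2: sparse feature layer

h1 = encode(corrupt(x, 0.2, rng), w1, b1)  # denoising encoder sees a corrupted input
h2 = encode(h1, w2, b2)                    # second-layer features
penalty = kl_sparsity(h2)                  # single-sample proxy for mean activation
similarity = wavelet_kernel(h2, [0.5] * len(h2))
```

In a full pipeline, the second-layer features would be passed to an SVM that uses the wavelet kernel, for example through a kernel matrix computed with `wavelet_kernel` over all pairs of training samples. The sparsity penalty is normally averaged over a batch during training; it is evaluated on one sample here only to keep the sketch short.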





Funding
This work was supported by the Chongqing Big Data Engineering Laboratory for Children, the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, and the Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K201801601).
Cite this article
Wei, P., Zhao, Y. A novel speech emotion recognition algorithm based on wavelet kernel sparse classifier in stacked deep auto-encoder model. Pers Ubiquit Comput 23, 521–529 (2019). https://doi.org/10.1007/s00779-019-01246-9