A novel speech emotion recognition algorithm based on wavelet kernel sparse classifier in stacked deep auto-encoder model

  • Original Article
Personal and Ubiquitous Computing

Abstract

Because contextual information has an important impact on a speaker's emotional state, how to use emotion-related context information for feature learning is a key problem. Although existing speech emotion recognition algorithms achieve relatively high recognition rates, they are not well suited to real-life speech emotion recognition systems. To address these issues, this paper proposes a novel speech emotion recognition algorithm based on an improved stacked kernel sparse deep model, which combines the auto-encoder, the denoising auto-encoder, and the sparse auto-encoder to improve Chinese speech emotion recognition. The first layer of the structure uses a denoising auto-encoder to learn hidden features whose dimension is larger than that of the input features, and the second layer employs a sparse auto-encoder to learn sparse features. Finally, a wavelet-kernel sparse SVM classifier is applied to classify the features. The proposed algorithm is evaluated on a testing dataset that contains spontaneous, non-prototypical, and long-term speech emotion data. The experimental results show that the proposed algorithm outperforms the existing state-of-the-art algorithms in speech emotion recognition.
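The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give layer sizes, the noise model, the sparsity penalty, or the wavelet function, so everything below is an assumption made for illustration. The sketch assumes masking noise for the denoising layer, a KL-divergence sparsity penalty for the sparse layer, and the Morlet-type wavelet kernel commonly used in wavelet SVMs. All function names (`mask_noise`, `encode`, `kl_sparsity`, `wavelet_kernel`, `random_layer`) are hypothetical, the weights are random rather than trained, and the sparse SVM itself is omitted.

```python
import math
import random


def mask_noise(x, drop_prob, rng):
    """Denoising corruption (assumed form): zero each component with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, weights, biases):
    """Encoder half of one auto-encoder layer: h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def kl_sparsity(mean_activations, rho=0.05):
    """KL-divergence penalty pushing mean hidden activations toward the target rho."""
    return sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
               for r in mean_activations)


def wavelet_kernel(x, y, a=1.0):
    """Morlet-type wavelet kernel: prod_i cos(1.75 d_i / a) * exp(-d_i^2 / (2 a^2))."""
    k = 1.0
    for xi, yi in zip(x, y):
        d = xi - yi
        k *= math.cos(1.75 * d / a) * math.exp(-d * d / (2 * a * a))
    return k


def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights; a real model would learn these by reconstruction."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
batch = [[0.2, -0.1, 0.4, 0.0], [0.1, 0.3, -0.2, 0.5], [-0.4, 0.0, 0.1, 0.2]]

# Layer 1 is over-complete (4 inputs -> 6 hidden units); layer 2 yields 3 features.
layer1 = random_layer(6, 4, rng)
layer2 = random_layer(3, 6, rng)

# Training-time view of layer 1: the encoder sees a corrupted copy of the input.
corrupted = mask_noise(batch[0], 0.3, rng)
h1_noisy = encode(corrupted, *layer1)

# Inference-time features: clean input through both encoders.
h1 = [encode(x, *layer1) for x in batch]
h2 = [encode(h, *layer2) for h in h1]

# Sparsity penalty on the batch-mean activation of each layer-2 unit.
mean_act = [sum(col) / len(col) for col in zip(*h2)]
penalty = kl_sparsity(mean_act)

# Gram matrix of the learned features under the wavelet kernel.
gram = [[wavelet_kernel(u, v) for v in h2] for u in h2]
```

A Gram matrix of this form is what a kernel classifier consumes; for instance, it could be passed to an SVM implementation that accepts a precomputed kernel. The dilation parameter `a` is a free hyperparameter in this sketch and would need to be tuned.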

Funding

This work was supported by the Chongqing Big Data Engineering Laboratory for Children, the Chongqing Electronics Engineering Technology Research Center for Interactive Learning, and the Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K201801601).

Author information

Corresponding author

Correspondence to Pengcheng Wei.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wei, P., Zhao, Y. A novel speech emotion recognition algorithm based on wavelet kernel sparse classifier in stacked deep auto-encoder model. Pers Ubiquit Comput 23, 521–529 (2019). https://doi.org/10.1007/s00779-019-01246-9

  • DOI: https://doi.org/10.1007/s00779-019-01246-9
