
Multi-classification speech emotion recognition based on two-stage bottleneck features selection and MCJD algorithm

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Feature extraction and classification decisions play an important role in speech emotion recognition (SER). To improve the performance of a multi-classification SER system, a two-stage bottleneck feature selection model and a novel multi-classifier joint decision (MCJD) algorithm are proposed. In the two-stage model, bottleneck features are first extracted from different hidden layers of a deep neural network (DNN) and fused using a genetic algorithm (GA). Second, principal component analysis (PCA) is applied to mitigate the curse of dimensionality caused by the high-dimensional fused feature vectors. In addition, to compensate for the shortcomings of a single SVM classifier in SER, multiple support vector machine (SVM) classifiers are trained on different feature sets according to their classification targets, and the final recognition result is obtained by the joint decision of these SVMs under the MCJD algorithm. With five-fold cross-validation, the two-stage bottleneck feature selection model with a traditional SVM classifier achieves an average accuracy of 84.89%. Adding the MCJD algorithm raises the average recognition rate for seven emotions on the Berlin Database to 87.08%, which further improves the performance of the SER system and demonstrates the effectiveness of the proposed method.
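To make the two-stage idea concrete, the sketch below pairs a toy genetic algorithm over binary feature masks with scikit-learn's PCA. Everything here is illustrative: the bottleneck activations are random stand-ins, the layer widths and population settings are not the paper's configuration, and the GA fitness is a cheap class-separability proxy where the paper's GA would presumably score masks by downstream classifier accuracy.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-ins for bottleneck activations taken from two DNN hidden layers
# (500 utterances; layer widths are illustrative, not the paper's).
rng = np.random.default_rng(0)
bn_layer1 = rng.standard_normal((500, 256))
bn_layer2 = rng.standard_normal((500, 256))
candidates = np.hstack([bn_layer1, bn_layer2])   # fused candidate pool
labels = rng.integers(0, 7, size=500)            # 7 emotion classes (dummy)

def ga_select(features, labels, n_gen=30, pop=20, p_mut=0.01, seed=0):
    """Toy GA over binary feature masks.

    Fitness is a cheap between-class-separability proxy; the paper's GA
    would instead evaluate masks by classifier accuracy.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    masks = rng.integers(0, 2, size=(pop, d)).astype(bool)

    def fitness(mask):
        if not mask.any():
            return -np.inf
        X = features[:, mask]
        class_means = np.stack([X[labels == c].mean(axis=0)
                                for c in np.unique(labels)])
        return class_means.var(axis=0).mean()    # spread of class centroids

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in masks])
        parents = masks[np.argsort(scores)[-pop // 2:]]       # selection
        children = parents.copy()
        for child in children:                                # crossover
            cut = rng.integers(1, d)
            child[cut:] = parents[rng.integers(len(parents))][cut:]
        children ^= rng.random(children.shape) < p_mut        # mutation
        masks = np.vstack([parents, children])

    return masks[np.argmax([fitness(m) for m in masks])]

best_mask = ga_select(candidates, labels)
selected = candidates[:, best_mask]

# Second stage: PCA keeps enough components for 95% of the variance,
# taming the dimensionality of the GA-selected feature vector.
reduced = PCA(n_components=0.95).fit_transform(selected)
print(reduced.shape)
```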
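The MCJD decision rule itself is specified in the full paper; as a stand-in that conveys the flavor of a multi-classifier joint decision, the sketch below trains one SVM per feature set and fuses their posterior estimates by summation. The two feature sets and all data are dummies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Dummy data: two hypothetical feature sets describing the same utterances.
rng = np.random.default_rng(1)
feats_a = rng.standard_normal((500, 40))
feats_b = rng.standard_normal((500, 60))
y = rng.integers(0, 7, size=500)               # 7 emotion classes

Xa_tr, Xa_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
    feats_a, feats_b, y, test_size=0.2, random_state=0)

# One SVM per feature set; probability=True enables soft score fusion.
svm_a = SVC(kernel="rbf", probability=True).fit(Xa_tr, y_tr)
svm_b = SVC(kernel="rbf", probability=True).fit(Xb_tr, y_tr)

def joint_decision(xa, xb):
    """Sum per-classifier posteriors and pick the jointly best class."""
    posterior = svm_a.predict_proba(xa) + svm_b.predict_proba(xb)
    return svm_a.classes_[posterior.argmax(axis=1)]

accuracy = (joint_decision(Xa_te, Xb_te) == y_te).mean()
print(f"joint-decision accuracy: {accuracy:.3f}")
```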



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61901227), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB510049).

Author information

Corresponding author

Correspondence to Linhui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Sun, L., Huang, Y., Li, Q. et al. Multi-classification speech emotion recognition based on two-stage bottleneck features selection and MCJD algorithm. SIViP 16, 1253–1261 (2022). https://doi.org/10.1007/s11760-021-02076-0
