
Multi-classification speech emotion recognition based on two-stage bottleneck features selection and MCJD algorithm

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Feature extraction and classification decisions play an important role in speech emotion recognition (SER). To improve the performance of a multi-classification SER system, a two-stage bottleneck feature selection model and a novel multi-classifier joint decision (MCJD) algorithm are proposed. In the two-stage model, bottleneck features are first extracted from different hidden layers of a deep neural network (DNN) and fused using a genetic algorithm (GA). Second, principal component analysis (PCA) is applied to mitigate the curse of dimensionality caused by the high-dimensional fused feature vectors. In addition, to compensate for the shortcomings of a single SVM classifier in SER, multiple support vector machine (SVM) classifiers are trained on different feature sets according to their classification targets, and the final recognition result is obtained by the joint decision of these SVMs under the MCJD algorithm. With five-fold cross-validation, the two-stage bottleneck feature selection model with a traditional SVM classifier achieves an average accuracy of 84.89%. Adding the MCJD algorithm raises the average recognition rate for seven emotions on the Berlin Database to 87.08%, which further improves the performance of the SER system and demonstrates the effectiveness of the proposed method.
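To make the two-stage idea concrete, the sketch below pairs a toy genetic algorithm over binary feature masks with scikit-learn's PCA. Everything here is illustrative: the bottleneck activations are random stand-ins, the layer widths and population settings are not the paper's configuration, and the GA fitness is a cheap class-separability proxy where the paper's GA would presumably score masks by downstream classifier accuracy.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-ins for bottleneck activations taken from two DNN hidden layers
# (500 utterances; layer widths are illustrative, not the paper's).
rng = np.random.default_rng(0)
bn_layer1 = rng.standard_normal((500, 256))
bn_layer2 = rng.standard_normal((500, 256))
candidates = np.hstack([bn_layer1, bn_layer2])   # fused candidate pool
labels = rng.integers(0, 7, size=500)            # 7 emotion classes (dummy)

def ga_select(features, labels, n_gen=30, pop=20, p_mut=0.01, seed=0):
    """Toy GA over binary feature masks.

    Fitness is a cheap between-class-separability proxy; the paper's GA
    would instead evaluate masks by classifier accuracy.
    """
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    masks = rng.integers(0, 2, size=(pop, d)).astype(bool)

    def fitness(mask):
        if not mask.any():
            return -np.inf
        X = features[:, mask]
        class_means = np.stack([X[labels == c].mean(axis=0)
                                for c in np.unique(labels)])
        return class_means.var(axis=0).mean()    # spread of class centroids

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in masks])
        parents = masks[np.argsort(scores)[-pop // 2:]]       # selection
        children = parents.copy()
        for child in children:                                # crossover
            cut = rng.integers(1, d)
            child[cut:] = parents[rng.integers(len(parents))][cut:]
        children ^= rng.random(children.shape) < p_mut        # mutation
        masks = np.vstack([parents, children])

    return masks[np.argmax([fitness(m) for m in masks])]

best_mask = ga_select(candidates, labels)
selected = candidates[:, best_mask]

# Second stage: PCA keeps enough components for 95% of the variance,
# taming the dimensionality of the GA-selected feature vector.
reduced = PCA(n_components=0.95).fit_transform(selected)
print(reduced.shape)
```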
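The MCJD decision rule itself is specified in the full paper; as a stand-in that conveys the flavor of a multi-classifier joint decision, the sketch below trains one SVM per feature set and fuses their posterior estimates by summation. The two feature sets and all data are dummies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Dummy data: two hypothetical feature sets describing the same utterances.
rng = np.random.default_rng(1)
feats_a = rng.standard_normal((500, 40))
feats_b = rng.standard_normal((500, 60))
y = rng.integers(0, 7, size=500)               # 7 emotion classes

Xa_tr, Xa_te, Xb_tr, Xb_te, y_tr, y_te = train_test_split(
    feats_a, feats_b, y, test_size=0.2, random_state=0)

# One SVM per feature set; probability=True enables soft score fusion.
svm_a = SVC(kernel="rbf", probability=True).fit(Xa_tr, y_tr)
svm_b = SVC(kernel="rbf", probability=True).fit(Xb_tr, y_tr)

def joint_decision(xa, xb):
    """Sum per-classifier posteriors and pick the jointly best class."""
    posterior = svm_a.predict_proba(xa) + svm_b.predict_proba(xb)
    return svm_a.classes_[posterior.argmax(axis=1)]

accuracy = (joint_decision(Xa_te, Xb_te) == y_te).mean()
print(f"joint-decision accuracy: {accuracy:.3f}")
```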



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61901227), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB510049).

Author information

Corresponding author

Correspondence to Linhui Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Sun, L., Huang, Y., Li, Q. et al. Multi-classification speech emotion recognition based on two-stage bottleneck features selection and MCJD algorithm. SIViP 16, 1253–1261 (2022). https://doi.org/10.1007/s11760-021-02076-0
