Abstract
The human voice, especially through nonspeech vocalizations, inherently conveys emotion. However, such emotional expressions have long been overlooked by existing work. Motivated by this, we propose a Dual-Channel Recurrent Neural Network with Xgboost (DCRNNX) for emotion recognition from nonspeech vocalizations. DCRNNX combines two backbone models: the first is a dual-channel neural network, with one channel built on a Deep Neural Network (DNN) and the other on a Channel Recurrent Neural Network (CRNN); the second is an Xgboost classifier. Additionally, we employ a smoothing mechanism that integrates the outputs of the two classifiers. Compared with the baselines, DCRNNX combines not only multiple features but also multiple models, which strengthens its generalization. Experimental results show that the two backbone models achieve 45% and 42% UAR (Unweighted Average Recall), respectively, on the development set. After model fusion, DCRNNX achieves 46.89% UAR on the development set and 37.0% UAR on the test set; its performance on the development set is nearly 6% better than the baselines. Notably, there is a considerable gap between DCRNNX's performance on the development and test sets, which may stem from differences in the emotional characteristics of male and female voices.
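The two ideas the abstract leans on can be made concrete with a small sketch. The paper does not specify its smoothing mechanism, so the `fuse` function below simply takes a convex combination of the two classifiers' class-probability outputs with an assumed weight `alpha`; the `uar` function implements Unweighted Average Recall as the mean of per-class recalls (macro recall), which is its standard definition. All names and the toy probabilities are illustrative, not from the paper.

```python
import numpy as np

def uar(y_true, y_pred, n_classes):
    """Unweighted Average Recall: mean of per-class recalls (macro recall)."""
    recalls = []
    for c in range(n_classes):
        mask = (y_true == c)
        if mask.sum() == 0:      # skip classes absent from y_true
            continue
        recalls.append((y_pred[mask] == c).mean())
    return float(np.mean(recalls))

def fuse(p_net, p_xgb, alpha=0.5):
    """Convex combination of two classifiers' class-probability outputs.

    `alpha` is an assumed fusion weight; the paper's actual smoothing
    mechanism is not described in the abstract.
    """
    return alpha * p_net + (1.0 - alpha) * p_xgb

# Toy example: 3 emotion classes, 4 samples.
p_net = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.1, 0.2, 0.7],
                  [0.2, 0.5, 0.3]])
p_xgb = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4],
                  [0.4, 0.4, 0.2]])
y_true = np.array([0, 1, 2, 1])
y_pred = fuse(p_net, p_xgb).argmax(axis=1)
print(uar(y_true, y_pred, n_classes=3))
```

Because UAR weights every class equally regardless of its sample count, it is the usual choice for imbalanced paralinguistic datasets such as this one.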
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liang, X., Zou, Y., Xie, T., Zhou, Q. (2022). DCRNNX: Dual-Channel Recurrent Neural Network with Xgboost for Emotion Identification Using Nonspeech Vocalizations. In: Pan, X., Jin, T., Zhang, LJ. (eds) Artificial Intelligence and Mobile Services – AIMS 2022. AIMS 2022. Lecture Notes in Computer Science, vol 13729. Springer, Cham. https://doi.org/10.1007/978-3-031-23504-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23503-0
Online ISBN: 978-3-031-23504-7
eBook Packages: Computer Science (R0)