ISCA Archive Interspeech 2022

Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition

Yan Zhao, Jincen Wang, Ru Ye, Yuan Zong, Wenming Zheng, Li Zhao

In this paper, we focus on cross-corpus speech emotion recognition (SER), in which the training (source) and testing (target) speech samples come from different corpora, leading to a feature distribution gap between them. To address this problem, we propose a simple yet effective method called the deep transductive transfer regression network (DTTRN). The basic idea of DTTRN is to learn a corpus-invariant deep neural network that bridges the source and target speech samples and their label information. Following this idea, we adopt a transductive learning strategy that enforces a deep regressor to jointly model the relationship between features and emotion labels in both speech corpora. Meanwhile, we design an emotion-guided regularization term for learning DTTRN that aligns the feature distributions of source and target speech samples at three different scales. Consequently, although DTTRN absorbs label information only from the source speech samples, it is able to correctly predict the emotions of the target ones. To evaluate DTTRN, we conduct extensive cross-corpus SER experiments on the EmoDB, CASIA, and eNTERFACE corpora. Experimental results show the superior performance of DTTRN over recent state-of-the-art deep transfer learning methods in dealing with cross-corpus SER tasks.
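The abstract outlines the core training objective: a regression loss on labeled source samples combined with a corpus-invariance regularizer that pulls source and target feature distributions together. The following minimal PyTorch sketch is an illustration only, not the authors' code; it assumes a shared encoder, one-hot emotion vectors as regression targets, and a simplified single-scale, mean-embedding alignment term in place of the paper's three-scale, emotion-guided regularizer.

# Minimal sketch of a transductive transfer regression objective.
# Assumptions (not from the paper): shared MLP encoder, one-hot emotion
# targets, and a linear-kernel MMD alignment term as the corpus-invariance
# penalty instead of the emotion-guided, three-scale regularizer.
import torch
import torch.nn as nn

class TransferRegressor(nn.Module):
    def __init__(self, feat_dim, hidden_dim, num_emotions):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.regressor = nn.Linear(hidden_dim, num_emotions)

    def forward(self, x):
        z = self.encoder(x)          # corpus-shared feature representation
        return z, self.regressor(z)  # emotion label regression output

def mmd_linear(source_z, target_z):
    """Distance between mean embeddings (simplified distribution alignment)."""
    return (source_z.mean(dim=0) - target_z.mean(dim=0)).pow(2).sum()

def transfer_regression_loss(model, x_src, y_src_onehot, x_tgt, lam=1.0):
    z_s, pred_s = model(x_src)
    z_t, _ = model(x_tgt)  # target labels are never used (transductive setting)
    reg_loss = nn.functional.mse_loss(pred_s, y_src_onehot)  # source regression
    align_loss = mmd_linear(z_s, z_t)                         # corpus invariance
    return reg_loss + lam * align_loss

In use, one would draw a batch from each corpus per step and minimize transfer_regression_loss; at test time the regressor's largest output dimension gives the predicted emotion for a target sample.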


doi: 10.21437/Interspeech.2022-679

Cite as: Zhao, Y., Wang, J., Ye, R., Zong, Y., Zheng, W., Zhao, L. (2022) Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition. Proc. Interspeech 2022, 371-375, doi: 10.21437/Interspeech.2022-679

@inproceedings{zhao22h_interspeech,
  author={Yan Zhao and Jincen Wang and Ru Ye and Yuan Zong and Wenming Zheng and Li Zhao},
  title={{Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition}},
  year={2022},
  booktitle={Proc. Interspeech 2022},
  pages={371--375},
  doi={10.21437/Interspeech.2022-679}
}