ISCA Archive Interspeech 2022

Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier

Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso

Studies on speech emotion recognition (SER) with categorical emotions have often formulated the task as a single-label classification problem, where the emotions are considered orthogonal to each other. However, prior work indicates that emotions can co-occur, especially in more ambiguous emotional sentences (e.g., a mixture of happiness and surprise). Some studies have instead treated SER as a multi-label task, predicting multiple emotional classes. This formulation still does not leverage the relation between emotions during training, since the emotions are assumed to be independent. This study explores the idea that emotional classes are not necessarily independent, and its implications for training SER models. In particular, we calculate the frequency of co-occurring emotions from the perceptual evaluations in the training set to generate a matrix of class-dependent penalties, which penalizes mistakes between distant emotional classes more heavily. We integrate the penalization matrix into three existing label-learning approaches (hard-label, multi-label, and distribution-label learning) using the proposed modified loss. We train SER models using the penalty loss together with cost functions commonly used for SER tasks. Evaluating the proposed penalization matrix on the MSP-Podcast corpus yields relative improvements in macro F1-score of 17.12% for hard-label learning, 12.79% for multi-label learning, and 25.8% for distribution-label learning.
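The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each utterance comes with a list of class indices chosen by its annotators, that co-occurrence is normalized per row by how often the row's class appears, and that the penalty is one minus that normalized frequency. The function names `cooccurrence_penalty` and `penalized_cross_entropy`, and the additive form of the loss, are hypothetical and are not taken from the paper.

```python
import numpy as np


def cooccurrence_penalty(annotations, num_classes):
    """Build a class-dependent penalty matrix from annotator labels.

    annotations: list of lists of class indices, one inner list per
    utterance (one entry per annotator). ASSUMPTION: penalty[i, j] is
    1 - (utterances containing both i and j) / (utterances containing i).
    """
    counts = np.zeros((num_classes, num_classes), dtype=float)
    for labels in annotations:
        present = sorted(set(labels))  # classes selected for this utterance
        for i in present:
            for j in present:
                counts[i, j] += 1.0
    diag = np.diag(counts).copy()
    diag[diag == 0] = 1.0  # avoid division by zero for unseen classes
    freq = counts / diag[:, None]
    return 1.0 - freq  # frequent co-occurrence -> small penalty


def penalized_cross_entropy(probs, target, penalty, eps=1e-12):
    """Hard-label cross-entropy plus a penalty term (hypothetical form).

    The added term weights the predicted probability of each class by
    its penalty relative to the target class.
    """
    probs = np.asarray(probs, dtype=float)
    ce = -np.log(probs[target] + eps)
    return float(ce + penalty[target] @ probs)


# Toy data: class 0 co-occurs with class 1 more often than with class 2.
toy_annotations = [[0, 1], [0, 1], [0, 2], [2]]
toy_penalty = cooccurrence_penalty(toy_annotations, num_classes=3)
```

In this toy setup, probability mass placed on class 2 is penalized more than the same mass placed on class 1 when the target is class 0, because class 2 co-occurs with class 0 less often in the annotations.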


doi: 10.21437/Interspeech.2022-11041

Cite as: Chou, H.-C., Lee, C.-C., Busso, C. (2022) Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier. Proc. Interspeech 2022, 161-165, doi: 10.21437/Interspeech.2022-11041

@inproceedings{chou22_interspeech,
  author={Huang-Cheng Chou and Chi-Chun Lee and Carlos Busso},
  title={{Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={161--165},
  doi={10.21437/Interspeech.2022-11041}
}