ISCA Archive Interspeech 2022

Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition

Wei Yang, Satoru Fukayama, Panikos Heracleous, Jun Ogata

Speech-based multimodal affective computing has recently attracted significant research attention. Previous experimental results have shown that the audio-only approach performs worse than the text-only approach in sentiment analysis and emotion recognition tasks. In this paper, we propose a new strategy to improve the performance of uni-modal and bi-modal affective computing systems by fine-tuning two pre-trained self-supervised learning models (Text-RoBERTa and Speech-RoBERTa). We fine-tune the models on sentiment analysis and emotion recognition tasks using a shallow architecture, and apply cross-modal attention fusion to the models for further learning and final prediction or classification. We evaluate the proposed method on the CMU-MOSI, CMU-MOSEI, and IEMOCAP datasets. The experimental results demonstrate that our approach outperforms existing state-of-the-art results on all benchmarks, establishing the effectiveness of the proposed method.
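
Below is a minimal sketch of how cross-modal attention fusion between fine-tuned text and speech encoders could be wired up. It is illustrative only: the module names, feature dimensions, pooling strategy, and classification head are assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse text and speech features with bidirectional cross-modal attention (illustrative sketch)."""

    def __init__(self, dim=768, num_heads=8, num_classes=1):
        super().__init__()
        # Text queries attend over speech features, and vice versa.
        self.text_to_speech = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.speech_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feats, speech_feats):
        # text_feats:   (batch, T_text, dim)   from a fine-tuned text SSL encoder
        # speech_feats: (batch, T_speech, dim) from a fine-tuned speech SSL encoder
        t2s, _ = self.text_to_speech(text_feats, speech_feats, speech_feats)
        s2t, _ = self.speech_to_text(speech_feats, text_feats, text_feats)
        # Mean-pool each attended sequence over time and concatenate.
        fused = torch.cat([t2s.mean(dim=1), s2t.mean(dim=1)], dim=-1)
        # Output: a sentiment regression score or emotion class logits.
        return self.classifier(fused)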


doi: 10.21437/Interspeech.2022-10354

Cite as: Yang, W., Fukayama, S., Heracleous, P., Ogata, J. (2022) Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition. Proc. Interspeech 2022, 1998-2002, doi: 10.21437/Interspeech.2022-10354

@inproceedings{yang22q_interspeech,
  author={Wei Yang and Satoru Fukayama and Panikos Heracleous and Jun Ogata},
  title={{Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1998--2002},
  doi={10.21437/Interspeech.2022-10354}
}