M³: MultiModal Masking Applied to Sentiment Analysis

Georgiou, Efthymios; Paraskevopoulos, Georgios; Potamianos, Alexandros

doi:10.21437/Interspeech.2021-1739

A common issue when training multimodal architectures is that not all modalities contribute equally to the model’s prediction and the network tends to over-rely on the strongest modality. In this work, we present M³, a training procedure based on modality masking for deep multimodal architectures. During network training, we randomly select one modality and mask its features, forcing the model to make its prediction in the absence of this modality. This structured regularization allows the network to better exploit complementary information in input modalities. We implement M³ as a generic layer that can be integrated with any multimodal architecture. Our experiments show that M³ outperforms other masking schemes and improves performance for our strong baseline. We evaluate M³ for multimodal sentiment analysis on CMU-MOSEI, achieving results comparable to the state-of-the-art.
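The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `m3_mask`, the parameter `p_mask`, the choice to mask by replacing features with zeros, and the uniform choice of modality are all assumptions made for illustration. The abstract states only that one randomly selected modality is masked during training.

```python
import random
from typing import Dict, List, Optional

Features = List[List[float]]  # one sequence: time steps x feature dimensions


def m3_mask(
    modalities: Dict[str, Features],
    p_mask: float = 0.5,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> Dict[str, Features]:
    """Zero out all features of one randomly chosen modality.

    Assumptions (not taken from the abstract): masking means replacing
    features with zeros, masking fires with probability p_mask per call,
    and the masked modality is drawn uniformly at random.
    """
    rng = rng or random
    # Masking acts as a training-time regularizer; inputs pass through
    # unchanged at evaluation time or when the coin flip says "no mask".
    if not training or rng.random() >= p_mask:
        return modalities
    target = rng.choice(sorted(modalities))
    masked = dict(modalities)  # shallow copy, so the caller's dict is untouched
    masked[target] = [[0.0] * len(step) for step in modalities[target]]
    return masked


# Hypothetical toy input with three modalities and one time step each.
example = {"text": [[1.0, 2.0]], "audio": [[3.0]], "visual": [[4.0, 5.0]]}
masked_example = m3_mask(example, p_mask=1.0, rng=random.Random(0))
```

In this sketch the masked dictionary would be passed to the fusion network in place of the original inputs, so the model has to predict from the remaining modalities on that training step.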

Cite as: Georgiou, E., Paraskevopoulos, G., Potamianos, A. (2021) M³: MultiModal Masking Applied to Sentiment Analysis. Proc. Interspeech 2021, 2876-2880, doi: 10.21437/Interspeech.2021-1739

@inproceedings{georgiou21_interspeech,
  author={Efthymios Georgiou and Georgios Paraskevopoulos and Alexandros Potamianos},
  title={{M³: MultiModal Masking Applied to Sentiment Analysis}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2876--2880},
  doi={10.21437/Interspeech.2021-1739}
}
