Abstract:
Multimodal sentiment analysis is an active subfield of natural language processing that aims to extract and integrate semantic information from multiple modalities in order to identify the sentiments expressed by users. The complementary yet heterogeneous information across modalities influences the prediction results. Recent proposals employ a single neural network to obtain mutually independent representations of all modalities. However, this ignores the heterogeneity between modalities, which may limit prior work and introduce additional noise into the representations before modal fusion. For this reason, we propose a new framework, MICS, which adopts a strategy suited to each modality and provides better representations for fusion. We also design a multimodal contrastive learning interaction module for the fusion phase, which plays a crucial role in the information interaction between modalities. Extensive experiments on two popular public benchmarks, MOSI and MOSEI, show that MICS better captures the characteristics of each modality's data and offers significant advantages over previous, more complex baselines.
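The abstract does not specify the architecture, so the following is only a minimal sketch of the two ideas it names: a separate encoder per modality (rather than one shared network) and a contrastive interaction objective at fusion time. All dimensions, module choices (a GRU encoder, an InfoNCE-style loss), and names here are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Hypothetical per-modality encoder: each modality gets its own
    parameters and strategy (here, a projection followed by a GRU)."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden_dim)
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, x):                  # x: (batch, seq_len, in_dim)
        _, h = self.rnn(torch.relu(self.proj(x)))
        return h.squeeze(0)                # (batch, hidden_dim)

def contrastive_interaction(z_a, z_b, temperature=0.07):
    """Hypothetical cross-modal contrastive term (InfoNCE-style):
    representations of the same utterance from two modalities are
    pulled together; mismatched pairs in the batch are pushed apart."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: three modalities with different raw feature dimensions.
text_enc, audio_enc, vision_enc = (ModalityEncoder(d, 64) for d in (300, 74, 35))
t = text_enc(torch.randn(8, 20, 300))
a = audio_enc(torch.randn(8, 20, 74))
v = vision_enc(torch.randn(8, 20, 35))
interaction_loss = contrastive_interaction(t, a) + contrastive_interaction(t, v)
fused = torch.cat([t, a, v], dim=-1)       # fed to a downstream sentiment head
```

In this sketch the contrastive term aligns the modality-specific spaces before concatenation, which is one plausible way to reduce the cross-modal noise the abstract refers to; the paper's actual fusion module may differ.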
Date of Conference: 09-12 October 2022
Date Added to IEEE Xplore: 18 November 2022
ISBN Information: