Abstract:
Modeling intra-modal and cross-modal interactions poses significant challenges in multimodal sentiment analysis. Graph-based methods such as HGraph-CL currently achieve promising performance by relying on two levels of graph contrastive learning, within and between modalities, to explore sentiment correlations. However, HGraph-CL still has the following drawbacks in graph construction: 1) the nodes of the graph are represented at the frame level and contain only low-level information, neglecting the correlations among high-level semantics; 2) the edges of the graph are based on the fixed dependency relations between words in the text sequence and the adjacency relations between frame-level nodes in the non-verbal sequences, and thus fail to effectively capture implicit and long-distance correlations. To this end, this letter introduces capsule networks to construct high-level semantic nodes in the graph, uncovering deep sentiment structures. Furthermore, learnable adjacency matrices are employed to construct the edges of the graph, thus adaptively learning the relations between nodes. Experimental results on several benchmark datasets for multimodal sentiment analysis demonstrate the effectiveness of the proposed method.
Published in: IEEE Signal Processing Letters (Volume: 31)
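
The idea of replacing fixed edges with a learnable adjacency matrix, as mentioned in the abstract, can be illustrated with a minimal sketch. The abstract does not specify the formulation, so everything here is an assumption made for illustration: the adjacency is taken to be a row-wise softmax over ReLU-clipped dot products of two trainable node-embedding tables (`src` and `dst`, hypothetical names), and node features are propagated with a single matrix product. The embeddings are only randomly initialized; no training loop, capsule network, or contrastive objective is shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_softmax(m):
    """Normalize each row into a probability distribution."""
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    """Random initialization standing in for trainable parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def learnable_adjacency(src, dst):
    """Assumed form: A = softmax(ReLU(src @ dst^T)), one row per node.

    Because every entry is computed from the embeddings, any pair of nodes
    can be connected, including pairs that are far apart in the sequence.
    """
    scores = [[sum(a * b for a, b in zip(s, d)) for d in dst] for s in src]
    scores = [[max(0.0, v) for v in row] for row in scores]
    return row_softmax(scores)


def propagate(adj, feats):
    """One message-passing step: each node aggregates features from all nodes."""
    return matmul(adj, feats)


rng = random.Random(0)
num_nodes, emb_dim, feat_dim = 4, 3, 2

src = random_matrix(num_nodes, emb_dim, rng)     # hypothetical source embeddings
dst = random_matrix(num_nodes, emb_dim, rng)     # hypothetical target embeddings
feats = random_matrix(num_nodes, feat_dim, rng)  # stand-in for high-level node features

adj = learnable_adjacency(src, dst)
updated = propagate(adj, feats)
```

In a trained model, `src` and `dst` would be updated by gradient descent together with the rest of the network, so the edge weights adapt to the data instead of being fixed by dependency parses or frame adjacency.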