ABSTRACT
Multimodal Sentiment Analysis (MSA) aims to predict sentiment polarity from multiple modalities, such as text, video, and audio. Previous studies have focused extensively on fusing multimodal features while overlooking the value of implicit textual knowledge. This implicit knowledge can be incorporated into a multimodal fusion network to improve the joint representation of the text, video, and audio modalities, thereby enhancing prediction performance. In this paper, we propose a sentimental-words-aware cross-modal contrastive learning strategy for multimodal sentiment analysis. It guides the network to extract sentiment and common-sense knowledge from the text and fuse it with the other modalities, improving the final multimodal representation. We conduct extensive experiments on the public CMU-MOSI and CMU-MOSEI datasets. The results demonstrate the efficacy of our approach compared with baseline models built on different fusion techniques.
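The abstract does not spell out the contrastive objective, but cross-modal contrastive learning of this kind is typically an InfoNCE-style loss that pulls the text embedding of an utterance toward the paired embedding from another modality (video or audio) and pushes it away from non-paired samples in the batch. The sketch below is illustrative only: the function name, the use of cosine similarity, and the temperature value are assumptions, not details taken from the paper.

```python
import math

def cross_modal_contrastive_loss(text_emb, other_emb, temperature=0.1):
    """Illustrative InfoNCE-style cross-modal contrastive loss.

    text_emb, other_emb: lists of equal-length float vectors, where
    text_emb[i] and other_emb[i] come from the same utterance (a
    positive pair); all other pairings in the batch act as negatives.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    n = len(text_emb)
    loss = 0.0
    for i in range(n):
        # Similarity of text i to every cross-modal candidate in the batch.
        sims = [cosine(text_emb[i], other_emb[j]) / temperature
                for j in range(n)]
        # Negative log-likelihood of picking the true pair j = i.
        log_denom = math.log(sum(math.exp(s) for s in sims))
        loss += -(sims[i] - log_denom)
    return loss / n
```

Under this objective, batches whose positive pairs are already well aligned yield a lower loss than batches with shuffled pairings, which is the gradient signal that ties the text-derived sentiment knowledge to the other modalities.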
SWACL: Sentimental Words Aware Contrastive Learning for Multimodal Sentiment Analysis