Research article
DOI: 10.1145/3633637.3633638

SWACL: Sentimental Words Aware Contrastive Learning for Multimodal Sentiment Analysis

Published: 28 February 2024

Abstract

Multimodal Sentiment Analysis (MSA) aims to predict emotional polarity from multiple modalities, such as text, video, and audio. Previous studies have focused extensively on fusing multimodal features while ignoring the value of implicit textual knowledge. This implicit knowledge within the text can be incorporated into a multimodal fusion network to improve the joint representation of the text, video, and audio modalities, thereby enhancing prediction performance on MSA. In this paper, we propose a sentimental words aware cross-modal contrastive learning strategy for multimodal sentiment analysis. It guides the network to obtain sentimental and common-sense knowledge from the text so that this knowledge can be fused with the other modalities to improve the final multimodal representation. We conduct extensive experiments on the public CMU-MOSI and CMU-MOSEI datasets. The results demonstrate the efficacy of our approach in comparison with baseline models built on different fusion techniques.
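
The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes an InfoNCE-style cross-modal contrastive loss between text embeddings and video/audio embeddings of the same utterances, which is one common way such a strategy is realized; the function name, tensor shapes, and temperature are assumptions for this sketch, not the loss actually defined in the paper.

    # Illustrative sketch: InfoNCE-style cross-modal contrastive loss between
    # text embeddings and non-text (video/audio) embeddings of the same batch.
    # All names and defaults here are assumptions, not the paper's formulation.
    import torch
    import torch.nn.functional as F

    def cross_modal_contrastive_loss(text_emb, other_emb, temperature=0.07):
        # text_emb, other_emb: (batch, dim) projections of the same utterances
        text_emb = F.normalize(text_emb, dim=-1)
        other_emb = F.normalize(other_emb, dim=-1)
        logits = text_emb @ other_emb.t() / temperature  # pairwise similarities
        targets = torch.arange(text_emb.size(0), device=text_emb.device)
        # Matched text/other pairs are positives; every other pair in the batch
        # serves as a negative. The loss is symmetrized over both directions.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

Under this reading, the contrastive term would be added to the usual supervised sentiment-regression loss so that the text branch pulls the video and audio representations toward its sentiment-bearing content; how the sentimental words are selected and weighted is specific to the paper and is not reproduced here.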

Published In

ICCPR '23: Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition
October 2023, 589 pages
ISBN: 9798400707988
DOI: 10.1145/3633637

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. Cross-modal contrastive learning
      2. Multimodal sentiment analysis
      3. Sentimental words aware
