
Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis

Published: 11 January 2024

Abstract

Sentiment and sarcasm are intimately and intricately related, as sarcasm often deliberately elicits an emotional response to achieve its specific purpose. The main challenges in joint multi-modal sentiment and sarcasm detection are multi-modal representation fusion and modeling the intrinsic relationship between sentiment and sarcasm. To address these challenges, we propose a single-input-stream self-adaptive representation learning model (SRLM) for joint sentiment and sarcasm recognition. Specifically, we divide the image into blocks to learn serialized visual features and fuse them with textual features as the input to the model. We then introduce an adaptive representation learning network that uses a gated network approach for sarcasm and sentiment classification. In this framework, each task is equipped with a dedicated expert network responsible for learning task-specific information, while shared expert knowledge is acquired and weighted through the gating network. Finally, comprehensive experiments on two publicly available datasets, Memotion and MUStARD, demonstrate the effectiveness of the proposed model compared to state-of-the-art baselines, with notable improvements on both the sentiment and sarcasm tasks.
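
To make the gated-expert idea described above concrete, the following PyTorch code is a minimal illustrative sketch, not the authors' implementation: it assumes a pooled image-text feature (e.g., from a single-stream transformer over image patch embeddings and text tokens) and mixes task-specific and shared experts per task with a softmax gate, in the spirit of multi-gate mixture-of-experts. All module names, dimensions, and class counts are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GatedMultiTaskHead(nn.Module):
    """Task-specific experts plus shared experts, mixed per task by a softmax gate."""

    def __init__(self, dim=768, n_shared=2, n_specific=1, n_classes=(3, 2)):
        super().__init__()

        def make_expert():
            return nn.Sequential(nn.Linear(dim, dim), nn.GELU())

        n_tasks = len(n_classes)  # here: sentiment and sarcasm
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.specific = nn.ModuleList(
            nn.ModuleList(make_expert() for _ in range(n_specific)) for _ in range(n_tasks)
        )
        # One gate per task: it weights that task's own experts together with the shared ones.
        self.gates = nn.ModuleList(nn.Linear(dim, n_shared + n_specific) for _ in range(n_tasks))
        self.heads = nn.ModuleList(nn.Linear(dim, c) for c in n_classes)

    def forward(self, fused):
        # fused: (batch, dim) pooled image-text feature from the single-stream encoder.
        shared_out = [expert(fused) for expert in self.shared]
        logits = []
        for experts, gate, head in zip(self.specific, self.gates, self.heads):
            expert_out = torch.stack(shared_out + [e(fused) for e in experts], dim=1)
            weights = torch.softmax(gate(fused), dim=-1).unsqueeze(-1)  # (batch, n_experts, 1)
            mixed = (weights * expert_out).sum(dim=1)                   # gated mixture per task
            logits.append(head(mixed))
        return logits  # [sentiment logits, sarcasm logits]


# Usage with random features standing in for fused image-text representations.
fused = torch.randn(4, 768)
sentiment_logits, sarcasm_logits = GatedMultiTaskHead()(fused)
print(sentiment_logits.shape, sarcasm_logits.shape)  # torch.Size([4, 3]) torch.Size([4, 2])
```

The per-task gate is what lets each classifier draw on shared knowledge to a different degree, which is one plausible way to model the sentiment-sarcasm interaction the abstract refers to.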

        • Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
          May 2024
          650 pages
          ISSN:1551-6857
          EISSN:1551-6865
          DOI:10.1145/3613634
          • Editor:
          • Abdulmotaleb El Saddik

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 11 January 2024
          • Online AM: 5 December 2023
          • Accepted: 27 November 2023
          • Revised: 23 November 2023
          • Received: 27 September 2023
Published in TOMM Volume 20, Issue 5

          Qualifiers

          • research-article
