DOI: 10.1145/3664647.3681617

GRACE: GRadient-based Active Learning with Curriculum Enhancement for Multimodal Sentiment Analysis

Published: 28 October 2024

Abstract

Multimodal sentiment analysis (MSA) aims to predict sentiment from the text, audio, and visual streams of videos. Existing works focus on designing fusion strategies or decoupling mechanisms, but they suffer from low data utilization and rely heavily on large amounts of labeled data, and acquiring large-scale annotations for MSA is extremely labor-intensive and costly. To address this challenge, we propose GRACE, a GRadient-based Active learning method with Curriculum Enhancement, designed for MSA under a multi-task learning framework. Our approach reduces annotation cost by strategically selecting valuable samples from the unlabeled data pool while maintaining high performance. Specifically, we introduce informativeness and representativeness criteria, calculated from gradient magnitudes and sample distances, to quantify the active value of unlabeled samples. An easiness criterion is also incorporated to avoid outliers, exploiting the relationship between modality consistency and sample difficulty. During learning, we dynamically balance sample difficulty and active value, guided by the curriculum learning principle: easier, modality-aligned samples are prioritized for stable initial training, and more challenging samples with modality conflicts are gradually incorporated as training progresses. Extensive experiments demonstrate the effectiveness of our approach on both multimodal sentiment regression and classification benchmarks.
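To make the three criteria concrete, below is a minimal PyTorch sketch of how such an acquisition score could be assembled. It is an illustration under stated assumptions, not the authors' released implementation: the pseudo-label trick for the gradient norm, the `head` module name, the nearest-neighbor distance as representativeness, the prediction-variance proxy for modality consistency, and the linear curriculum schedule are all hypothetical choices made for this sketch.

```python
# Illustrative sketch of a GRACE-style acquisition score (assumptions:
# a classification `head` module, pseudo-labels from the model's own
# predictions, and a linear curriculum pacing schedule).
import torch
import torch.nn.functional as F


def gradient_informativeness(model, x):
    """Informativeness: norm of the loss gradient w.r.t. the final layer,
    using the model's own prediction as a pseudo-label (assumption)."""
    logits = model(x)
    pseudo = logits.argmax(dim=-1).detach()
    loss = F.cross_entropy(logits, pseudo)
    grads = torch.autograd.grad(loss, list(model.head.parameters()))
    return torch.cat([g.flatten() for g in grads]).norm().item()


def representativeness(candidate_emb, labeled_embs):
    """Representativeness: distance to the nearest already-labeled sample
    in embedding space; larger means the sample covers an unexplored region."""
    return torch.cdist(candidate_emb.unsqueeze(0), labeled_embs).min().item()


def easiness(unimodal_preds):
    """Easiness: agreement among per-modality predictions; low variance
    across modalities marks a modality-consistent, hence easier, sample."""
    stacked = torch.stack(unimodal_preds)      # (num_modalities, num_classes)
    return -stacked.var(dim=0).mean().item()   # higher = more consistent


def grace_score(info, rep, easy, epoch, total_epochs, alpha=1.0, beta=1.0):
    """Curriculum-weighted score: the easiness weight decays linearly so that
    easy, modality-aligned samples dominate early selection while harder,
    modality-conflicting samples are admitted later."""
    lam = 1.0 - epoch / total_epochs           # pacing schedule (assumption)
    return alpha * info + beta * rep + lam * easy
```

In a full active-learning loop, the three raw scores would typically be normalized over the unlabeled pool before being combined and ranked, with the top-scoring batch sent for annotation each round.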

Supplemental Material

MP4 File - 5069-video.mp4
Video Presentation about Tackling Data Utilization in Multimodal Sentiment Analysis with GRACE



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. active learning
2. curriculum learning
3. multimodal sentiment analysis

Qualifiers

• Research-article

Funding Sources

• National Natural Science Foundation of China (NSFC)
• National Key R&D Program of China

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
