DOI: 10.1145/3664647.3681522
research-article
Open access

Deconfounded Emotion Guidance Sticker Selection with Causal Inference

Published: 28 October 2024

Abstract

With the increasing popularity of online social applications, stickers have become common in online chat. Teaching a model to select an appropriate sticker from a set of candidates based on the dialogue context is therefore important for the user experience. Existing methods leverage emotional information to guide sticker selection. However, because sticker images, emotionally charged words in the dialogue, and emotion labels frequently co-occur, these methods tend to over-rely on this dataset bias, inducing spurious correlations during training. As a result, they may select stickers that do not match the user's intended expression. In this paper, we introduce a causal graph that explicitly identifies the spurious correlations in the sticker selection task. Building on this analysis, we propose a Causal Knowledge-Enhanced Sticker Selection (CKS) model to mitigate these correlations. Specifically, we design a knowledge-enhanced emotional utterance extractor to identify emotional information within the dialogue. An interventional visual feature extractor then obtains unbiased visual features and aligns them with the emotional utterance representations. Finally, a standard transformer encoder fuses the multimodal information for emotion recognition and sticker selection. Extensive experiments on the MOD dataset show that our CKS model significantly outperforms baseline models.
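The "interventional visual feature extractor" described above refers to a backdoor adjustment, which replaces the biased conditional P(Y|X) with P(Y|do(X)) = Σ_z P(Y|X,z)P(z). The paper's exact formulation is not reproduced on this page, so the sketch below only illustrates the approximation commonly used in deconfounded vision models: an attention-weighted expectation over a dictionary of confounder representations, added back to the raw visual feature. The function `backdoor_adjust`, the random dictionary `Z`, and the uniform prior `p_z` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def backdoor_adjust(visual_feat, confounder_dict, prior):
    """Approximate P(Y | do(X)) = sum_z P(Y | X, z) P(z).

    visual_feat:     (d,)   raw visual feature for one sticker
    confounder_dict: (K, d) one representative vector per confounder stratum z
                            (e.g. per-emotion-label mean features)
    prior:           (K,)   P(z), estimated from the training set
    """
    scores = confounder_dict @ visual_feat            # attention of X over each z, shape (K,)
    attn = softmax(scores)
    # expectation over the confounder, weighted by its prior rather than
    # by its (biased) co-occurrence with X alone
    context = ((attn * prior)[:, None] * confounder_dict).sum(axis=0)
    return visual_feat + context                      # deconfounded feature, shape (d,)

rng = np.random.default_rng(0)
d, K = 16, 8
x = rng.normal(size=d)            # a visual feature
Z = rng.normal(size=(K, d))       # hypothetical confounder dictionary
p_z = np.full(K, 1.0 / K)         # uniform prior P(z) for illustration
x_deconf = backdoor_adjust(x, Z, p_z)
print(x_deconf.shape)
```

Weighting each stratum by the fixed prior P(z), instead of letting the feature's own affinity to frequent strata dominate, is what cuts the spurious path from the confounder through the visual input.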



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024, 11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. causal inference
    2. emotion recognition
    3. sticker selection


    Funding Sources

    • Guangdong Provincial Fund for Basic and Applied Basic Research - Regional Joint Fund Project (Key Project)
    • the Hong Kong Polytechnic University's Postdoc Matching Fund
    • the Science and Technology Planning Project of Guangdong Province
    • Guangdong Provincial Natural Science Foundation for Outstanding Youth Team Project
    • the China Computer Federation (CCF)-Zhipu AI Large Model Fund
    • the Fundamental Research Funds for the Central Universities, South China University of Technology
    • National Natural Science Foundation of China

Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne, VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
    Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
