DOI: 10.1145/3539618.3591739
research-article

Multimodal Counterfactual Learning Network for Multimedia-based Recommendation

Published: 18 July 2023

Abstract

Multimedia-based recommendation (MMRec) uses multimodal content (images, textual descriptions, etc.) as auxiliary information alongside historical interactions to determine user preferences. Most MMRec approaches predict user interests by exploiting the abundant multimodal content of user-interacted items while ignoring the potential effect of the multimodal content of user-uninteracted items. In fact, the multimodal content of user-interacted items contains a small portion of features that are irrelevant to user preference; these features may be spuriously correlated with user preferences and thereby degrade recommendation performance. In this work, we argue that the multimodal content of user-uninteracted items can be further exploited to identify and eliminate the preference-irrelevant portion of user-interacted multimodal content, for example through counterfactual inference from causal theory. Going beyond multimodal user preference modeling that uses only interacted items, we propose a novel model called Multimodal Counterfactual Learning Network (MCLN), which additionally exploits the multimodal content of user-uninteracted items to purify the representation of preference-relevant multimodal content so that it better matches the user's interests, yielding state-of-the-art performance. Extensive experiments validate the effectiveness and rationality of MCLN. We release the complete code of MCLN at https://github.com/hfutmars/MCLN.
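
The intuition in the abstract can be illustrated with a toy sketch. This is not MCLN's architecture, which the abstract does not describe; it only shows, under simplifying assumptions, how content from uninteracted items might cancel a feature that is shared by all items and therefore carries no preference signal. The function names, the three-dimensional feature layout, and the mean-difference rule are all hypothetical choices made for illustration.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def purified_preference(interacted, uninteracted):
    """Toy counterfactual difference (an assumption, not the paper's method).

    The factual profile is built from items the user interacted with; the
    counterfactual profile is what the same rule would produce from items
    the user did not interact with. Features present in both cancel out.
    """
    factual = mean_vector(interacted)
    counterfactual = mean_vector(uninteracted)
    return [f - c for f, c in zip(factual, counterfactual)]


def score(preference, item):
    """Dot-product relevance score between a preference vector and an item."""
    return sum(p * x for p, x in zip(preference, item))


# Hypothetical item features: [sporty, formal, white_background].
# The white background appears on every product photo, so it is
# preference-irrelevant even though it co-occurs with every interaction.
interacted = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
uninteracted = [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]

pref = purified_preference(interacted, uninteracted)
print(pref)  # the white_background weight is 0.0 after the subtraction
```

In this toy setting a profile built from interacted items alone would give the background feature a weight of 1.0, which is a spurious correlation; subtracting the counterfactual profile removes it while keeping the "sporty" signal.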

Supplemental Material

MP4 File
Presentation video of SIGIR'23 paper "Multimodal Counterfactual Learning Network for Multimedia-based Recommendation"

      Published In

      SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2023
      3567 pages
      ISBN: 9781450394086
      DOI: 10.1145/3539618
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. counterfactual learning
      2. multimodal user preference
      3. recommender systems
      4. spurious correlation

      Qualifiers

      • Research-article

      Funding Sources

      • Anhui Provincial Major Science and Technology Project, China
      • National Natural Science Foundation of China
      • Seventh Special Support Plan for Innovation and Entrepreneurship in Anhui Province
      • University Synergy Innovation Program of Anhui Province

      Conference

      SIGIR '23

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 291
      • Downloads (last 6 weeks): 25
      Reflects downloads up to 05 Mar 2025

      Cited By

      • (2025) Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion. Entropy 27:1 (56). DOI: 10.3390/e27010056. Online publication date: 10-Jan-2025
      • (2025) H²CAN: heterogeneous hypergraph attention network with counterfactual learning for multimodal sentiment analysis. Complex & Intelligent Systems 11:4. DOI: 10.1007/s40747-025-01806-y. Online publication date: 28-Feb-2025
      • (2024) Multimodal Recommender Systems: A Survey. ACM Computing Surveys 57:2 (1-17). DOI: 10.1145/3695461. Online publication date: 10-Oct-2024
      • (2024) Multimodal Graph Causal Embedding for Multimedia-Based Recommendation. IEEE Transactions on Knowledge and Data Engineering 36:12 (8842-8858). DOI: 10.1109/TKDE.2024.3424268. Online publication date: 1-Dec-2024
      • (2024) Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network. Applied Intelligence 55:1. DOI: 10.1007/s10489-024-06061-1. Online publication date: 18-Nov-2024
      • (2024) Multimodal recommender system based on multi-channel counterfactual learning networks. Multimedia Systems 30:5. DOI: 10.1007/s00530-024-01448-z. Online publication date: 13-Aug-2024
      • (2023) Enhanced Implicit Collaborative Knowledge Graph for Recommendation. 2023 4th International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI) (255-261). DOI: 10.1109/ICHCI58871.2023.10277733. Online publication date: 4-Aug-2023
