research-article

Multimodal Counterfactual Learning Network for Multimedia-based Recommendation

Authors:

Feng Xue

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1539 - 1548

https://doi.org/10.1145/3539618.3591739

Published: 18 July 2023

Abstract

Multimedia-based recommendation (MMRec) uses multimodal content (images, textual descriptions, etc.) as auxiliary information to historical interactions in order to determine user preferences. Most MMRec approaches predict user interests by exploiting large amounts of multimodal content from user-interacted items, ignoring the potential effect of the multimodal content of user-uninteracted items. In fact, the multimodal content of user-interacted items contains a small portion of user preference-irrelevant features, which may form spurious correlations with user preferences and thereby degrade recommendation performance. In this work, we argue that the multimodal content of user-uninteracted items can be further exploited to identify and eliminate this preference-irrelevant portion of user-interacted multimodal content, for example through counterfactual inference from causal theory. Going beyond multimodal user preference modeling that uses only interacted items, we propose a novel model called Multimodal Counterfactual Learning Network (MCLN), in which the multimodal content of user-uninteracted items is additionally exploited to purify the representation of user preference-relevant multimodal content so that it better matches the user's interests, yielding state-of-the-art performance. Extensive experiments validate the effectiveness and rationality of MCLN. We release the complete code of MCLN at https://github.com/hfutmars/MCLN.
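The abstract describes MCLN only at a high level; its actual formulation is in the paper and the released code. As a hedged illustration of the general idea only, the Python sketch below assumes a simple effect-subtraction form: a "factual" preference vector averaged from the multimodal features of interacted items, minus a weighted "counterfactual" vector averaged from uninteracted items, which is assumed to carry only preference-irrelevant signal. The names `mean_vector`, `purified_preference`, `score`, and the weight `alpha` are hypothetical and are not taken from MCLN.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def purified_preference(
    interacted: Sequence[Sequence[float]],
    uninteracted: Sequence[Sequence[float]],
    alpha: float = 1.0,
) -> Vector:
    """Hypothetical counterfactual subtraction (not MCLN's exact formulation).

    factual:        preference built from multimodal features of interacted items
    counterfactual: preference built from uninteracted items, assumed here to
                    carry only preference-irrelevant signal
    alpha:          hypothetical weight on the counterfactual term
    """
    factual = mean_vector(interacted)
    counterfactual = mean_vector(uninteracted)
    # Remove the part of the factual signal that the counterfactual also explains.
    return [f - alpha * c for f, c in zip(factual, counterfactual)]


def score(user_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Inner-product preference score between a user vector and item features."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


if __name__ == "__main__":
    # Toy 2-d multimodal features; the second dimension is shared by all items.
    interacted = [[1.0, 0.5], [0.8, 0.5]]
    uninteracted = [[0.1, 0.5], [0.3, 0.5]]
    user = purified_preference(interacted, uninteracted)
    print(user)  # second entry is 0.0: the shared feature cancels out
    print(score(user, [0.9, 0.5]))
```

In this toy example the second feature appears equally in interacted and uninteracted items, so it is treated as preference-irrelevant and cancels, while the first feature, which distinguishes interacted items, survives and drives the score. A learned model would apply this kind of subtraction to trained embeddings rather than raw averages.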

Supplemental Material

MP4 File (26.58 MB)

Presentation video of SIGIR'23 paper "Multimodal Counterfactual Learning Network for Multimedia-based Recommendation"


Cited By

Hu X, Zhang H (2025). Invariant Representation Learning in Multimedia Recommendation with Modality Alignment and Model Fusion. Entropy 27:1 (56). DOI: 10.3390/e27010056. Online publication date: 10-Jan-2025.
https://doi.org/10.3390/e27010056
Huang C, Lin Z, Huang Q, Huang X, Jiang F, Chen J (2025). H²CAN: heterogeneous hypergraph attention network with counterfactual learning for multimodal sentiment analysis. Complex & Intelligent Systems 11:4. DOI: 10.1007/s40747-025-01806-y. Online publication date: 28-Feb-2025.
https://doi.org/10.1007/s40747-025-01806-y
Liu Q, Hu J, Xiao Y, Zhao X, Gao J, Wang W, Li Q, Tang J (2024). Multimodal Recommender Systems: A Survey. ACM Computing Surveys 57:2 (1-17). DOI: 10.1145/3695461. Online publication date: 10-Oct-2024.
https://dl.acm.org/doi/10.1145/3695461

Index Terms

Multimodal Counterfactual Learning Network for Multimedia-based Recommendation
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
    2. Users and interactive retrieval
      1. Personalization

Recommendations

Multimodal recommender system based on multi-channel counterfactual learning networks
Abstract
Most multimodal recommender systems utilize multimodal content of user-interacted items as supplemental information to capture user preferences based on historical interactions without considering user-uninteracted items. In contrast, multimodal ...
User-Specific Feature-Based Similarity Models for Top-n Recommendation of New Items
Survey Paper, Regular Papers and Special Section on Participatory Sensing and Crowd Intelligence

Recommending new items for suitable users is an important yet challenging problem due to the lack of preference history for the new items. Noncollaborative user modeling techniques that rely on the item features can be used to recommend new items. ...
Jointly modeling content, social network and ratings for explainable and cold-start recommendation

Model-based approach to collaborative filtering (CF), such as latent factor models, has improved both accuracy and efficiency of predictions on large and sparse dataset. However, most existing methods still face two major problems: (1) the ...


Information

Published In


SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs: Hsin-Hsi Chen (National Taiwan University), Wei-Jou (Edward) Duh (National Taiwan University), Hen-Hsen Huang (Academia Sinica)

Program Chairs: Makoto P. Kato (Spotify), Josiane Mothe (Université de Toulouse), Barbara Poblete (University of Chile and Amazon Visiting Academic)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023


Qualifiers

Research-article

Funding Sources

Anhui Provincial Major Science and Technology Project, China
National Natural Science Foundation of China
Seventh Special Support Plan for Innovation and Entrepreneurship in Anhui Province
University Synergy Innovation Program of Anhui Province

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%


Bibliometrics

Article Metrics

Total Citations: 7

Total Downloads: 685

Downloads (Last 12 months): 291

Downloads (Last 6 weeks): 25

Reflects downloads up to 05 Mar 2025


Citations

Li S, Xue F, Liu K, Guo D, Hong R (2024). Multimodal Graph Causal Embedding for Multimedia-Based Recommendation. IEEE Transactions on Knowledge and Data Engineering 36:12 (8842-8858). DOI: 10.1109/TKDE.2024.3424268. Online publication date: 1-Dec-2024.
https://dl.acm.org/doi/10.1109/TKDE.2024.3424268
Wang R, Li C, Zhao Z (2024). Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network. Applied Intelligence 55:1. DOI: 10.1007/s10489-024-06061-1. Online publication date: 18-Nov-2024.
https://dl.acm.org/doi/10.1007/s10489-024-06061-1
Fang H, Sha L, Liang J (2024). Multimodal recommender system based on multi-channel counterfactual learning networks. Multimedia Systems 30:5. DOI: 10.1007/s00530-024-01448-z. Online publication date: 13-Aug-2024.
https://dl.acm.org/doi/10.1007/s00530-024-01448-z
Guo H, Sheng Y (2023). Enhanced Implicit Collaborative Knowledge Graph for Recommendation. 2023 4th International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI) (255-261). DOI: 10.1109/ICHCI58871.2023.10277733. Online publication date: 4-Aug-2023.
https://doi.org/10.1109/ICHCI58871.2023.10277733
