research-article

Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing

Authors:

Jiansheng Chen,

Yong LiAuthors Info & Claims

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 6274 - 6282

https://doi.org/10.1145/3581783.3612337

Published: 27 October 2023 Publication History

Abstract

Recently multi-modal recommender systems have been widely applied in real scenarios such as e-commerce businesses. Existing multi-modal recommendation methods exploit the multi-modal content of items as auxiliary information and fuse them to boost performance. Despite the superior performance achieved by multi-modal recommendation models, there's currently no understanding of their robustness to adversarial attacks. In this work, we first identify the vulnerability of existing multi-modal recommendation models. Next, we show the key reason for such vulnerability is modality imbalance, i.e., the prediction score margin between positive and negative samples in the sensitive modality will drop dramatically facing adversarial attacks and fail to be compensated by other modalities. Finally, based on this finding we propose a novel defense method to enhance the robustness of multi-modal recommendation models through modality balancing. Specifically, we first adopt an embedding distillation to obtain a pair of content-similar but prediction-different item embeddings in the sensitive modality and calculate the score margin reflecting the modality vulnerability. Then we optimize the model to utilize the score margin between positive and negative samples in other modalities to compensate for the vulnerability. The proposed method can serve as a plug-and-play module and is flexible to be applied to a wide range of multi-modal recommendation models. Extensive experiments on two real-world datasets demonstrate that our method significantly improves the robustness of multi-modal recommendation models with nearly no performance degradation on clean data.

References

[1]

Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, and Felice Antonio Merra. 2021. A study of defensive methods to protect visual recommendation against adversarial manipulation of images. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1094--1103.

Digital Library

[2]

Feiyu Chen, Junjie Wang, Yinwei Wei, Hai-Tao Zheng, and Jie Shao. 2022. Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation. In Proceedings of the 30th ACM International Conference on Multimedia. 385--394.

Digital Library

[3]

Huiyuan Chen and Jing Li. 2019. Adversarial tensor factorization for context-aware recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems. 363--367.

Digital Library

[4]

Xu Chen, Hanxiong Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2019. Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 765--774.

Digital Library

[5]

Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, and Felice Antonio Merra. 2020. How dataset characteristics affect the robustness of collaborative recommendation models. In Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. 951--960.

Digital Library

[6]

Xiaoyu Du, Zike Wu, Fuli Feng, Xiangnan He, and Jinhui Tang. 2022. Invariant Representation Learning for Multimedia Recommendation. In Proceedings of the 30th ACM International Conference on Multimedia. 619--628.

Digital Library

[7]

Yali Du, Meng Fang, Jinfeng Yi, Chang Xu, Jun Cheng, and Dacheng Tao. 2018. Enhancing the robustness of neural collaborative filtering systems under malicious attacks. IEEE Transactions on Multimedia, Vol. 21, 3 (2018), 555--565.

[8]

Minghong Fang, Neil Zhenqiang Gong, and Jia Liu. 2020. Influence function based data poisoning attacks to top-n recommender systems. In Proceedings of The Web Conference 2020. 3019--3025.

Digital Library

[9]

Chen Gao, Yu Zheng, Nian Li, Yinfeng Li, Yingrong Qin, Jinghua Piao, Yuhan Quan, Jianxin Chang, Depeng Jin, Xiangnan He, et al. 2023. A survey of graph neural networks for recommender systems: challenges, methods, and directions. ACM Transactions on Recommender Systems, Vol. 1, 1 (2023), 1--51.

Digital Library

[10]

Chen Gao, Yu Zheng, Wenjie Wang, Fuli Feng, Xiangnan He, and Yong Li. 2022. Causal Inference in Recommender Systems: A Survey and Future Directions. arXiv preprint arXiv:2208.12397 (2022).

[11]

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 249--256.

[12]

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).

[13]

Ihsan Gunes, Cihan Kaleli, Alper Bilge, and Huseyin Polat. 2014. Shilling attacks against recommender systems: A comprehensive survey. Artificial Intelligence Review, Vol. 42, 4 (2014).

Digital Library

[14]

Ruining He and Julian McAuley. 2016a. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In proceedings of the 25th international conference on world wide web. 507--517.

Digital Library

[15]

Ruining He and Julian McAuley. 2016b. VBPR: visual bayesian personalized ranking from implicit feedback. In Proceedings of the AAAI conference on artificial intelligence, Vol. 30.

[16]

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval. 639--648.

Digital Library

[17]

Xiangnan He, Zhankui He, Xiaoyu Du, and Tat-Seng Chua. 2018. Adversarial personalized ranking for recommendation. In The 41st International ACM SIGIR conference on research & development in information retrieval. 355--364.

Digital Library

[18]

Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. (2015).

[19]

Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 426--434.

Digital Library

[20]

Shyong K Lam and John Riedl. 2004. Shilling recommender systems for fun and profit. In Proceedings of the 13th international conference on World Wide Web. 393--402.

Digital Library

[21]

Bo Li, Yining Wang, Aarti Singh, and Yevgeniy Vorobeychik. 2016. Data poisoning attacks on factorization-based collaborative filtering. Advances in neural information processing systems, Vol. 29 (2016).

[22]

Fan Liu, Zhiyong Cheng, Changchang Sun, Yinglong Wang, Liqiang Nie, and Mohan Kankanhalli. 2019b. User diverse preference modeling by multimodal attentive metric learning. In Proceedings of the 27th ACM international conference on multimedia. 1526--1534.

Digital Library

[23]

Shang Liu, Zhenzhong Chen, Hongyi Liu, and Xinghai Hu. 2019a. User-video co-attention network for personalized micro-video recommendation. In The World Wide Web Conference. 3020--3026.

Digital Library

[24]

Xiaohao Liu, Zhulin Tao, Jiahong Shao, Lifang Yang, and Xianglin Huang. 2022. EliMRec: Eliminating Single-modal Bias in Multimedia Recommendation. In Proceedings of the 30th ACM International Conference on Multimedia. 687--695.

Digital Library

[25]

Zhuoran Liu and Martha Larson. 2021. Adversarial item promotion: Vulnerabilities at the core of top-n recommenders that use images to address cold start. In Proceedings of the Web Conference 2021. 3590--3602.

Digital Library

[26]

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations.

[27]

Zongshen Mu, Yueting Zhuang, Jie Tan, Jun Xiao, and Siliang Tang. 2022. Learning Hybrid Behavior Patterns for Multimedia Recommendation. In Proceedings of the 30th ACM International Conference on Multimedia. 376--384.

Digital Library

[28]

Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. 452--461.

Digital Library

[29]

Jinhui Tang, Xiaoyu Du, Xiangnan He, Fajie Yuan, Qi Tian, and Tat-Seng Chua. 2019. Adversarial training towards robust multimedia recommender system. IEEE Transactions on Knowledge and Data Engineering, Vol. 32, 5 (2019), 855--867.

[30]

Jiaxi Tang, Hongyi Wen, and Ke Wang. 2020. Revisiting adversarially learned injection attacks against recommender systems. In Proceedings of the 14th ACM Conference on Recommender Systems. 318--327.

Digital Library

[31]

Zhulin Tao, Xiaohao Liu, Yewei Xia, Xiang Wang, Lifang Yang, Xianglin Huang, and Tat-Seng Chua. 2022. Self-supervised learning for multimedia recommendation. IEEE Transactions on Multimedia (2022).

Digital Library

[32]

Zhulin Tao, Yinwei Wei, Xiang Wang, Xiangnan He, Xianglin Huang, and Tat-Seng Chua. 2020. Mgat: Multimodal graph attention network for recommendation. Information Processing & Management, Vol. 57, 5 (2020), 102277.

[33]

Nhu-Thuat Tran and Hady W Lauw. 2022. Aligning Dual Disentangled User Representations from Ratings and Textual Content. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1798--1806.

Digital Library

[34]

Haoyu Wang, Nan Shao, and Defu Lian. 2019b. Adversarial binary collaborative filtering for implicit feedback. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5248--5255.

Digital Library

[35]

Qifan Wang, Yinwei Wei, Jianhua Yin, Jianlong Wu, Xuemeng Song, and Liqiang Nie. 2021. Dualgnn: Dual graph neural network for multimedia recommendation. IEEE Transactions on Multimedia (2021).

[36]

Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019a. Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval. 165--174.

Digital Library

[37]

Yinwei Wei, Xiang Wang, Xiangnan He, Liqiang Nie, Yong Rui, and Tat-Seng Chua. 2021. Hierarchical user intent graph network for multimedia recommendation. IEEE Transactions on Multimedia, Vol. 24 (2021), 2701--2712.

Digital Library

[38]

Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, and Tat-Seng Chua. 2020. Graph-refined convolutional network for multimedia recommendation with implicit feedback. In Proceedings of the 28th ACM international conference on multimedia. 3541--3549.

Digital Library

[39]

Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. In Proceedings of the 27th ACM international conference on multimedia. 1437--1445.

Digital Library

[40]

Chenwang Wu, Defu Lian, Yong Ge, Zhihao Zhu, Enhong Chen, and Senchao Yuan. 2021. Fight fire with fire: towards robust recommender systems via adversarial poisoning training. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1074--1083.

Digital Library

[41]

Zixuan Yi, Xi Wang, Iadh Ounis, and Craig Macdonald. 2022. Multi-modal graph contrastive learning for micro-video recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1807--1811.

Digital Library

[42]

Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. Adversarial collaborative neural network for robust recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1065--1068.

Digital Library

[43]

Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 353--362.

Digital Library

[44]

Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021. Mining latent structures for multimedia recommendation. In Proceedings of the 29th ACM International Conference on Multimedia. 3872--3880.

Digital Library

[45]

Hongyu Zhou, Xin Zhou, Zhiwei Zeng, Lingzi Zhang, and Zhiqi Shen. 2023. A Comprehensive Survey on Multimodal Recommender Systems: Taxonomy, Evaluation, and Future Directions. arXiv preprint arXiv:2302.04473 (2023).

[46]

Xin Zhou, Hongyu Zhou, Yong Liu, Zhiwei Zeng, Chunyan Miao, Pengwei Wang, Yuan You, and Feijun Jiang. 2022. Bootstrap latent representations for multi-modal recommendation. arXiv preprint arXiv:2207.05969 (2022).

Cited By

Liu QHu JXiao YZhao XGao JWang WLi QTang J(2024)Multimodal Recommender Systems: A SurveyACM Computing Surveys10.1145/369546157:2(1-17)Online publication date: 10-Oct-2024
https://dl.acm.org/doi/10.1145/3695461
Oh SUstun BMcauley JKumar S(2024)FINEST: Stabilizing Recommendations by Rank-Preserving Fine-TuningACM Transactions on Knowledge Discovery from Data10.1145/369525618:9(1-22)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1145/3695256
Zhang YXie RChen JSun XWang YCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe QuestionsProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3685510(11175-11183)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3685510

Index Terms

Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

Multi-Modal Self-Supervised Learning for Recommendation
WWW '23: Proceedings of the ACM Web Conference 2023

The online emergence of multi-modal sharing platforms (e.g., TikTok, Youtube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual and acoustic) into the latent user representations. While existing works ...
Bootstrap Latent Representations for Multi-modal Recommendation
WWW '23: Proceedings of the ACM Web Conference 2023

This paper studies the multi-modal recommendation problem, where the item multi-modality information (e.g., images and textual descriptions) is exploited to improve the recommendation accuracy. Besides the user-item interaction graph, existing state-of-...
Preference-Aware Modality Representation and Fusion for Micro-video Recommendation
Pattern Recognition and Computer Vision
Abstract
Personalized multi-modal micro-video recommendation has attracted increasing research interests recently. Despite existing methods have achieved much progress, they ignore the importance of the user’s modality preference for micro-video ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik
University of Ottawa, Canada & MBZUAI, UAE
,
Tao Mei
HiDream.ai, China
,
Rita Cucchiara
University of Modena and Reggio Emilia, Italy
,
Program Chairs:
Marco Bertini
University of Florence, Italy
,
Diana Patricia Tobon Vallejo
Unversidad de Medellin, Colombia
,
Pradeep K. Atrey
University at Albany, State University of New York, USA
,
M. Shamim Hossain
M. Shamim Hossain (King Saud University, KSA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

The National Natural Science Foundation of China
The National Key R&D Program of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
422
Total Downloads

Downloads (Last 12 months)254
Downloads (Last 6 weeks)25

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Liu QHu JXiao YZhao XGao JWang WLi QTang J(2024)Multimodal Recommender Systems: A SurveyACM Computing Surveys10.1145/369546157:2(1-17)Online publication date: 10-Oct-2024
https://dl.acm.org/doi/10.1145/3695461
Oh SUstun BMcauley JKumar S(2024)FINEST: Stabilizing Recommendations by Rank-Preserving Fine-TuningACM Transactions on Knowledge Discovery from Data10.1145/369525618:9(1-22)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1145/3695256
Zhang YXie RChen JSun XWang YCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe QuestionsProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3685510(11175-11183)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3685510

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten