ABSTRACT
Recently, multi-modal recommender systems have been widely deployed in real-world scenarios such as e-commerce. Existing multi-modal recommendation methods exploit the multi-modal content of items as auxiliary information and fuse it to boost performance. Despite the superior performance achieved by multi-modal recommendation models, there is currently no understanding of their robustness to adversarial attacks. In this work, we first identify the vulnerability of existing multi-modal recommendation models. Next, we show that the key reason for this vulnerability is modality imbalance: under adversarial attack, the prediction score margin between positive and negative samples in the sensitive modality drops dramatically, and the other modalities fail to compensate for it. Finally, based on this finding, we propose a novel defense method that enhances the robustness of multi-modal recommendation models through modality balancing. Specifically, we first adopt embedding distillation to obtain a pair of item embeddings in the sensitive modality that are similar in content but yield different predictions, and compute the score margin that reflects the modality's vulnerability. Then, we optimize the model to use the score margin between positive and negative samples in the other modalities to compensate for this vulnerability. The proposed method can serve as a plug-and-play module and can be flexibly applied to a wide range of multi-modal recommendation models. Extensive experiments on two real-world datasets demonstrate that our method significantly improves the robustness of multi-modal recommendation models with almost no performance degradation on clean data.
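The abstract frames the vulnerability in terms of per-modality score margins between a positive and a negative item. The toy Python below is a minimal sketch of that idea under stated assumptions, not the authors' model or objective: it assumes a hypothetical late-fusion scorer whose fused score is the sum of per-modality dot products, treats the visual modality as the sensitive one, and substitutes a worst-case FGSM-style (fast gradient sign) perturbation for the paper's embedding-distillation step, whose details the abstract does not give. The helpers `modality_margins`, `fgsm_perturb`, and `compensation_penalty`, the hinge form of the penalty, and its `gap` parameter are all illustrative names and choices.

```python
"""Toy illustration of modality imbalance in a late-fusion recommender.

Hypothetical setup (not the paper's model): each modality m has its own user
and item embeddings, and the fused score is the sum of per-modality dot products.
"""
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def modality_margins(user: Dict[str, Vector], pos: Dict[str, Vector],
                     neg: Dict[str, Vector]) -> Dict[str, float]:
    """Per-modality score margin s_m(u, pos) - s_m(u, neg)."""
    return {m: dot(user[m], pos[m]) - dot(user[m], neg[m]) for m in user}


def fgsm_perturb(item_vec: Vector, user_vec: Vector, eps: float) -> Vector:
    """Worst-case L-infinity perturbation (budget eps) that lowers <user, item>.

    For a linear score the gradient w.r.t. the item vector is the user vector,
    so stepping against its sign lowers the score by eps * ||user_vec||_1.
    """
    return [x - eps * sign(g) for x, g in zip(item_vec, user_vec)]


def compensation_penalty(margins: Dict[str, float], sensitive: str,
                         attacked_margin: float, gap: float = 0.1) -> float:
    """Hypothetical hinge term: the other modalities' clean margins should
    cover the sensitive modality's post-attack margin plus a safety gap."""
    others = sum(v for m, v in margins.items() if m != sensitive)
    return max(0.0, gap - (others + attacked_margin))


if __name__ == "__main__":
    # Toy embeddings: strong visual signal, weak text signal (imbalance).
    user = {"visual": [1.0, -0.5], "text": [0.2, 0.1]}
    pos = {"visual": [0.8, -0.4], "text": [0.3, 0.2]}
    neg = {"visual": [0.1, 0.2], "text": [0.25, 0.15]}

    clean = modality_margins(user, pos, neg)
    # Attack only the visual features of the positive item (budget eps = 0.8).
    attacked_pos = dict(pos, visual=fgsm_perturb(pos["visual"], user["visual"], eps=0.8))
    attacked = modality_margins(user, attacked_pos, neg)

    print(sum(clean.values()) > 0, sum(attacked.values()) > 0)  # True False
    print(round(compensation_penalty(clean, "visual", attacked["visual"]), 3))  # 0.285
```

With a linear per-modality score, the worst-case drop under an L-infinity budget of eps is eps times the L1 norm of the user's visual embedding (here 0.8 × 1.5 = 1.2), so the visual margin falls from 1.0 to -0.2 while the text margin of 0.015 is far too small to keep the positive item ranked first. A penalty of this kind could be added to a pairwise ranking loss to push the other modalities' margins up; whether this matches the compensation objective used in the paper is not stated in the abstract.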
Index Terms
- Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing
Recommendations
Multi-modal Mixture of Experts Represetation Learning for Sequential Recommendation
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
Within online platforms, it is critical to capture the dynamic user preference from the sequential interaction behaviors for making accurate recommendation over time. Recently, significant progress has been made in sequential recommendation with deep ...
Preference-Aware Modality Representation and Fusion for Micro-video Recommendation
Pattern Recognition and Computer Vision
Personalized multi-modal micro-video recommendation has attracted increasing research interests recently. Despite existing methods have achieved much progress, they ignore the importance of the user’s modality preference for micro-video ...
Bootstrap Latent Representations for Multi-modal Recommendation
WWW '23: Proceedings of the ACM Web Conference 2023
This paper studies the multi-modal recommendation problem, where the item multi-modality information (e.g., images and textual descriptions) is exploited to improve the recommendation accuracy. Besides the user-item interaction graph, existing state-of-...