Abstract
Multimodal side information such as images and text is commonly used as a supplement to improve graph collaborative filtering recommendation. However, there is often a semantic gap between multimodal information and collaborative filtering information. Previous works often directly fuse or align the two kinds of information, which results in semantic distortion or degradation. Multimodal information also introduces additional noise, and previous methods lack explicit supervision to identify it. To tackle these issues, we propose a novel contrastive learning approach to improve graph collaborative filtering, named Multimodal-Side-Information-enriched Contrastive Learning (MSICL), which does not fuse multimodal information directly but still explicitly captures users’ potential preferences for similar images or text by contrasting ID embeddings, and filters noise in multimodal side information. Specifically, we first search for samples with similar images or text as candidate positive contrastive pairs. Second, because some of the retrieved pairs may be irrelevant, we identify the noise by filtering out sample pairs that have no interaction relationship. Third, we contrast the ID embeddings of the true positive sample pairs to mine the potential similarity relationships in multimodal side information. Extensive experiments on three datasets demonstrate the superiority of our method in multimodal recommendation. Moreover, our approach significantly reduces computation and memory cost compared to previous work.
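The three steps named in the abstract (search, filter, contrast) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the top-k cosine search, the reading of "interaction relationship" as "the two items share at least one user", and the InfoNCE form with temperature 0.2 are all assumptions made for illustration.

```python
import numpy as np

def topk_similar(features, k):
    """For each item, return the indices of its k most cosine-similar other items."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an item is never its own candidate
    return np.argsort(-sim, axis=1)[:, :k]

def filter_pairs(candidates, interactions):
    """Keep (i, j) only if items i and j share at least one user.

    ASSUMPTION: this co-interaction test stands in for the abstract's
    "interaction relationship"; the paper may define it differently.
    `interactions` is a binary user-by-item matrix.
    """
    co = (interactions.T @ interactions) > 0  # item-item co-interaction mask
    return [(i, int(j)) for i, row in enumerate(candidates) for j in row if co[i, j]]

def info_nce(id_emb, pairs, temperature=0.2):
    """InfoNCE on ID embeddings: the paired item is the positive,
    all other items (excluding the anchor itself) are negatives."""
    if not pairs:
        return 0.0
    z = id_emb / np.linalg.norm(id_emb, axis=1, keepdims=True)
    anchors = np.array([i for i, _ in pairs])
    positives = np.array([j for _, j in pairs])
    rows = np.arange(len(pairs))
    logits = z[anchors] @ z.T / temperature
    logits[rows, anchors] = -np.inf  # drop the anchor from its own denominator
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, positives].mean())
```

Only the search step reads the image or text features; the loss is computed on ID embeddings alone, which matches the abstract's point that multimodal information is not fused directly into the representation.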
Availability of supporting data
Amazon Sports, Clothing and Toys are openly available datasets and can be downloaded from their official website: http://jmcauley.ucsd.edu/data/amazon/links.html. Our code is available at https://anonymous.4open.science/r/MSICL.
Acknowledgements
This research was partially supported by the NSFC (61876117, 62176175), the major project of natural science research in Universities of Jiangsu Province (21KJA520004), and the Suzhou Science and Technology Development Program (SYC2022139).
Funding
No Funding
Author information
Contributions
Lei Shan was responsible for the conceptualization and design of the study. Huanhuan Yuan and Pengpeng Zhao contributed to the main modifications of the manuscript. Jianfeng Qu, Junhua Fang, Guanfeng Liu, and Victor Sheng reviewed and provided critical feedback on the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
Lei, S., Huanhuan, Y., Pengpeng, Z. et al. Improving graph collaborative filtering with multimodal-side-information-enriched contrastive learning. J Intell Inf Syst 62, 143–161 (2024). https://doi.org/10.1007/s10844-023-00807-y
DOI: https://doi.org/10.1007/s10844-023-00807-y