Improving graph collaborative filtering with multimodal-side-information-enriched contrastive learning

Lei, Shan; Huanhuan, Yuan; Pengpeng, Zhao; Jianfeng, Qu; Junhua, Fang; Guanfeng, Liu; Victor S., Sheng

doi:10.1007/s10844-023-00807-y

Research
Published: 29 August 2023

Volume 62, pages 143–161, (2024)

Journal of Intelligent Information Systems

Shan Lei¹,
Yuan Huanhuan¹,
Zhao Pengpeng¹,
Qu Jianfeng¹,
Fang Junhua¹,
Liu Guanfeng² &
Sheng Victor S.³

Abstract

Multimodal side information, such as images and text, is commonly used as a supplement to improve graph collaborative filtering recommendations. However, there is often a semantic gap between multimodal information and collaborative filtering information. Previous works often directly fuse or align the two kinds of information, which results in semantic distortion or degradation. Multimodal information also introduces additional noise, and previous methods lack explicit supervision to identify it. To tackle these issues, we propose a novel contrastive learning approach to improve graph collaborative filtering, named Multimodal-Side-Information-enriched Contrastive Learning (MSICL). MSICL does not fuse multimodal information directly, but still explicitly captures users’ potential preferences for similar images or text by contrasting ID embeddings, and it filters noise in the multimodal side information. Specifically, we first search for samples with similar images or text as candidate positive contrastive pairs. Second, because some of the retrieved pairs may be irrelevant, we identify the noise by filtering out sample pairs that have no interaction relationship. Third, we contrast the ID embeddings of the true positive sample pairs to mine the potential similarity relationships in the multimodal side information. Extensive experiments on three datasets demonstrate the superiority of our method in multimodal recommendation. Moreover, our approach significantly reduces computation and memory cost compared to previous work.
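
The three steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the similarity search, the exact meaning of "interaction relationship", or the contrastive objective. The sketch assumes item-side pairs, cosine top-k retrieval over precomputed modality features, a co-interaction test (at least one user interacted with both items), and an InfoNCE-style loss with a temperature. All function names and parameters are hypothetical.

```python
import numpy as np

def similar_item_pairs(features, k):
    """Step 1 (assumed): pair each item with its k nearest items by cosine
    similarity of its image or text features."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(features)) for j in neighbors[i]]

def filter_by_co_interaction(pairs, interactions):
    """Step 2 (assumed definition): keep a pair only if at least one user
    interacted with both items. `interactions` is a users x items 0/1 matrix."""
    co_counts = interactions.T @ interactions  # item x item co-interaction counts
    return [(i, j) for i, j in pairs if co_counts[i, j] > 0]

def id_contrastive_loss(id_embeddings, pairs, temperature=0.2):
    """Step 3 (assumed objective): InfoNCE over ID embeddings, using every
    other item as a negative for each anchor."""
    if not pairs:
        return 0.0
    normed = id_embeddings / np.linalg.norm(id_embeddings, axis=1, keepdims=True)
    logits = normed @ normed.T / temperature
    np.fill_diagonal(logits, -np.inf)  # drop self-similarity from the denominator
    anchors = np.array([i for i, _ in pairs])
    positives = np.array([j for _, j in pairs])
    rows = logits[anchors]
    # Numerically stable log-sum-exp over each anchor's row.
    row_max = rows.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(rows - row_max).sum(axis=1))
    positive_logits = rows[np.arange(len(pairs)), positives]
    return float(np.mean(log_denominator - positive_logits))
```

In this sketch the multimodal features are used only to select pairs; the loss itself touches ID embeddings alone, which mirrors the abstract's point that the modalities are not fused into the representation. In a training loop, such a term would presumably be added to the main recommendation loss with a weighting coefficient, which is also an assumption here.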

Availability of supporting data

The Amazon Sports, Clothing, and Toys datasets are openly available and can be downloaded from the official website: http://jmcauley.ucsd.edu/data/amazon/links.html. Our code is available at https://anonymous.4open.science/r/MSICL.

Acknowledgements

This research was partially supported by the NSFC (61876117, 62176175), the major project of natural science research in Universities of Jiangsu Province (21KJA520004), and the Suzhou Science and Technology Development Program (SYC2022139).

Funding

No Funding

Author information

Authors and Affiliations

The College of Computer Science and Technology, Soochow University, 21500, Suzhou, Jiangsu, China
Shan Lei, Yuan Huanhuan, Zhao Pengpeng, Qu Jianfeng & Fang Junhua
Texas Tech University, Lubbock, Texas, USA
Liu Guanfeng
Department of Computing, Macquarie University, Sydney, Australia
Sheng Victor S.

Contributions

Lei Shan was responsible for the conceptualization and design of the study. Huanhuan Yuan and Pengpeng Zhao contributed to the main modifications of the manuscript. Jianfeng Qu, Junhua Fang, Guanfeng Liu, and Victor Sheng reviewed and provided critical feedback on the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhao Pengpeng.

Ethics declarations

Competing interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lei, S., Huanhuan, Y., Pengpeng, Z. et al. Improving graph collaborative filtering with multimodal-side-information-enriched contrastive learning. J Intell Inf Syst 62, 143–161 (2024). https://doi.org/10.1007/s10844-023-00807-y

Received: 01 May 2023
Revised: 09 July 2023
Accepted: 31 July 2023
Published: 29 August 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10844-023-00807-y
