Improving graph collaborative filtering with multimodal-side-information-enriched contrastive learning

  • Research
Journal of Intelligent Information Systems

Abstract

Multimodal side information such as images and text is commonly used as a supplement to improve graph collaborative filtering recommendation. However, there is often a semantic gap between multimodal information and collaborative filtering information. Previous works often directly fuse or align the two kinds of information, which results in semantic distortion or degradation. Multimodal information also introduces additional noise, and previous methods lack explicit supervision to identify it. To tackle these issues, we propose a novel contrastive learning approach to improve graph collaborative filtering, named Multimodal-Side-Information-enriched Contrastive Learning (MSICL). MSICL does not fuse multimodal information directly, but still explicitly captures users’ potential preferences for similar images or text by contrasting ID embeddings, and it filters noise in the multimodal side information. Specifically, we first search for samples with similar images or text as candidate positive contrastive pairs. Second, because some of the retrieved sample pairs may be irrelevant, we identify noise by filtering out sample pairs that have no interaction relationship. Third, we contrast the ID embeddings of the remaining true positive sample pairs to mine the potential similarity relationships in the multimodal side information. Extensive experiments on three datasets demonstrate the superiority of our method in multimodal recommendation. Moreover, our approach significantly reduces computation and memory cost compared to previous work.
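The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the top-k cosine search, the rule that two items "have an interaction relationship" when at least one user interacted with both, and the InfoNCE form of the contrastive loss are all assumptions made for illustration, as is the choice to treat the samples as items.

```python
import numpy as np

def top_k_similar(features, k):
    """Step 1 (assumed form): indices of the k most cosine-similar items per item."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]

def filter_pairs(neighbors, interactions):
    """Step 2 (assumed rule): keep (i, j) only if some user interacted with both."""
    co_counts = interactions.T @ interactions  # item-item co-interaction counts
    return [(i, int(j)) for i, row in enumerate(neighbors)
            for j in row if co_counts[i, j] > 0]

def info_nce(id_emb, pairs, temperature=0.2):
    """Step 3 (assumed loss): InfoNCE on ID embeddings, other items as negatives."""
    if not pairs:
        return 0.0
    z = id_emb / np.linalg.norm(id_emb, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    np.fill_diagonal(logits, -np.inf)  # drop self-similarity from the denominator
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(
        np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denominator
    anchors, positives = zip(*pairs)
    return float(-log_prob[list(anchors), list(positives)].mean())

# Toy data (hypothetical): 4 items with 2-d modality features, 3 users.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
interactions = np.array([[1, 1, 0, 0],   # user 0 interacted with items 0 and 1
                         [0, 0, 1, 0],   # user 1 interacted with item 2 only
                         [0, 0, 0, 1]])  # user 2 interacted with item 3 only

neighbors = top_k_similar(features, k=1)
pairs = filter_pairs(neighbors, interactions)
id_emb = np.random.default_rng(0).normal(size=(4, 8))
loss = info_nce(id_emb, pairs)
```

In this toy setup items 2 and 3 look alike in feature space but are never co-interacted, so their pair is discarded as noise and only the pair of items 0 and 1 contributes to the loss. The modality features are used only to select pairs; the loss itself touches ID embeddings alone, which matches the abstract's point that multimodal information is not fused directly.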

Availability of supporting data

The Amazon Sports, Clothing, and Toys datasets are openly available and can be downloaded from the official website: http://jmcauley.ucsd.edu/data/amazon/links.html. Our code is available at https://anonymous.4open.science/r/MSICL.

Acknowledgements

This research was partially supported by the NSFC (61876117, 62176175), the Major Project of Natural Science Research in Universities of Jiangsu Province (21KJA520004), and the Suzhou Science and Technology Development Program (SYC2022139).

Funding

No Funding

Author information

Contributions

Lei Shan was responsible for the conceptualization and design of the study. Huanhuan Yuan and Pengpeng Zhao contributed to the main modifications of the manuscript. Jianfeng Qu, Junhua Fang, Guanfeng Liu, and Victor Sheng reviewed and provided critical feedback on the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhao Pengpeng.

Ethics declarations

Competing interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lei, S., Huanhuan, Y., Pengpeng, Z. et al. Improving graph collaborative filtering with multimodal-side-information-enriched contrastive learning. J Intell Inf Syst 62, 143–161 (2024). https://doi.org/10.1007/s10844-023-00807-y

  • DOI: https://doi.org/10.1007/s10844-023-00807-y
