Multifactorial modality fusion network for multimodal recommendation

Published in Applied Intelligence.

Abstract

Multimodal recommendation systems aim to deliver precise, personalized recommendations by integrating diverse modalities such as text, images, and audio. Despite their potential, these systems often struggle with effective modality fusion and with comprehensive modeling of user preferences. To address these issues, we propose the Multifactorial Modality Fusion Network (MMFN), which overcomes the limitations of previous models through three pivotal components. First, it employs three Graph Neural Networks (GNNs) to meticulously extract foundational interactions and semantic information across modalities. Second, a Gated Multi-factor Semantic Sensor, built from a series of stacked gating units guided by interaction embeddings, extracts deep features from the modal embeddings. Third, a User Preference-Oriented Modality Aligner leverages contrastive learning to align user preferences with item features, enhancing the expressiveness of the embeddings and the overall quality of recommendations. We demonstrate the marked superiority of MMFN in both performance and efficiency over traditional collaborative filtering methods and contemporary deep multimodal recommendation systems. In comprehensive evaluations on the Baby, Sports, and Clothing datasets, MMFN achieves significant gains in Recall@20, improving on the next-best baseline by 2.49%, 8.79%, and 24.51%, respectively. MMFN also leads in training efficiency, outperforming most competing models, and paves the way for future multimodal recommendation systems that leverage the full spectrum of deep learning technologies.
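The three components and the evaluation metric described above can be sketched roughly as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation (which the paper states is available upon request): the function names (`gated_fusion`, `info_nce`, `recall_at_k`), the sigmoid gate, and the InfoNCE form of the contrastive objective are assumptions made for illustration only.

```python
import numpy as np

def gated_fusion(interaction_emb, modal_embs):
    """One pass of a stacked-gating unit: the interaction embedding gates
    each modal embedding elementwise before the gated features are pooled.
    Hypothetical sketch; the real sensor may use learned gate projections."""
    fused = np.zeros_like(interaction_emb)
    for m in modal_embs:
        gate = 1.0 / (1.0 + np.exp(-(interaction_emb * m)))  # sigmoid gate
        fused += gate * m
    return fused / len(modal_embs)

def info_nce(user_emb, item_emb, tau=0.2):
    """InfoNCE-style contrastive loss aligning user preferences with item
    features; row i of each matrix is assumed to be a matched pair."""
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    v = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = (u @ v.T) / tau                  # scaled cosine similarities
    pos = np.diag(logits)                     # matched user-item pairs
    log_denom = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(log_denom - pos))    # mean negative log-likelihood

def recall_at_k(ranked_items, relevant_items, k=20):
    """Recall@K: fraction of a user's relevant items in the top-K ranking."""
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / len(relevant_items)
```

In this reading, each stacked gating unit lets the collaborative signal (the interaction embedding) decide how much of each modality's feature to pass through, while the contrastive term pulls a user's embedding toward the items they interacted with and away from the rest of the batch.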

The full article includes Figs. 1-5 and Algorithm 1.


Data Availability

The dataset is derived from a publicly available dataset: https://drive.google.com/drive/folders/13cBy1EA_saTUuXxVllKgtfci2A09jyaG

Code availability

The code for the baselines is provided by the corresponding authors, and the code for our model will be made available upon request after the paper is accepted for publication.

Materials availability

Not applicable.


Funding

This work was supported by the Natural Science Foundation of Chongqing, China under Grant CSTB2023NSCQ-LMX0013.

Author information

Authors and Affiliations

Authors

Contributions

Yanke Chen designed the study, established the proposed model, and played a leading role in the design of experiments. Tianhao Sun contributed to the development of the algorithms and the execution of experiments. Yunhao Ma was responsible for drafting the manuscript and carrying out experiments. Huhai Zou prepared the figures and tables and contributed to the analysis and interpretation of the data. All authors discussed the results and implications at all stages and contributed to the revision of the manuscript. Each author has read and approved the final manuscript.

Corresponding author

Correspondence to Tianhao Sun.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors declare that they have no conflicts of interest or financial interests in any organizations or entities with a direct financial interest in the subject matter or materials discussed in the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Y., Sun, T., Ma, Y. et al. Multifactorial modality fusion network for multimodal recommendation. Appl Intell 55, 139 (2025). https://doi.org/10.1007/s10489-024-06038-0
