Abstract
Multimodal recommendation systems aim to deliver precise and personalized recommendations by integrating diverse modalities such as text, images, and audio. Despite their potential, these systems often struggle with effective modality fusion strategies and comprehensive modeling of user preferences. To address these issues, we propose the Multifactorial Modality Fusion Network (MMFN). MMFN overcomes the limitations of previous models through three pivotal architectural components. First, it employs three Graph Neural Networks (GNNs) to meticulously extract foundational interactions and semantic information across modalities. Second, a Gated Multi-factor Semantic Sensor applies a series of stacked gating units, guided by interaction embeddings, to deeply extract features from the modal embeddings. Third, a User Preference-Oriented Modality Aligner leverages contrastive learning to synchronize user preferences with item features, enhancing the expressiveness of the embeddings and the overall quality of recommendations. We demonstrate the marked superiority of MMFN in both performance and efficiency compared to traditional collaborative filtering methods and contemporary deep multimodal recommendation systems. Through comprehensive evaluations on the Baby, Sports, and Clothing datasets, MMFN achieves significant gains in Recall@20, with improvements of 2.49%, 8.79%, and 24.51% over the next-best baseline models, respectively. MMFN also leads in training efficiency, outperforming most competing models. MMFN thus paves the way for future multimodal recommendation systems that leverage the full spectrum of deep learning technologies.
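The Gated Multi-factor Semantic Sensor described above can be pictured as a stack of gating units in which the interaction embedding controls how much of each modality's features passes into the fused representation. The following is a minimal NumPy sketch under the assumption of element-wise sigmoid gates; the function and parameter names (`gated_fusion`, `W_g`, `b_g`) are hypothetical illustrations, not the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function, used as the gate activation."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(interaction_emb, modal_embs, gate_params):
    """Fuse modality embeddings through stacked gating units.

    Each unit projects the current fused state (initialized with the
    interaction embedding) to a gate in [0, 1], and that gate decides
    how much of the corresponding modal embedding is admitted.
    """
    fused = interaction_emb.copy()
    for modal, (W_g, b_g) in zip(modal_embs, gate_params):
        gate = sigmoid(fused @ W_g + b_g)   # interaction-guided gate
        fused = fused + gate * modal        # admit gated modal features
    return fused

# Tiny demo with zero-initialized gates (every gate equals 0.5):
d = 4
interaction = np.ones(d)
visual, textual = np.full(d, 2.0), np.full(d, 4.0)
params = [(np.zeros((d, d)), np.zeros(d))] * 2
fused = gated_fusion(interaction, [visual, textual], params)
```

With zero weights each gate is sigmoid(0) = 0.5, so the demo yields 1 + 0.5·2 + 0.5·4 = 4 in every dimension; trained gates would instead weight each modality according to the interaction signal.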
Data Availability
The dataset is derived from a publicly available dataset: https://drive.google.com/drive/folders/13cBy1EA_saTUuXxVllKgtfci2A09jyaG
Code availability
The codes for the baselines are provided by the corresponding authors, and the code for our model will be available upon request after the paper is accepted for publication.
Materials availability
Not applicable.
Funding
This work was supported by the Natural Science Foundation of Chongqing, China under Grant CSTB2023NSCQ-LMX0013.
Author information
Contributions
Yanke Chen designed the study, established the proposed model, and played a leading role in the design of experiments. Tianhao Sun contributed to the development of the algorithms and the execution of experiments. Yunhao Ma was responsible for drafting the manuscript and carrying out experiments. Huhai Zou prepared the figures and tables and contributed to the analysis and interpretation of the data. All authors discussed the results and implications at all stages and contributed to the revision of the manuscript. Each author has read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflict of interest
The authors declare that they have no conflicts of interest or financial interests in any organizations or entities with a direct financial interest in the subject matter or materials discussed in the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Y., Sun, T., Ma, Y. et al. Multifactorial modality fusion network for multimodal recommendation. Appl Intell 55, 139 (2025). https://doi.org/10.1007/s10489-024-06038-0