ABSTRACT
Within online platforms, it is critical to capture the dynamic user preference from the sequential interaction behaviors for making accurate recommendation over time. Recently, significant progress has been made in sequential recommendation with deep learning. However, existing neural sequential recommender often suffer from the data sparsity issue in real-world applications.
To tackle this problem, we propose a Multi-Modal Mixture of experts model for Sequential Recommendation, named M3SRec, which leverage rich multi-modal interaction data for improving sequential recommendation. Different from existing multi-modal recommendation models, our approach jointly considers reducing the semantic gap across modalities and adapts multi-modal semantics to fit recommender systems. For this purpose, we make two important technical contributions in architecture and training. Firstly, we design a novel multi-modal mixture-of-experts (MoE) fusion network, which can deeply fuse the across-modal semantics and largely enhance the modeling capacity of complex user intents. For training, we design specific pre-training tasks that can mimic the goal of the recommendation, which help model learn the semantic relatedness between the multi-modal sequential context and the target item. Extensive experiments conducted on both public and industry datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods, especially when only limited training data is available.
- Shuqing Bian, Wayne Xin Zhao, Kun Zhou, Jing Cai, Yancheng He, Cunxiang Yin, and Ji-Rong Wen. 2021a. Contrastive Curriculum Learning for Sequential User Behavior Modeling via Data Augmentation. In CIKM 2021. 3737--3746.Google ScholarDigital Library
- Shuqing Bian, Wayne Xin Zhao, Kun Zhou, Xu Chen, Jing Cai, Yancheng He, Xingji Luo, and Ji-Rong Wen. 2021b. A Novel Macro-Micro Fusion Network for User Representation Learning on Mobile Apps. In WWW 2021. 3199--3209.Google Scholar
- Da Cao, Xiangnan He, Liqiang Nie, Xiaochi Wei, Xia Hu, Shunxiang Wu, and Tat-Seng Chua. 2017. Cross-Platform App Recommendation by Jointly Modeling Ratings and Texts. ACM Trans. Inf. Syst., Vol. 35, 4 (2017).Google ScholarDigital Library
- Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In SIGIR. ACM, 335--344.Google Scholar
- Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In ICML 2020.Google Scholar
- Xu Chen, Hanxiong Chen, Hongteng Xu, Yongfeng Zhang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2019. Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network: Towards Visually Explainable Recommendation. In SIGIR 2019. ACM, 765--774.Google ScholarDigital Library
- Yongjun Chen, Jia Li, Chenghao Liu, Chenxi Li, Markus Anderle, Julian McAuley, and Caiming Xiong. 2021. Modeling Dynamic Attributes for Next Basket Recommendation. arXiv preprint arXiv:2109.11654 (2021).Google Scholar
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT 2019. 4171--4186.Google Scholar
- Ruining He and Julian J. McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In ICDM 2016.Google Scholar
- Balá zs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based Recommendations with Recurrent Neural Networks. CoRR, Vol. abs/1511.06939 (2015).Google Scholar
- Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, and Ji-Rong Wen. 2022. Towards Universal Sequence Representation Learning for Recommender Systems. In KDD. ACM, 585--593.Google Scholar
- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. Neural Comput., Vol. 3, 1 (1991), 79--87.Google ScholarCross Ref
- Wang-Cheng Kang and Julian J. McAuley. 2018. Self-Attentive Sequential Recommendation. In ICDM 2018. 197--206.Google Scholar
- Dong Hyun Kim, Chanyoung Park, Jinoh Oh, Sungyoung Lee, and Hwanjo Yu. 2016. Convolutional Matrix Factorization for Document Context-Aware Recommendation. In RecSys. ACM, 233--240.Google Scholar
- Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR 2015.Google Scholar
- Chenyi Lei, Dong Liu, Weiping Li, Zheng-Jun Zha, and Houqiang Li. 2016. Comparative Deep Learning of Hybrid Representations for Image Recommendations. In CVPR. IEEE Computer Society, 2545--2553.Google Scholar
- Dingcheng Li, Xu Li, Jun Wang, and Ping Li. 2020b. Video Recommendation with Multi-gate Mixture of Experts Soft Actor Critic. In SIGIR. ACM, 1553--1556.Google Scholar
- Gen Li, Nan Duan, Yuejian Fang, Ming Gong, and Daxin Jiang. 2020a. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training. In AAAI 2020. 11336--11344.Google ScholarCross Ref
- Xiaopeng Li and James She. 2017. Collaborative Variational Autoencoder for Recommender Systems. In KDD. ACM, 305--314.Google Scholar
- Yang Li, Tong Chen, Peng-Fei Zhang, and Hongzhi Yin. 2021. Lightweight Self-Attentive Sequential Recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 967--977.Google ScholarDigital Library
- Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong Zhang, Aixin Sun, and Chunyan Miao. 2021b. Pre-training Graph Transformer with Multimodal Side Information for Recommendation. In ACM Multimedia. ACM, 2853--2861.Google Scholar
- Zhiwei Liu, Yongjun Chen, Jia Li, Philip S Yu, Julian McAuley, and Caiming Xiong. 2021a. Contrastive self-supervised sequential recommendation with robust augmentation. arXiv preprint arXiv:2108.06479 (2021).Google Scholar
- Zhongqi Lu, Zhicheng Dou, Jianxun Lian, Xing Xie, and Qiang Yang. 2015. Content-Based Collaborative Filtering for News Topic Recommendation. In AAAI. AAAI Press, 217--223.Google Scholar
- Jiaqi Ma, Zhe Zhao, Jilin Chen, Ang Li, Lichan Hong, and Ed H. Chi. 2019. SNR: Sub-Network Routing for Flexible Parameter Sharing in Multi-Task Learning. In AAAI. AAAI Press, 216--223.Google Scholar
- Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H. Chi. 2018. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. In KDD. ACM, 1930--1939.Google ScholarDigital Library
- Julian J. McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys. ACM, 165--172.Google Scholar
- Wei Niu, James Caverlee, and Haokai Lu. 2018. Neural Personalized Ranking for Image Recommendation. In WSDM. ACM, 423--431.Google Scholar
- Xingyu Pan, Yushuo Chen, Changxin Tian, Zihan Lin, Jinpeng Wang, He Hu, and Wayne Xin Zhao. 2022. Multimodal Meta-Learning for Cold-Start Sequential Recommendation. In CIKM. ACM, 3421--3430.Google Scholar
- Zhen Qin, Yicheng Cheng, Zhe Zhao, Zhe Chen, Donald Metzler, and Jingzheng Qin. 2020. Multitask Mixture of Sequential Experts for User Activity Streams. In KDD. ACM, 3083--3091.Google Scholar
- Ruihong Qiu, Zi Huang, Hongzhi Yin, and Zijian Wang. 2022. Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation. In WSDM 2022.Google Scholar
- Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference on data mining. IEEE, 995--1000.Google ScholarDigital Library
- Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI 2009. 452--461.Google ScholarDigital Library
- S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW 2010.Google Scholar
- Ajit Paul Singh and Geoffrey J. Gordon. 2008. Relational learning via collective matrix factorization. In KDD. ACM, 650--658.Google Scholar
- Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In CIKM. 1441--1450.Google Scholar
- Hongyan Tang, Junning Liu, Ming Zhao, and Xudong Gong. 2020. Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations. In RecSys. ACM, 269--278.Google Scholar
- Jiaxi Tang and Ke Wang. 2018. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In WSDM 2018. 565--573.Google ScholarDigital Library
- Wilson L Taylor. 1953. "Cloze procedure": A new tool for measuring readability. Journalism quarterly, Vol. 30, 4 (1953), 415--433.Google ScholarCross Ref
- Quoc-Tuan Truong and Hady W. Lauw. 2019. Multimodal Review Generation for Recommender Systems. In WWW. ACM, 1864--1874.Google Scholar
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NeurIPS 2017. 5998--6008.Google ScholarDigital Library
- Chuhan Wu, Fangzhao Wu, Tao Qi, Chao Zhang, Yongfeng Huang, and Tong Xu. 2022. MM-Rec: Visiolinguistic Model Empowered Multimodal News Recommendation. In SIGIR. ACM, 2560--2564.Google Scholar
- Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-Based Recommendation with Graph Neural Networks. In AAAI 2019.Google Scholar
- Xu Xie, Fei Sun, Bolin Ding, and Bin Cui. 2020. Contrastive Pre-training for Sequential Recommendation. CoRR, Vol. abs/2010.14395 (2020).Google Scholar
- Chunfeng Yang, Huan Yan, Donghan Yu, Yong Li, and Dah Ming Chiu. 2017. Multi-site User Behavior Modeling and Its Application in Video Recommendation. In SIGIR 2017. 175--184.Google Scholar
- Tingting Zhang, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Jiajie Xu, Deqing Wang, Guanfeng Liu, and Xiaofang Zhou. 2019. Feature-level Deeper Self-Attention Network for Sequential Recommendation. In IJCAI 2019. 4320--4326.Google Scholar
- Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft. 2017. Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. In CIKM. ACM, 1449--1458.Google Scholar
- Wayne Xin Zhao, Yupeng Hou, Xingyu Pan, Chen Yang, Zeyu Zhang, Zihan Lin, Jingsen Zhang, Shuqing Bian, Jiakai Tang, Wenqi Sun, Yushuo Chen, Lanling Xu, Gaowei Zhang, Zhen Tian, Changxin Tian, Shanlei Mu, Xinyan Fan, Xu Chen, and Ji-Rong Wen. 2022. RecBole 2.0: Towards a More Up-to-Date Recommendation Library. In CIKM 2022. ACM, 4722--4726.Google ScholarDigital Library
- Wayne Xin Zhao, Shanlei Mu, Yupeng Hou, Zihan Lin, Yushuo Chen, Xingyu Pan, Kaiyuan Li, Yujie Lu, Hui Wang, Changxin Tian, Yingqian Min, Zhichao Feng, Xinyan Fan, Xu Chen, Pengfei Wang, Wendi Ji, Yaliang Li, Xiaoling Wang, and Ji-Rong Wen. 2021. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In CIKM 2021. 4653--4664.Google Scholar
- Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S3: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization. In CIKM 2020.Google ScholarDigital Library
Index Terms
- Multi-modal Mixture of Experts Represetation Learning for Sequential Recommendation
Recommendations
Multi-Modal Self-Supervised Learning for Recommendation
WWW '23: Proceedings of the ACM Web Conference 2023The online emergence of multi-modal sharing platforms (e.g., TikTok, Youtube) is powering personalized recommender systems to incorporate various modalities (e.g., visual, textual and acoustic) into the latent user representations. While existing works ...
Bootstrap Latent Representations for Multi-modal Recommendation
WWW '23: Proceedings of the ACM Web Conference 2023This paper studies the multi-modal recommendation problem, where the item multi-modality information (e.g., images and textual descriptions) is exploited to improve the recommendation accuracy. Besides the user-item interaction graph, existing state-of-...
Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation
MM '23: Proceedings of the 31st ACM International Conference on MultimediaMulti-modal recommendation systems, which integrate diverse types of information, have gained widespread attention in recent years. However, compared to traditional collaborative filtering-based multi-modal recommendation systems, research on multi-modal ...
Comments