DOI: 10.1145/3539618.3591778

Towards Multi-Interest Pre-training with Sparse Capsule Network

Published: 18 July 2023

Abstract

The pre-training paradigm, i.e., learning universal knowledge across a wide spectrum of domains, has increasingly become the de facto practice in many fields, especially for transfer to new domains. Recent progress includes universal pre-training solutions for recommendation. However, we argue that the common treatment of masked language modeling or simple data augmentation via contrastive learning is not sufficient for pre-training a recommender system, since a user's intent can be more complex than predicting the next word or item. It is therefore natural to go a step further and devise a multi-interest-driven pre-training framework for universal user understanding. Nevertheless, incorporating multi-interest modeling into recommender system pre-training is non-trivial due to the dynamic, contextual, and transient nature of user interests, particularly when users come from different domains. The limited effort along this line leaves it largely an open question.
In this paper, we propose a novel Multi-Interest Pre-training with Sparse Capsule framework (named Miracle). Miracle performs universal multi-interest modeling with a sparse capsule network and an interest-aware pre-training task. Specifically, we utilize a text-aware item embedding module, consisting of an MoE adaptor and a deeply contextual encoding component, to model contextual and transferable item representations. We then propose a sparse interest activation mechanism, coupled with a position-aware capsule network, for adaptive interest extraction. Furthermore, an interest-level contrastive pre-training task is introduced to guide the sparse capsule network to learn universal interests precisely. We conduct extensive experiments on eleven real-world datasets against eight baselines. The results show that our method significantly outperforms a series of state-of-the-art methods on these benchmarks. The code is available at https://github.com/WHUIR/Miracle.
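
For readers who want a concrete picture of how these pieces could fit together, below is a minimal PyTorch sketch of an MoE adaptor over item text embeddings, a capsule network with a sparse top-k interest activation, and an interest-level InfoNCE contrastive loss. This is an illustration under assumptions rather than the authors' implementation: every module name, dimension, the top-k rule, and the loss formulation here are hypothetical simplifications, and Miracle's position-aware routing and its specific augmentation strategy are omitted. The actual code is in the linked repository.

# Minimal sketch (hypothetical; the authors' implementation is at
# https://github.com/WHUIR/Miracle).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEAdaptor(nn.Module):
    """Mixture-of-experts adaptor: projects text embeddings of item descriptions
    (e.g., from a frozen language model) into the recommendation space."""

    def __init__(self, text_dim=768, item_dim=64, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(text_dim, item_dim) for _ in range(n_experts)])
        self.gate = nn.Linear(text_dim, n_experts)

    def forward(self, text_emb):                                        # (B, L, text_dim)
        w = F.softmax(self.gate(text_emb), dim=-1)                      # gate weights (B, L, E)
        out = torch.stack([e(text_emb) for e in self.experts], dim=-2)  # (B, L, E, D)
        return (w.unsqueeze(-1) * out).sum(dim=-2)                      # (B, L, D)


def squash(x, dim=-1, eps=1e-8):
    # Capsule squashing non-linearity (Sabour et al., 2017).
    n2 = (x ** 2).sum(dim=dim, keepdim=True)
    return n2 / (1.0 + n2) * x / torch.sqrt(n2 + eps)


class SparseCapsuleInterestExtractor(nn.Module):
    """Dynamic-routing capsules with a top-k sparse activation, so that each
    user sequence only activates a handful of interest capsules."""

    def __init__(self, item_dim=64, n_interests=8, top_k=3, routing_iters=3):
        super().__init__()
        self.W = nn.Parameter(0.02 * torch.randn(n_interests, item_dim, item_dim))
        self.top_k, self.routing_iters = top_k, routing_iters

    def forward(self, items):                                           # (B, L, D)
        u_hat = torch.einsum("bld,kde->blke", items, self.W)            # item -> interest votes
        b = items.new_zeros(items.shape[0], items.shape[1], self.W.shape[0])
        for _ in range(self.routing_iters):
            c = F.softmax(b, dim=-1)                                    # routing weights (B, L, K)
            v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))            # interest capsules (B, K, D)
            b = b + torch.einsum("blkd,bkd->blk", u_hat, v)             # agreement update
        strength = v.norm(dim=-1)                                       # capsule activations (B, K)
        top = strength.topk(self.top_k, dim=-1).indices
        keep = torch.zeros_like(strength).scatter_(1, top, 1.0)
        return v * keep.unsqueeze(-1)                                   # only the top-k interests survive


def interest_contrastive_loss(z1, z2, temperature=0.1):
    """Interest-level InfoNCE: the k-th interest of a user under two augmented
    views of the same sequence is a positive pair; all other interests in the
    batch serve as negatives."""
    B, K, D = z1.shape
    a = F.normalize(z1.reshape(B * K, D), dim=-1)
    b = F.normalize(z2.reshape(B * K, D), dim=-1)
    logits = a @ b.t() / temperature
    return F.cross_entropy(logits, torch.arange(B * K, device=z1.device))


# Toy pre-training step on random "text embeddings" of two augmented sequence views.
adaptor, extractor = MoEAdaptor(), SparseCapsuleInterestExtractor()
view1, view2 = torch.randn(2, 4, 20, 768)           # two views, batch of 4, sequence length 20
z1, z2 = extractor(adaptor(view1)), extractor(adaptor(view2))
loss = interest_contrastive_loss(z1, z2)
loss.backward()                                     # gradients reach both the adaptor and the capsules

In this sketch the contrastive step simply aligns matching interest capsules of the same user across the two views and treats everything else in the batch as negatives; the paper's interest-level pre-training task is more involved, but the sketch conveys how the sparse capsule output feeds the contrastive objective.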

Supplemental Material

MP4 File
Presentation video of the SIGIR 2023 paper "Towards Multi-Interest Pre-training with Sparse Capsule Network", the first attempt to explore the transferability of multi-interest modeling for pre-training-driven sequential recommendation.

    Published In

    SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2023
    3567 pages
    ISBN: 9781450394086
    DOI: 10.1145/3539618

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multi-interest learning
    2. pre-trained model
    3. sequential recommendation

    Qualifiers

    • Research-article

    Conference

    SIGIR '23

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2025) Federated Recommender System Based on Diffusion Augmentation and Guided Denoising. ACM Transactions on Information Systems 43(2), 1-36. DOI: 10.1145/3688570. Online publication date: 17-Jan-2025
    • (2025) Sequential recommendation by reprogramming pretrained transformer. Information Processing & Management 62(1), 103938. DOI: 10.1016/j.ipm.2024.103938. Online publication date: Jan-2025
    • (2025) A Review on Deep Learning for Sequential Recommender Systems: Key Technologies and Directions. Big Data, 305-318. DOI: 10.1007/978-981-96-1024-2_22. Online publication date: 24-Jan-2025
    • (2024) One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems. ACM Transactions on Information Systems. DOI: 10.1145/3705727. Online publication date: 26-Nov-2024
    • (2024) MARS: Matching Attribute-aware Representations for Text-based Sequential Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3822-3826. DOI: 10.1145/3627673.3679960. Online publication date: 21-Oct-2024
    • (2024) TEXT CAN BE FAIR: Mitigating Popularity Bias with PLMs by Learning Relative Preference. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2240-2249. DOI: 10.1145/3627673.3679581. Online publication date: 21-Oct-2024
    • (2024) Enhancing Sequential Recommendation via LLM-based Semantic Embedding Learning. Companion Proceedings of the ACM Web Conference 2024, 103-111. DOI: 10.1145/3589335.3648307. Online publication date: 13-May-2024
    • (2024) Contrastive multi-interest graph attention network for knowledge-aware recommendation. Expert Systems with Applications: An International Journal 255(PD). DOI: 10.1016/j.eswa.2024.124748. Online publication date: 21-Nov-2024
    • (2024) DualCFGL: dual-channel fusion global and local features for sequential recommendation. Complex & Intelligent Systems 11(1). DOI: 10.1007/s40747-024-01734-3. Online publication date: 28-Dec-2024
    • (2024) High-level preferences as positive examples in contrastive learning for multi-interest sequential recommendation. World Wide Web 27(2). DOI: 10.1007/s11280-024-01263-6. Online publication date: 14-Mar-2024
