BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Authors:

Peng Jiang

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Pages 1441 - 1450

https://doi.org/10.1145/3357384.3357895

Published: 03 November 2019

Abstract

Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users' historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to limitations including: (a) unidirectional architectures restrict the power of hidden representations in users' behavior sequences; (b) they often assume a rigidly ordered sequence, which is not always practical. To address these limitations, we propose a sequential recommendation model called BERT4Rec, which employs deep bidirectional self-attention to model user behavior sequences. To avoid information leakage and train the bidirectional model efficiently, we apply the Cloze objective to sequential recommendation, predicting randomly masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model that makes recommendations by allowing each item in a user's historical behaviors to fuse information from both the left and right sides. Extensive experiments on four benchmark datasets show that our model consistently outperforms various state-of-the-art sequential models.
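The Cloze-style masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the reserved `MASK` id, the `mask_prob` value, and the function name `cloze_mask` are all assumptions made for illustration. The sketch only shows how a behavior sequence might be turned into masked inputs and prediction targets; the bidirectional self-attention model that would consume them is omitted.

```python
import random

MASK = 0  # assumption: item id 0 is reserved for the mask token


def cloze_mask(sequence, mask_prob=0.2, rng=None):
    """Randomly mask items in a behavior sequence (illustrative only).

    Returns (inputs, labels). inputs[i] is MASK where the item was hidden,
    otherwise the original item. labels[i] is the hidden item at masked
    positions and None elsewhere, so a loss would be computed only on
    masked positions.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the item from the model
            labels.append(item)   # the model should recover it
        else:
            inputs.append(item)   # visible context (left or right)
            labels.append(None)   # no prediction target here
    return inputs, labels


if __name__ == "__main__":
    history = [5, 8, 3, 9, 2]  # hypothetical item ids for one user
    masked_inputs, targets = cloze_mask(history, mask_prob=0.4, rng=random.Random(7))
    print(masked_inputs, targets)
```

Because a masked position never appears in its own input, a model may attend to items on both sides of it without seeing the answer, which is the information-leakage concern the abstract refers to.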

Index Terms

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

November 2019

3373 pages

ISBN:9781450369763

DOI:10.1145/3357384

General Chairs:
Wenwu Zhu, Tsinghua University, China
Dacheng Tao, University of Massachusetts, USA
Xueqi Cheng, Institute of Computing Technology, CAS, China

Program Chairs:
Peng Cui, Tsinghua University, China
Elke Rundensteiner, Worcester Polytechnic Institute, USA
David Carmel, Amazon Research, USA
Qi He, LinkedIn, USA
Jeffrey Xu Yu, Chinese University of Hong Kong, China

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM '19

CIKM '19: The 28th ACM International Conference on Information and Knowledge Management

November 3 - 7, 2019

Beijing, China

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

Total Citations: 1,315
Total Downloads: 7,707
Downloads (last 12 months): 1,535
Downloads (last 6 weeks): 148

Reflects downloads up to 25 Feb 2025
