Abstract
Sequential recommendation plays a crucial role in many real-world applications. Owing to this sequential nature, reinforcement learning has been employed to iteratively produce recommendations from an observed stream of user behavior. In this setting, a recommendation agent interacts with the environment (users) by sequentially recommending items (actions) so as to maximize the users' long-term cumulative reward. However, most reinforcement learning-based recommendation models focus only on extrinsic rewards derived from user feedback, which leads to sub-optimal policies when user-item interactions are sparse and fails to capture reward dynamics driven by evolving user preferences. As a remedy, we propose a dynamic intrinsic reward signal integrated into a contrastive discriminator-augmented reinforcement learning framework. Concretely, our framework contains two modules: (1) a contrastive learning module that learns representations of item sequences; and (2) an intrinsic reward function that imitates the user's internal dynamics. We then combine the static extrinsic reward with the dynamic intrinsic reward to train a sequential recommender system based on double Q-learning. We integrate our framework with five representative sequential recommendation models, augmenting each with two output layers: a supervised layer trained with cross-entropy loss to perform ranking, and a second layer for reinforcement learning. Experimental results on two real-world datasets demonstrate that the proposed framework outperforms several sequential recommendation baselines as well as baselines that explore with intrinsic rewards.
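The core training signal described above, a static extrinsic reward augmented by a weighted dynamic intrinsic reward and fed into a double Q-learning update, can be sketched in a minimal tabular form. This is an illustrative sketch only: `intrinsic_reward` stands in for the paper's learned discriminator over contrastive sequence representations, and the state/action sizes and hyperparameters (`gamma`, `alpha`, `beta`) are placeholder values, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma, alpha, beta = 0.9, 0.1, 0.5  # discount, step size, intrinsic-reward weight

# Two value tables, as in double Q-learning (van Hasselt, 2010)
Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))

def intrinsic_reward(state: int, action: int) -> float:
    """Placeholder for the learned intrinsic-reward function.

    In the paper this signal comes from a discriminator trained on
    contrastive representations of item sequences; here it is a fixed
    toy function so the update rule can be run end to end."""
    return 1.0 / (1.0 + state + action)

def double_q_update(s: int, a: int, r_ext: float, s_next: int) -> None:
    """One double Q-learning step on the combined reward."""
    r = r_ext + beta * intrinsic_reward(s, a)  # static extrinsic + dynamic intrinsic
    if rng.random() < 0.5:
        # select with Q_a, evaluate with Q_b
        a_star = int(np.argmax(Q_a[s_next]))
        Q_a[s, a] += alpha * (r + gamma * Q_b[s_next, a_star] - Q_a[s, a])
    else:
        # symmetric update of the second table
        b_star = int(np.argmax(Q_b[s_next]))
        Q_b[s, a] += alpha * (r + gamma * Q_a[s_next, b_star] - Q_b[s, a])

# Drive the update with random transitions in place of real user sessions.
for _ in range(1000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    r_ext = float(rng.random())       # e.g. click/purchase feedback
    s_next = int(rng.integers(n_states))
    double_q_update(s, a, r_ext, s_next)

print(Q_a.shape, Q_b.shape)
```

In the full framework the tabular values would be replaced by the Q-head of a sequential recommender (e.g. GRU4Rec or SASRec), trained jointly with the cross-entropy ranking head; the decoupled selection/evaluation of the two tables is what mitigates the overestimation bias of single Q-learning.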






Availability of data and materials
The datasets are available at https://recsys.acm.org/recsys15/challenge/ and https://www.kaggle.com/retailrocket/ecommerce-dataset.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61977002), the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A, and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2022ZX-14). The authors take full responsibility for the content of this work. We thank the anonymous reviewers for their insightful comments and suggestions on this paper.
Funding
This research is partially funded by the National Natural Science Foundation of China (No. 61977002), the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2022ZX-14).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Code availability
We train all models on a single NVIDIA GeForce GTX 1080 Ti GPU.
Cite this article
Liu, Z., Ma, Y., Hildebrandt, M. et al. CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations. Knowl Inf Syst 64, 2239–2265 (2022). https://doi.org/10.1007/s10115-022-01711-7