Abstract
Sequential recommendation plays a crucial role in many real-world applications. Owing to this sequential nature, reinforcement learning has been employed to iteratively produce recommendations from an observed stream of user behavior. In this setting, a recommendation agent interacts with the environment (users) by sequentially recommending items (actions) so as to maximize the users' long-term cumulative reward. However, most reinforcement learning-based recommendation models focus only on extrinsic rewards derived from user feedback, which leads to sub-optimal policies when user-item interactions are sparse and fails to capture reward dynamics driven by evolving user preferences. As a remedy, we propose a dynamic intrinsic reward signal integrated into a contrastive discriminator-augmented reinforcement learning framework. Concretely, our framework contains two modules: (1) a contrastive learning module that learns representations of item sequences; and (2) an intrinsic reward function that imitates the user's internal dynamics. We then combine the static extrinsic reward with the dynamic intrinsic reward to train a sequential recommender system based on double Q-learning. We integrate our framework with five representative sequential recommendation models, augmenting each with two output layers: a supervised layer trained with cross-entropy loss to perform ranking, and a second layer for reinforcement learning. Experimental results on two real-world datasets demonstrate that the proposed framework outperforms several sequential recommendation baselines as well as baselines that explore with intrinsic rewards.
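The core training signal described above, a static extrinsic reward augmented by a weighted dynamic intrinsic reward and fed into a double Q-learning update, can be sketched in a minimal tabular form. This is an illustrative sketch only: `intrinsic_reward` stands in for the paper's learned discriminator over contrastive sequence representations, and the state/action sizes and hyperparameters (`gamma`, `alpha`, `beta`) are placeholder values, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma, alpha, beta = 0.9, 0.1, 0.5  # discount, step size, intrinsic-reward weight

# Two value tables, as in double Q-learning (van Hasselt, 2010)
Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))

def intrinsic_reward(state: int, action: int) -> float:
    """Placeholder for the learned intrinsic-reward function.

    In the paper this signal comes from a discriminator trained on
    contrastive representations of item sequences; here it is a fixed
    toy function so the update rule can be run end to end."""
    return 1.0 / (1.0 + state + action)

def double_q_update(s: int, a: int, r_ext: float, s_next: int) -> None:
    """One double Q-learning step on the combined reward."""
    r = r_ext + beta * intrinsic_reward(s, a)  # static extrinsic + dynamic intrinsic
    if rng.random() < 0.5:
        # select with Q_a, evaluate with Q_b
        a_star = int(np.argmax(Q_a[s_next]))
        Q_a[s, a] += alpha * (r + gamma * Q_b[s_next, a_star] - Q_a[s, a])
    else:
        # symmetric update of the second table
        b_star = int(np.argmax(Q_b[s_next]))
        Q_b[s, a] += alpha * (r + gamma * Q_a[s_next, b_star] - Q_b[s, a])

# Drive the update with random transitions in place of real user sessions.
for _ in range(1000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    r_ext = float(rng.random())       # e.g. click/purchase feedback
    s_next = int(rng.integers(n_states))
    double_q_update(s, a, r_ext, s_next)

print(Q_a.shape, Q_b.shape)
```

In the full framework the tabular values would be replaced by the Q-head of a sequential recommender (e.g. GRU4Rec or SASRec), trained jointly with the cross-entropy ranking head; the decoupled selection/evaluation of the two tables is what mitigates the overestimation bias of single Q-learning.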






Availability of data and materials
The datasets are available at https://recsys.acm.org/recsys15/challenge/ and https://www.kaggle.com/retailrocket/ecommerce-dataset.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61977002), the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A, and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2022ZX-14). The authors take full responsibility for the content of this work. We thank the anonymous reviewers for their insightful comments and suggestions on this paper.
Funding
This research is partially funded by the National Natural Science Foundation of China (No. 61977002), the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2022ZX-14).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Code availability
We train all models on a single NVIDIA GeForce GTX 1080 Ti GPU.
Cite this article
Liu, Z., Ma, Y., Hildebrandt, M. et al. CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations. Knowl Inf Syst 64, 2239–2265 (2022). https://doi.org/10.1007/s10115-022-01711-7