DOI: 10.1145/3477495.3532015

Locality-Sensitive State-Guided Experience Replay Optimization for Sparse Rewards in Online Recommendation

Published: 07 July 2022

Abstract

Online recommendation must cope with rapidly changing user preferences. Deep reinforcement learning (DRL) is an effective means of capturing users' dynamic interests during their interactions with recommender systems. However, training a DRL agent in an online recommender system is generally challenging because of the sparse rewards caused by the large action space (e.g., the candidate item space) and the comparatively small number of user interactions. Experience replay (ER) has been studied extensively as a way to overcome sparse rewards, but existing ER methods adapt poorly to the complex environment of online recommender systems and are inefficient at learning an optimal strategy from past experience. As a step toward filling this gap, we propose a novel state-aware experience replay model in which the agent selectively retains the most relevant and salient experiences and is guided toward the optimal policy for online recommendation. In particular, we propose a locality-sensitive hashing method to selectively retain the most meaningful experiences at scale, and we design a prioritized reward-driven strategy to replay more valuable experiences with higher probability. We formally show that the proposed method guarantees upper and lower bounds on experience replay and optimizes the space complexity, and we empirically demonstrate our model's superiority over several existing experience replay methods on three benchmark simulation platforms.
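The two ideas in the abstract, hashing states so that near-duplicate experiences collapse together and sampling replays in proportion to reward, can be sketched in a few lines. This is a minimal illustration only, not the paper's actual algorithm: the class name `LSHReplayBuffer`, the random-hyperplane hash, the per-bucket retention rule, and all parameter values are assumptions made for exposition.

```python
import numpy as np

class LSHReplayBuffer:
    """Hypothetical sketch of an LSH-bucketed, reward-prioritized replay buffer."""

    def __init__(self, state_dim, n_bits=8, per_bucket=4, seed=0):
        self.rng = np.random.default_rng(seed)
        # Random hyperplanes for sign-based (SimHash-style) locality-sensitive hashing.
        self.planes = self.rng.standard_normal((n_bits, state_dim))
        self.per_bucket = per_bucket
        self.buckets = {}  # hash code -> list of (state, action, reward, next_state)

    def _hash(self, state):
        # The sign pattern of the projections gives an integer bucket key;
        # nearby states (small angle) tend to share the same key.
        bits = (self.planes @ np.asarray(state, dtype=float)) > 0
        return sum(1 << i for i, b in enumerate(bits) if b)

    def add(self, state, action, reward, next_state):
        bucket = self.buckets.setdefault(self._hash(state), [])
        bucket.append((np.asarray(state, dtype=float), action,
                       float(reward), np.asarray(next_state, dtype=float)))
        if len(bucket) > self.per_bucket:
            # A crowded bucket means many near-duplicate experiences:
            # keep only the most rewarding ones (illustrative retention rule).
            bucket.sort(key=lambda e: e[2], reverse=True)
            del bucket[self.per_bucket:]

    def __len__(self):
        return sum(len(b) for b in self.buckets.values())

    def sample(self, batch_size):
        # Reward-driven prioritized sampling: shift rewards to positive
        # priorities and draw experiences with probability proportional to them.
        pool = [e for b in self.buckets.values() for e in b]
        rewards = np.array([e[2] for e in pool])
        priorities = rewards - rewards.min() + 1e-6
        idx = self.rng.choice(len(pool), size=batch_size, p=priorities / priorities.sum())
        return [pool[i] for i in idx]
```

With `n_bits` hash bits the buffer holds at most `per_bucket * 2**n_bits` experiences, which is one simple way to see how hashing caps the space cost while the per-bucket rule keeps the salient (high-reward) transitions.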




      Published In

      SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. deep reinforcement learning
      2. experience replay
      3. recommender systems

      Qualifiers

      • Research-article

Conference

SIGIR '22

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%


Cited By

• (2025) Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation. IEEE Transactions on Neural Networks and Learning Systems 36(1), 1044-1055. https://doi.org/10.1109/TNNLS.2023.3329808
• (2024) On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems. ACM Transactions on Information Systems 42(6), 1-26. https://doi.org/10.1145/3661996
• (2024) Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 376-384. https://doi.org/10.1145/3637528.3671750
• (2024) Advances and challenges in learning from experience replay. Artificial Intelligence Review 58(2). https://doi.org/10.1007/s10462-024-11062-0
• (2023) Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments. IEEE Transactions on Vehicular Technology, 1-16. https://doi.org/10.1109/TVT.2023.3285595
• (2023) Research on Intelligent Recommendation Technology for Complex Tasks. 2023 4th International Conference on Computer Engineering and Application (ICCEA), 353-360. https://doi.org/10.1109/ICCEA58433.2023.10135209
• (2023) Deep reinforcement learning in recommender systems. Knowledge-Based Systems 264. https://doi.org/10.1016/j.knosys.2023.110335
• (2023) Intrinsically motivated reinforcement learning based recommendation with counterfactual data augmentation. World Wide Web 26(5), 3253-3274. https://doi.org/10.1007/s11280-023-01187-7
• (2022) Empowerment-driven Policy Gradient Learning with Counterfactual Augmentation in Recommender Systems. 2022 IEEE International Conference on Data Mining (ICDM), 885-890. https://doi.org/10.1109/ICDM54844.2022.00102
