ABSTRACT
Reinforcement learning (RL) optimizes an objective function by learning an optimal policy for taking a sequence of actions in an environment. Offline RL learns a policy from pre-generated traces of an agent's interactions with an environment, accelerating the agent's initial learning phase. A real-time deployment of RL-based recommenders across geographies could mix online and offline RL algorithms, exploring new users' behavior while exploiting old knowledge to learn a recommendation policy; in such a scenario, RL agents are distributed and deployed in multiple locations. In this paper, we share our experiences and learnings in accelerating our in-house offline RL-based recommender system, which employs a mix of Batch-Constrained Q-learning (BCQ) and distributional RL algorithms for building policy models. We present various acceleration techniques for this system, including operator fusion, removal of performance anti-patterns, heterogeneous deployments, and the design space of synchronous and asynchronous distributed training over the generative and policy models of the algorithm. We show that the presented techniques can speed up training of the RL agents on A100 GPUs by a factor of 47× over a naive implementation (as written by an ML practitioner) on the same GPUs.
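The abstract names operator fusion as one of the acceleration techniques. As a minimal sketch of the idea only (the function names and the `relu(x*w + b)` computation below are our own illustration, not the paper's actual kernels), fusion replaces a chain of elementwise operators, each of which materializes a temporary array, with a single pass over the data:

```python
import numpy as np

def unfused_relu_affine(x, w, b):
    """Elementwise relu(x * w + b) as three separate operators.

    Each step materializes a full temporary array, so the data makes
    three round trips through memory.
    """
    t1 = x * w
    t2 = t1 + b
    return np.maximum(t2, 0.0)

def fused_relu_affine(x, w, b):
    """The same computation with the three operators fused into one pass.

    A single loop touches each element once and writes the final result
    directly, with no intermediate buffers -- the effect a fusion pass
    in a JIT compiler produces for GPU kernels.
    """
    out = np.empty_like(x)
    for i in range(x.size):
        v = x.flat[i] * w.flat[i] + b.flat[i]
        out.flat[i] = v if v > 0.0 else 0.0
    return out
```

Since elementwise operators are typically memory-bound, eliminating the intermediate reads and writes is where the speedup comes from; frameworks such as TorchScript apply this transformation automatically to eligible operator chains.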