ABSTRACT
Reinforcement learning (RL) optimizes an objective function by learning an optimal policy for taking a sequence of actions in an environment. Offline RL learns a policy from pre-generated traces of an agent's interactions with an environment, accelerating the agent's initial learning phase. A real-time deployment of RL-based recommenders across geographies could mix online and offline RL algorithms, exploring new users' behavior while exploiting old knowledge to learn a recommendation policy; in such a scenario, RL agents are distributed and deployed in multiple locations. In this paper, we share our experiences and learnings in accelerating our in-house offline RL-based recommender system, which employs a mix of Batch-Constrained Q-learning (BCQ) and distributional RL algorithms for building policy models. We present various acceleration techniques for this system, including operator fusion, removal of performance anti-patterns, heterogeneous deployments, and the design space of synchronous and asynchronous distributed training over the generative and policy models of the algorithm. We show that the presented techniques can speed up training of the RL agents on A100 GPUs by a factor of 47× over a naive implementation (as written by an ML practitioner) on the same GPUs.
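The abstract names operator fusion as one of the acceleration techniques. As a minimal sketch of the idea only (the function names and the `relu(x*w + b)` computation below are our own illustration, not the paper's actual kernels), fusion replaces a chain of elementwise operators, each of which materializes a temporary array, with a single pass over the data:

```python
import numpy as np

def unfused_relu_affine(x, w, b):
    """Elementwise relu(x * w + b) as three separate operators.

    Each step materializes a full temporary array, so the data makes
    three round trips through memory.
    """
    t1 = x * w
    t2 = t1 + b
    return np.maximum(t2, 0.0)

def fused_relu_affine(x, w, b):
    """The same computation with the three operators fused into one pass.

    A single loop touches each element once and writes the final result
    directly, with no intermediate buffers -- the effect a fusion pass
    in a JIT compiler produces for GPU kernels.
    """
    out = np.empty_like(x)
    for i in range(x.size):
        v = x.flat[i] * w.flat[i] + b.flat[i]
        out.flat[i] = v if v > 0.0 else 0.0
    return out
```

Since elementwise operators are typically memory-bound, eliminating the intermediate reads and writes is where the speedup comes from; frameworks such as TorchScript apply this transformation automatically to eligible operator chains.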