DOI: 10.1145/3486001.3486244

Towards Accelerating Offline RL based Recommender Systems

Published: 22 October 2021

ABSTRACT

Reinforcement learning (RL) optimizes an objective function by learning an optimal policy for taking a sequence of actions in an environment. Offline RL learns a policy from pre-generated traces of an agent's interactions with an environment, accelerating the agent's initial learning phase. A real-time deployment of RL-based recommenders across geographies could mix online and offline RL algorithms, exploring new users' behavior while exploiting previously learned knowledge of the recommendation policy. In such a scenario, RL agents are distributed and deployed across multiple locations. In this paper, we share our experiences and lessons learned in accelerating our in-house offline RL-based recommender system, which employs a mix of Batch-Constrained Q-learning (BCQ) and distributional RL algorithms to build its policy models. We present several acceleration techniques for this system: operator fusion, elimination of performance anti-patterns, heterogeneous deployments, and the design space of synchronous and asynchronous distributed training over the generative and policy models of the algorithm. Together, these techniques speed up training of the RL agents on an A100 GPU by a factor of 47× over a naive implementation, written by an ML practitioner, on the same hardware.
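To make the operator-fusion idea above concrete, the sketch below shows one common way to fuse a chain of elementwise operators in PyTorch via TorchScript. This is only an illustration under our own assumptions: `q_head` is a hypothetical stand-in for a small piece of a policy network, not the paper's actual model or code.

```python
import torch

def q_head(h: torch.Tensor, w: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical tail of a Q-network: a chain of three elementwise ops.

    Run eagerly, each op (add, tanh, mul) launches its own GPU kernel
    and materializes an intermediate tensor.
    """
    z = h + b
    z = torch.tanh(z)
    return z * w

# TorchScript's fuser can compile the elementwise chain into a single
# fused CUDA kernel, cutting kernel-launch overhead and memory traffic.
q_head_fused = torch.jit.script(q_head)

device = "cuda" if torch.cuda.is_available() else "cpu"
h = torch.randn(1024, 256, device=device)
w = torch.randn(256, device=device)
b = torch.randn(256, device=device)

out = q_head_fused(h, w, b)  # same result as q_head(h, w, b), fewer kernels
```

Whether the paper applies fusion at this level (TorchScript) or via hand-written kernels is not stated in the abstract; the sketch only illustrates the general lever, collapsing many small GPU kernels into one, that contributes to speedups of the kind reported.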


Published in

AIMLSystems '21: Proceedings of the First International Conference on AI-ML Systems
October 2021, 170 pages
ISBN: 9781450385947
DOI: 10.1145/3486001
Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


