DOI: 10.1145/3534678.3539266
Research Article

Importance Prioritized Policy Distillation

Published: 14 August 2022

ABSTRACT

Policy distillation (PD) has been widely studied in deep reinforcement learning (RL), but existing PD approaches assume that the demonstration data (i.e., the state-action pairs in frames) of a decision-making sequence is uniformly distributed. This can introduce unwanted bias, since RL is a reward-maximizing process rather than simple label matching. Given this issue, we define the importance of a frame as its contribution to the expected reward, and hypothesize that accounting for frame importance can improve the performance of the distilled student policy. To verify this hypothesis, we analyze why and how frame importance matters in RL settings. Based on the analysis, we propose an importance prioritized PD framework that emphasizes training on important frames so that the student learns efficiently. In particular, frame importance is measured by the reciprocal of the weighted Shannon entropy of the teacher policy's action prescriptions. Experiments on Atari games and policy compression tasks show that capturing frame importance significantly boosts the performance of the distilled policies.
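For intuition, here is a minimal PyTorch sketch of the idea described above: score each frame by the reciprocal of the entropy of the teacher's action distribution, then use that score to re-weight a standard KL distillation loss. This is an illustrative reading of the abstract, not the authors' implementation; the use of the plain (unweighted) Shannon entropy, the per-batch normalization of the weights, and the softmax temperature are all assumptions.

```python
import torch
import torch.nn.functional as F

def frame_importance(teacher_logits: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Teacher's action distribution on each frame: shape (batch, n_actions).
    probs = F.softmax(teacher_logits, dim=-1)
    # Shannon entropy of the teacher's action prescription per frame.
    # (The paper uses a weighted entropy; plain entropy is a simplification here.)
    entropy = -(probs * torch.log(probs + eps)).sum(dim=-1)
    # Low entropy (a decisive teacher) -> high importance.
    return 1.0 / (entropy + eps)

def prioritized_distillation_loss(student_logits: torch.Tensor,
                                  teacher_logits: torch.Tensor,
                                  temperature: float = 1.0) -> torch.Tensor:
    # Per-frame importance, normalized over the batch so the loss scale stays stable.
    w = frame_importance(teacher_logits)
    w = w / w.sum()
    teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Per-frame KL(teacher || student), then an importance-weighted sum.
    kl = (teacher * (torch.log(teacher + 1e-6) - log_student)).sum(dim=-1)
    return (w * kl).sum()
```

With uniform weights this reduces to ordinary policy distillation; the entropy-based weights simply push the student to match the teacher most closely on frames where the teacher is near-deterministic, which is the prioritization the abstract argues for.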

Supplemental Material

KDD-rtfp0476.mp4 (mp4, 22.9 MB)


    • Published in

      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
ISBN: 9781450393850
DOI: 10.1145/3534678

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
