
Periodic Intra-ensemble Knowledge Distillation for Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12975)


Abstract

Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent work suggests that directly imitating experts’ policies in a supervised manner, before or during training, enables faster policy improvement for an RL agent. Motivated by these insights, we propose Periodic Intra-Ensemble Knowledge Distillation (PIEKD), a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge among the policies in the ensemble through knowledge distillation. Our experiments demonstrate that PIEKD improves upon a state-of-the-art RL method in sample efficiency on several challenging MuJoCo benchmark tasks. Additionally, we perform ablation studies to better understand PIEKD.

Z.-W. Hong—Work done during an internship at Preferred Networks, Inc.
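To make the periodic sharing step concrete, the sketch below shows one way the idea could look in code. It is a minimal illustration, not the authors' implementation (their released code is at https://github.com/pfnet-research/piekd, see Notes): PyTorch, the toy Policy network, the distill helper, the MSE distillation loss, the best-return teacher-selection rule, and the synthetic stand-in states are all assumptions made here for brevity.

```python
# Minimal sketch of periodic intra-ensemble knowledge distillation.
# Everything here (class names, the MSE loss, the teacher-selection rule,
# the random stand-in states) is an illustrative assumption, not the
# authors' implementation.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2   # toy sizes; real MuJoCo tasks differ
ENSEMBLE_SIZE = 3
DISTILL_INTERVAL = 1000        # env steps between distillation phases


class Policy(nn.Module):
    """Small deterministic policy head standing in for one ensemble member."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.Tanh(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, states):
        return self.net(states)  # action for each state


def distill(teacher, student, states, lr=1e-3, epochs=5):
    """Supervised distillation: pull the student's actions toward the
    teacher's on a shared batch of states (MSE here; a KL divergence
    between action distributions is another common choice)."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    with torch.no_grad():
        targets = teacher(states)          # teacher outputs are fixed
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(student(states), targets)
        loss.backward()
        opt.step()


ensemble = [Policy() for _ in range(ENSEMBLE_SIZE)]
returns = [0.0] * ENSEMBLE_SIZE            # running return estimate per member

for step in range(1, 5001):
    # ... each member acts in the environment and is updated by the base
    # off-policy RL algorithm; both are omitted in this sketch ...
    if step % DISTILL_INTERVAL == 0:
        # Periodically select the best-performing member as the teacher
        # and distill its behaviour into every other member.
        best = max(range(ENSEMBLE_SIZE), key=lambda i: returns[i])
        shared_states = torch.randn(256, STATE_DIM)  # stand-in for replay states
        for i, member in enumerate(ensemble):
            if i != best:
                distill(ensemble[best], member, shared_states)
```

The distillation phase is plain supervised learning, which is why it can be slotted between ordinary RL updates at a fixed interval; which member teaches, how often, and with which loss are design choices a concrete implementation must fix, and the sketch hard-codes one plausible combination.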


Notes

  1. https://github.com/pfnet-research/piekd.


Acknowledgements

The authors would like to thank Aaron Havens for suggestions for interesting experiments. We thank Yasuhiro Fujita for suggesting experiments and providing technical support. We thank Jean-Baptiste Mouret for useful feedback on our draft and formulation. Lastly, we thank Pieter Abbeel and Daisuke Okanohara for helpful advice on related works and the framing of the paper.

Author information

Corresponding author

Correspondence to Zhang-Wei Hong.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hong, Z.W., Nagarajan, P., Maeda, G. (2021). Periodic Intra-ensemble Knowledge Distillation for Reinforcement Learning. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_6

  • DOI: https://doi.org/10.1007/978-3-030-86486-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86485-9

  • Online ISBN: 978-3-030-86486-6

  • eBook Packages: Computer Science, Computer Science (R0)
