Abstract
Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent work suggests that directly imitating an expert's policy in a supervised manner, before or during training, enables faster policy improvement for an RL agent. Motivated by these insights, we propose Periodic Intra-Ensemble Knowledge Distillation (PIEKD). PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge among the policies in the ensemble through knowledge distillation. Our experiments demonstrate that PIEKD improves the sample efficiency of a state-of-the-art RL method on several challenging MuJoCo benchmark tasks. Additionally, we perform ablation studies to better understand PIEKD.
Z.-W. Hong—Work done during an internship at Preferred Networks, Inc.
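The sketch below illustrates the high-level loop described in the abstract: an ensemble of policies collects experience independently, and at fixed intervals knowledge is shared across members via distillation. It is a minimal, self-contained illustration only; the class and function names (TabularPolicy, evaluate, distill), the toy environment, and the choice of distilling the best-returning member into the others are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch of a periodic intra-ensemble knowledge distillation loop.
# All components here are hypothetical placeholders; a real agent would use a
# deep RL learner (e.g., an off-policy actor-critic) and a replay buffer.

import random
from dataclasses import dataclass, field


@dataclass
class TabularPolicy:
    """A toy stochastic policy over a small discrete state/action space."""
    n_states: int
    n_actions: int
    probs: list = field(default_factory=list)

    def __post_init__(self):
        # Start each state with a random, normalized action distribution.
        self.probs = [
            [random.random() for _ in range(self.n_actions)]
            for _ in range(self.n_states)
        ]
        for row in self.probs:
            total = sum(row)
            for a in range(self.n_actions):
                row[a] /= total

    def act(self, state: int) -> int:
        return random.choices(range(self.n_actions), weights=self.probs[state])[0]


def evaluate(policy: TabularPolicy, episodes: int = 5) -> float:
    """Placeholder return estimate: reward 1 for choosing action 0 in any state."""
    total = 0.0
    for _ in range(episodes):
        for state in range(policy.n_states):
            total += 1.0 if policy.act(state) == 0 else 0.0
    return total / episodes


def distill(teacher: TabularPolicy, student: TabularPolicy, rate: float = 0.5):
    """Move the student's action distribution toward the teacher's.

    A real implementation would minimize a KL or cross-entropy loss on states
    sampled from experience; this interpolation only shows the update direction.
    """
    for s in range(student.n_states):
        for a in range(student.n_actions):
            student.probs[s][a] = (
                (1 - rate) * student.probs[s][a] + rate * teacher.probs[s][a]
            )
        norm = sum(student.probs[s])
        student.probs[s] = [p / norm for p in student.probs[s]]


# Schematic PIEKD-style training loop.
ensemble = [TabularPolicy(n_states=4, n_actions=3) for _ in range(4)]
distill_period = 10  # interaction phases between distillation rounds

for phase in range(50):
    # 1) Each member would act in the environment and run its own RL updates
    #    here (omitted; only the periodic knowledge-sharing skeleton is kept).
    if (phase + 1) % distill_period == 0:
        # 2) Periodically select the best-performing member and distill its
        #    knowledge into the remaining members of the ensemble.
        returns = [evaluate(p) for p in ensemble]
        teacher = ensemble[returns.index(max(returns))]
        for student in ensemble:
            if student is not teacher:
                distill(teacher, student)

print("final evaluated returns:", [round(evaluate(p), 2) for p in ensemble])
```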
Acknowledgements
The authors would like to thank Aaron Havens for suggestions for interesting experiments. We thank Yasuhiro Fujita for suggesting experiments and providing technical support. We thank Jean-Baptiste Mouret for useful feedback on our draft and formulation. Lastly, we thank Pieter Abbeel and Daisuke Okanohara for helpful advice on related works and the framing of the paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, Z.-W., Nagarajan, P., Maeda, G. (2021). Periodic Intra-ensemble Knowledge Distillation for Reinforcement Learning. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_6
DOI: https://doi.org/10.1007/978-3-030-86486-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86485-9
Online ISBN: 978-3-030-86486-6