Efficient Online Hyperparameter Adaptation for Deep Reinforcement Learning

Zhou, Yinda; Liu, Weiming; Li, Bin

doi:10.1007/978-3-030-16692-2_10

Yinda Zhou¹⁶,
Weiming Liu¹⁶ &
Bin Li¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11454))

Included in the following conference series:

International Conference on the Applications of Evolutionary Computation (Part of EvoStar)

1196 Accesses
4 Citations

Abstract

Deep Reinforcement Learning (DRL) has shown its extraordinary performance on a variety of challenging learning tasks, especially those in games. It has been recognized that DRL process is a high-dynamic and non-stationary optimization process even in the static environments, their performance is notoriously sensitive to the hyperparameter configuration which includes learning rate, discount coefficient, and step size, etc. The situation will be more serious when DRL is conducting in a changing environment. The most ideal state of hyperparameter configuration in DRL is that the hyperparameter can self-adapt to the best values promptly for their current learning state, rather than using a fixed set of hyperparameters for the whole course of training like most previous works did. In this paper, an efficient online hyperparameter adaptation method is presented, which is an improved version of Population-based Training (PBT) method on the promptness of adaptation. A recombination operation inspired by GA is introduced into the population adaptation to accelerating the convergence of the population towards the better hyperparameter configurations. Experiment results have shown that in four test environments, the presented method has achieved 92%, 70%, 2% and 15% performance improvement over PBT.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://github.com/zhoudoudou/online-hyperparameter-adaptation-method.
2.
https://github.com/zhoudoudou/distributed-queue-computing-system.
3.
https://blog.openai.com/baselines-acktr-a2c.
4.
https://www.github.com/openai/baselines.
5.
Specific details can be found: https://gym.openai.com.

References

Li, Y.: Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274 (2017)
Sutton, R.S., Barto, A.G., Bach, F., et al.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Google Scholar
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
Article Google Scholar
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484 (2016)
Article Google Scholar
Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354 (2017)
Article Google Scholar
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
Article Google Scholar
Mnih, V., et al.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Justesen, N., Bontrager, P., Togelius, J., Risi, S.: Deep learning for video game playing. arXiv preprint arXiv:1708.07902 (2017)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
MathSciNet MATH Google Scholar
Mirowski, P., et al.: Learning to navigate in complex environments. arXiv preprint arXiv:1611.03673 (2016)
Yoo, S., Yun, K., Choi, J.Y.: Action-decision networks for visual tracking with deep reinforcement learning. In: CVPR, pp. 2711–2720 (2017)
Google Scholar
Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.J.: Deep reinforcement learning-based image captioning with embedding reward. arXiv preprint arXiv:1704.03899 (2017)
Zhang, J., Wang, N., Zhang, L.: Multi-shot pedestrian re-identification via sequential decision making. arXiv preprint arXiv:1712.07257 (2017)
Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., Meger, D.: Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560 (2017)
Islam, R., Henderson, P., Gomrokchi, M., Precup, D.: Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. arXiv preprint arXiv:1708.04133 (2017)
Elfwing, S., Uchibe, E., Doya, K.: Online meta-learning by parallel algorithm competition. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 426–433. ACM (2018)
Google Scholar
Melo, F.S., Meyn, S.P., Ribeiro, M.I.: An analysis of reinforcement learning with function approximation. In: Proceedings of the 25th international conference on Machine learning, pp. 664–671. ACM (2008)
Google Scholar
François-Lavet, V., Fonteneau, R., Ernst, D.: How to discount deep reinforcement learning: Towards new dynamic strategies. arXiv preprint arXiv:1512.02011 (2015)
Downey, C., Sanner, S., et al.: Temporal difference bayesian model averaging: A bayesian perspective on adapting lambda. In: ICML, pp. 311–318. Citeseer (2010)
Google Scholar
Ishii, S., Yoshida, W., Yoshimoto, J.: Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw. 15(4–6), 665–687 (2002)
Article Google Scholar
Mann, T.A., Penedones, H., Mannor, S., Hester, T.: Adaptive lambda least-squares temporal difference learning. arXiv preprint arXiv:1612.09465 (2016)
Jaderberg, M., et al.: Population based training of neural networks. arXiv preprint arXiv:1711.09846 (2017)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)
Google Scholar
Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double q-learning. AAAI 16, 2094–2100 (2016)
Google Scholar
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)
Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
Anschel, O., Baram, N., Shimkin, N.: Averaged-dqn: Variance reduction and stabilization for deep reinforcement learning. arXiv preprint arXiv:1611.01929 (2016)
Fortunato, M., et al.: Noisy networks for exploration. arXiv preprint arXiv:1706.10295 (2017)
Hessel, M., et al.: Rainbow: Combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298 (2017)
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)
Google Scholar
Wu, Y., Mansimov, E., Grosse, R.B., Liao, S., Ba, J.: Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. In: Advances in Neural Information Processing Systems, pp. 5285–5294 (2017)
Google Scholar
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
MathSciNet MATH Google Scholar
Snoek, J., Larochelle, H., Adams, R.P.: Practical bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp. 2951–2959 (2012)
Google Scholar
Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 507–523. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25566-3_40
Chapter Google Scholar
Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, pp. 2546–2554 (2011)
Google Scholar
Bergstra, J., Yamins, D., Cox, D.: Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: International Conference on Machine Learning, pp. 115–123 (2013)
Google Scholar
Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Google Scholar

Download references

Acknowledgment

The work is supported by the National Natural Science Foundation of China under grand No. 61836011 and No. 61473271.

Author information

Authors and Affiliations

CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application Systems, University of Science and Technology of China, Hefei, China
Yinda Zhou, Weiming Liu & Bin Li

Authors

Yinda Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Weiming Liu
View author publications
You can also search for this author in PubMed Google Scholar
Bin Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bin Li .

Editor information

Editors and Affiliations

University of Mainz, Mainz, Germany
Paul Kaufmann
University of Granada, Granada, Spain
Pedro A. Castillo

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhou, Y., Liu, W., Li, B. (2019). Efficient Online Hyperparameter Adaptation for Deep Reinforcement Learning. In: Kaufmann, P., Castillo, P. (eds) Applications of Evolutionary Computation. EvoApplications 2019. Lecture Notes in Computer Science(), vol 11454. Springer, Cham. https://doi.org/10.1007/978-3-030-16692-2_10

Download citation

DOI: https://doi.org/10.1007/978-3-030-16692-2_10
Published: 30 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16691-5
Online ISBN: 978-3-030-16692-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics