
Efficient Online Hyperparameter Adaptation for Deep Reinforcement Learning

  • Conference paper
  • First Online:
Applications of Evolutionary Computation (EvoApplications 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11454)

Abstract

Deep Reinforcement Learning (DRL) has shown extraordinary performance on a variety of challenging learning tasks, especially games. It is well recognized that DRL is a highly dynamic and non-stationary optimization process even in static environments, and that its performance is notoriously sensitive to the hyperparameter configuration, which includes the learning rate, discount coefficient, step size, and so on. The situation becomes even more serious when DRL is conducted in a changing environment. Ideally, the hyperparameters would promptly self-adapt to the values best suited to the current learning state, rather than remaining fixed over the whole course of training as in most previous work. In this paper, an efficient online hyperparameter adaptation method is presented, which improves the Population Based Training (PBT) method in terms of the promptness of adaptation. A recombination operation inspired by genetic algorithms (GA) is introduced into the population adaptation to accelerate the convergence of the population towards better hyperparameter configurations. Experimental results show that, in four test environments, the presented method achieves 92%, 70%, 2%, and 15% performance improvements over PBT.
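To illustrate the idea described above, the sketch below mimics the adaptation loop: a population of hyperparameter configurations is periodically evaluated, the worst-performing members are replaced, and new configurations are produced by recombining two well-performing parents rather than by copying and perturbing a single parent as in plain PBT. This is not the authors' implementation; the DRL training step is replaced by a synthetic objective, and the population size, replacement fraction, and hyperparameter ranges are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of PBT-style online hyperparameter
# adaptation with a GA-inspired recombination step. The DRL training step is
# replaced by a synthetic objective; population size, replacement fraction,
# and hyperparameter ranges are illustrative assumptions.
import math
import random

POP_SIZE = 8             # number of parallel learners (assumed)
REPLACE_FRACTION = 0.25  # fraction of the population replaced each round (assumed)

def init_hparams():
    """Sample an initial hyperparameter configuration."""
    return {
        "lr": 10 ** random.uniform(-5, -2),    # learning rate
        "gamma": random.uniform(0.90, 0.999),  # discount coefficient
    }

def train_and_eval(hparams):
    """Stand-in for a partial DRL training run returning a fitness score.
    Here: a toy objective peaked at lr = 1e-3, gamma = 0.99."""
    lr_term = -abs(math.log10(hparams["lr"]) + 3.0)
    gamma_term = -10.0 * abs(hparams["gamma"] - 0.99)
    return lr_term + gamma_term + random.gauss(0.0, 0.1)

def recombine(parent_a, parent_b):
    """GA-inspired recombination: build a child configuration from two
    well-performing parents (uniform crossover with arithmetic blending),
    instead of copying and perturbing a single parent as in plain PBT."""
    child = {}
    for key in parent_a:
        if random.random() < 0.5:
            child[key] = parent_a[key]
        else:
            alpha = random.random()
            child[key] = alpha * parent_a[key] + (1.0 - alpha) * parent_b[key]
    return child

population = [{"hparams": init_hparams(), "score": float("-inf")}
              for _ in range(POP_SIZE)]

for round_ in range(20):                       # online adaptation rounds
    for member in population:                  # run asynchronously in practice
        member["score"] = train_and_eval(member["hparams"])
    population.sort(key=lambda m: m["score"], reverse=True)
    n_replace = max(1, int(POP_SIZE * REPLACE_FRACTION))
    elites = population[:POP_SIZE - n_replace]
    for member in population[-n_replace:]:     # replace the worst members
        pa, pb = random.sample(elites, 2)
        member["hparams"] = recombine(pa["hparams"], pb["hparams"])

print("best configuration found:", population[0]["hparams"])
```

In an actual DRL setting, each population member would be a learner (e.g. an A2C or ACKTR agent) and the fitness score would be its recent episodic return; the recombination step is what distinguishes this variant from the copy-and-perturb exploration of standard PBT.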


Notes

  1. https://github.com/zhoudoudou/online-hyperparameter-adaptation-method.

  2. https://github.com/zhoudoudou/distributed-queue-computing-system.

  3. https://blog.openai.com/baselines-acktr-a2c.

  4. https://www.github.com/openai/baselines.

  5. Specific details can be found at https://gym.openai.com.


Acknowledgment

This work is supported by the National Natural Science Foundation of China under grants No. 61836011 and No. 61473271.

Author information


Corresponding author

Correspondence to Bin Li.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, Y., Liu, W., Li, B. (2019). Efficient Online Hyperparameter Adaptation for Deep Reinforcement Learning. In: Kaufmann, P., Castillo, P. (eds) Applications of Evolutionary Computation. EvoApplications 2019. Lecture Notes in Computer Science, vol. 11454. Springer, Cham. https://doi.org/10.1007/978-3-030-16692-2_10


  • DOI: https://doi.org/10.1007/978-3-030-16692-2_10

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16691-5

  • Online ISBN: 978-3-030-16692-2

  • eBook Packages: Computer Science, Computer Science (R0)
