Abstract
Deep Reinforcement Learning (DRL) has shown its extraordinary performance on a variety of challenging learning tasks, especially those in games. It has been recognized that DRL process is a high-dynamic and non-stationary optimization process even in the static environments, their performance is notoriously sensitive to the hyperparameter configuration which includes learning rate, discount coefficient, and step size, etc. The situation will be more serious when DRL is conducting in a changing environment. The most ideal state of hyperparameter configuration in DRL is that the hyperparameter can self-adapt to the best values promptly for their current learning state, rather than using a fixed set of hyperparameters for the whole course of training like most previous works did. In this paper, an efficient online hyperparameter adaptation method is presented, which is an improved version of Population-based Training (PBT) method on the promptness of adaptation. A recombination operation inspired by GA is introduced into the population adaptation to accelerating the convergence of the population towards the better hyperparameter configurations. Experiment results have shown that in four test environments, the presented method has achieved 92%, 70%, 2% and 15% performance improvement over PBT.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Li, Y.: Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274 (2017)
Sutton, R.S., Barto, A.G., Bach, F., et al.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484 (2016)
Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354 (2017)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
Mnih, V., et al.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Justesen, N., Bontrager, P., Togelius, J., Risi, S.: Deep learning for video game playing. arXiv preprint arXiv:1708.07902 (2017)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
Mirowski, P., et al.: Learning to navigate in complex environments. arXiv preprint arXiv:1611.03673 (2016)
Yoo, S., Yun, K., Choi, J.Y.: Action-decision networks for visual tracking with deep reinforcement learning. In: CVPR, pp. 2711–2720 (2017)
Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.J.: Deep reinforcement learning-based image captioning with embedding reward. arXiv preprint arXiv:1704.03899 (2017)
Zhang, J., Wang, N., Zhang, L.: Multi-shot pedestrian re-identification via sequential decision making. arXiv preprint arXiv:1712.07257 (2017)
Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., Meger, D.: Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560 (2017)
Islam, R., Henderson, P., Gomrokchi, M., Precup, D.: Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. arXiv preprint arXiv:1708.04133 (2017)
Elfwing, S., Uchibe, E., Doya, K.: Online meta-learning by parallel algorithm competition. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 426–433. ACM (2018)
Melo, F.S., Meyn, S.P., Ribeiro, M.I.: An analysis of reinforcement learning with function approximation. In: Proceedings of the 25th international conference on Machine learning, pp. 664–671. ACM (2008)
François-Lavet, V., Fonteneau, R., Ernst, D.: How to discount deep reinforcement learning: Towards new dynamic strategies. arXiv preprint arXiv:1512.02011 (2015)
Downey, C., Sanner, S., et al.: Temporal difference bayesian model averaging: A bayesian perspective on adapting lambda. In: ICML, pp. 311–318. Citeseer (2010)
Ishii, S., Yoshida, W., Yoshimoto, J.: Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Netw. 15(4–6), 665–687 (2002)
Mann, T.A., Penedones, H., Mannor, S., Hester, T.: Adaptive lambda least-squares temporal difference learning. arXiv preprint arXiv:1612.09465 (2016)
Jaderberg, M., et al.: Population based training of neural networks. arXiv preprint arXiv:1711.09846 (2017)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)
Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double q-learning. AAAI 16, 2094–2100 (2016)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)
Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
Anschel, O., Baram, N., Shimkin, N.: Averaged-dqn: Variance reduction and stabilization for deep reinforcement learning. arXiv preprint arXiv:1611.01929 (2016)
Fortunato, M., et al.: Noisy networks for exploration. arXiv preprint arXiv:1706.10295 (2017)
Hessel, M., et al.: Rainbow: Combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298 (2017)
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)
Wu, Y., Mansimov, E., Grosse, R.B., Liao, S., Ba, J.: Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. In: Advances in Neural Information Processing Systems, pp. 5285–5294 (2017)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
Snoek, J., Larochelle, H., Adams, R.P.: Practical bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, pp. 2951–2959 (2012)
Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 507–523. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25566-3_40
Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, pp. 2546–2554 (2011)
Bergstra, J., Yamins, D., Cox, D.: Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: International Conference on Machine Learning, pp. 115–123 (2013)
Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Acknowledgment
The work is supported by the National Natural Science Foundation of China under grand No. 61836011 and No. 61473271.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, Y., Liu, W., Li, B. (2019). Efficient Online Hyperparameter Adaptation for Deep Reinforcement Learning. In: Kaufmann, P., Castillo, P. (eds) Applications of Evolutionary Computation. EvoApplications 2019. Lecture Notes in Computer Science(), vol 11454. Springer, Cham. https://doi.org/10.1007/978-3-030-16692-2_10
Download citation
DOI: https://doi.org/10.1007/978-3-030-16692-2_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16691-5
Online ISBN: 978-3-030-16692-2
eBook Packages: Computer ScienceComputer Science (R0)