Abstract
Demonstrations usually accelerate the training of deep reinforcement learning (RL) agents and guide them toward learning complicated policies. Most current deep RL approaches with demonstrations assume that a sufficient amount of high-quality demonstrations is available. In most real-world learning settings, however, the available demonstrations are limited in both quantity and quality. In this paper, we present an accelerated deep RL approach with dual replay buffer management and dynamic frame skipping on demonstrations. The dual replay buffer manager maintains a human replay buffer and an actor replay buffer with independent sampling policies. We also propose dynamic frame skipping on demonstrations, called DFS-ER (Dynamic Frame Skipping-Experience Replay), which learns the action repetition factor of the demonstrations. DFS-ER accelerates deep RL by improving the efficiency of demonstration utilization, thereby yielding faster exploration of the environment. We verified the training acceleration in three dense-reward environments and one sparse-reward environment against the conventional approach. In our evaluation on Atari game environments, the proposed approach reduced training iterations by 21.7%-39.1% in the sparse-reward environment.
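To make the dual replay buffer idea concrete, the sketch below keeps demonstration (human) and agent (actor) transitions in separate fixed-capacity buffers and mixes them into each training batch at a fixed demonstration ratio. This is a minimal illustration only: the class names and the `demo_ratio` parameter are assumptions for this sketch, not the paper's actual independent sampling policies.

```python
import random


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of transitions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        # Evict the oldest transition once the buffer is full.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, n):
        # Uniform sampling without replacement, capped at buffer size.
        return random.sample(self.storage, min(n, len(self.storage)))


class DualReplayBufferManager:
    """Holds human (demonstration) and actor (agent) transitions in
    separate buffers and draws each batch with a fixed demo fraction."""

    def __init__(self, human_capacity, actor_capacity, demo_ratio=0.25):
        self.human = ReplayBuffer(human_capacity)  # demonstration data
        self.actor = ReplayBuffer(actor_capacity)  # agent-generated data
        self.demo_ratio = demo_ratio               # fraction of demos per batch

    def sample_batch(self, batch_size):
        n_demo = int(batch_size * self.demo_ratio)
        return (self.human.sample(n_demo)
                + self.actor.sample(batch_size - n_demo))
```

In practice, keeping the two buffers separate prevents the agent's growing experience from crowding out the scarce demonstration data, which is the failure mode a single shared buffer suffers from when demonstrations are limited.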
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07043858, 2018R1D1A1B07049923), by the supercomputing department at KISTI (Korea Institute of Science and Technology Information) (K-19-L02-C07-S01), and by the Technology Innovation Program (P0006720) funded by MOTIE, Korea.
This article belongs to the Topical Collection: Artificial Intelligence and Big Data Computing
Guest Editors: Wookey Lee and Hiroyuki Kitagawa
Electronic supplementary material: ESM 1 (DOCX 24.9 kb)
Cite this article
Yeo, S., Oh, S. & Lee, M. Accelerated deep reinforcement learning with efficient demonstration utilization techniques. World Wide Web 24, 1275–1297 (2021). https://doi.org/10.1007/s11280-019-00763-0