Accelerated deep reinforcement learning with efficient demonstration utilization techniques

Abstract

Demonstrations typically accelerate the training of deep reinforcement learning (RL) agents and guide them toward learning complicated policies. Most current deep RL approaches that use demonstrations assume that a sufficient amount of high-quality demonstrations is available. In most real-world learning settings, however, the available demonstrations are limited in both quantity and quality. In this paper, we present an accelerated deep RL approach based on dual replay buffer management and dynamic frame skipping on demonstrations. The dual replay buffer manager maintains a human replay buffer and an actor replay buffer with independent sampling policies. We also propose dynamic frame skipping on demonstrations, called DFS-ER (Dynamic Frame Skipping-Experience Replay), which learns the action repetition factor of the demonstrations. DFS-ER accelerates deep RL by improving the efficiency of demonstration utilization, thereby yielding faster exploration of the environment. We verified the training acceleration in three dense-reward environments and one sparse-reward environment compared with the conventional approach. In our evaluation on Atari game environments, the proposed approach reduced training iterations by 21.7%-39.1% in a sparse-reward environment.
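As a rough, hedged illustration of the two ideas above, the following Python sketch shows (a) a dual replay buffer manager that draws demonstration and actor transitions with independent sampling policies and (b) a DFS-ER-style routine that collapses runs of repeated actions in a demonstration into transitions annotated with an action repetition factor. The transition format, the uniform sampling, and the demo_fraction batch split are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import deque


class DualReplayBufferManager:
    """Sketch of a dual replay buffer with independent sampling policies.

    demo_fraction (the share of each batch drawn from demonstrations) is a
    hypothetical parameter; the actual buffers could instead use, e.g.,
    prioritized sampling.
    """

    def __init__(self, capacity=100_000, demo_fraction=0.25):
        self.human_buffer = deque(maxlen=capacity)   # demonstration transitions
        self.actor_buffer = deque(maxlen=capacity)   # agent-generated transitions
        self.demo_fraction = demo_fraction

    def add_demo(self, transition):
        self.human_buffer.append(transition)

    def add_actor(self, transition):
        self.actor_buffer.append(transition)

    def sample(self, batch_size):
        # Sample each buffer independently (uniformly here, for simplicity),
        # then mix the two sub-batches into a single training batch.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.human_buffer))
        n_actor = min(batch_size - n_demo, len(self.actor_buffer))
        batch = (random.sample(list(self.human_buffer), n_demo)
                 + random.sample(list(self.actor_buffer), n_actor))
        random.shuffle(batch)
        return batch


def dfs_er_transitions(demo_steps):
    """Collapse runs of identical actions in a demonstration into single
    transitions annotated with their action repetition factor.

    Each element of demo_steps is assumed to be a
    (state, action, reward, next_state) tuple.
    """
    transitions, i = [], 0
    while i < len(demo_steps):
        state, action, reward, next_state = demo_steps[i]
        repeat, total_reward = 1, reward
        # Extend the run while the demonstrator keeps repeating the action.
        while (i + repeat < len(demo_steps)
               and demo_steps[i + repeat][1] == action):
            total_reward += demo_steps[i + repeat][2]
            next_state = demo_steps[i + repeat][3]
            repeat += 1
        transitions.append((state, action, total_reward, next_state, repeat))
        i += repeat
    return transitions
```

Keeping the human transitions in their own buffer prevents the comparatively small demonstration set from being overwritten by the much larger stream of agent transitions, and the repetition factor lets one stored transition stand in for several demonstration frames, which is the sense in which DFS-ER improves demonstration utilization.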

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2018R1D1A1B07043858, 2018R1D1A1B07049923); the supercomputing department at KISTI (Korea Institute of Science and Technology Information) (K-19-L02-C07-S01); and the Technology Innovation Program (P0006720) funded by MOTIE, Korea.

Author information

Corresponding author

Correspondence to Minsu Lee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Artificial Intelligence and Big Data Computing

Guest Editors: Wookey Lee and Hiroyuki Kitagawa

Electronic supplementary material

ESM 1

(DOCX 24.9 kb)

About this article

Cite this article

Yeo, S., Oh, S. & Lee, M. Accelerated deep reinforcement learning with efficient demonstration utilization techniques. World Wide Web 24, 1275–1297 (2021). https://doi.org/10.1007/s11280-019-00763-0
