
Applications of asynchronous deep reinforcement learning based on dynamic updating weights


Abstract

Deep reinforcement learning based on the asynchronous method is a recent branch of reinforcement learning. It uses multithreading so that multiple agents explore different parts of the environment in parallel and update the shared parameters asynchronously. In this way, agents no longer need experience replay and can update the parameters online. At the same time, the asynchronous method greatly improves both the convergence speed and the convergence performance of the algorithms. Asynchronous deep reinforcement learning algorithms, especially the asynchronous advantage actor-critic (A3C) algorithm, are effective at solving practical problems and have been widely used. However, in existing asynchronous deep reinforcement learning algorithms, every thread pushes its updates to the global thread with a uniform learning rate, ignoring the fact that different threads carry different information at each update. When an agent's push to the global thread is dominated by failure information, it contributes little to improving the parameters of the learning system. We therefore introduce dynamic weights into asynchronous deep reinforcement learning and propose a new algorithm, asynchronous advantage actor-critic with dynamic updating weights (DWA3C). When the information pushed by an agent clearly helps to improve system performance, we enlarge the update step; otherwise, we shrink it. This significantly improves the convergence efficiency and convergence performance of asynchronous deep reinforcement learning algorithms. We also evaluate the algorithm experimentally: within the same running time, the proposed algorithm converges faster and to better solutions than existing algorithms.
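To make the weighting idea concrete, the Python sketch below shows one way a worker's push to the shared parameters could be scaled by a dynamic weight derived from how its recent return compares with a running baseline. The helper names (dynamic_weight, apply_weighted_update), the sigmoid mapping, the [w_min, w_max] range, and the moving-average baseline are illustrative assumptions, not the DWA3C update rule defined in the paper.

import numpy as np

def dynamic_weight(worker_return, baseline, beta=1.0, w_min=0.5, w_max=1.5):
    # Map the gap between a worker's recent return and a running baseline
    # through a sigmoid, then rescale into [w_min, w_max]: pushes backed by
    # above-baseline (helpful) experience get enlarged, others get shrunk.
    squashed = 1.0 / (1.0 + np.exp(-beta * (worker_return - baseline)))
    return w_min + (w_max - w_min) * squashed

def apply_weighted_update(global_params, grads, lr, weight):
    # Plain SGD push to the shared parameters, scaled by the dynamic weight.
    # (An actual A3C implementation would use shared RMSProp statistics.)
    for name in global_params:
        global_params[name] -= lr * weight * grads[name]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_params = {"w": rng.normal(size=4)}
    baseline = 0.0
    for step in range(5):
        # Stand-ins for one worker's episode return and local gradients.
        worker_return = rng.normal(loc=float(step))
        grads = {"w": rng.normal(size=4)}
        weight = dynamic_weight(worker_return, baseline)
        apply_weighted_update(global_params, grads, lr=1e-2, weight=weight)
        # Exponential moving average of returns serves as the baseline.
        baseline = 0.9 * baseline + 0.1 * worker_return
        print(f"step {step}: return={worker_return:+.2f}, weight={weight:.3f}")

Swapping the plain SGD push for the shared RMSProp update used in A3C, and the scalar baseline for whatever performance signal DWA3C actually uses, would turn this sketch into a modification of a standard asynchronous worker loop.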



Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).

Author information

Corresponding author

Correspondence to Shifei Ding.

About this article

Cite this article

Zhao, X., Ding, S., An, Y. et al. Applications of asynchronous deep reinforcement learning based on dynamic updating weights. Appl Intell 49, 581–591 (2019). https://doi.org/10.1007/s10489-018-1296-x

