Asynchronous reinforcement learning algorithms for solving discrete space path planning problems

Zhao, Xingyu; Ding, Shifei; An, Yuexuan; Jia, Weikuan

doi:10.1007/s10489-018-1241-z

Published: 04 August 2018

Applied Intelligence, Volume 48, pages 4889–4904 (2018)

Xingyu Zhao¹, Shifei Ding¹, Yuexuan An¹ & Weikuan Jia²

Abstract

Reinforcement learning has great potential for solving practical problems, but when it is combined with neural networks to solve small-scale discrete space problems, it can easily become trapped in a local minimum. Traditional reinforcement learning learns a policy through the continuous updates of a single agent, which tends to converge slowly. To address these problems, we combine asynchronous methods with existing tabular reinforcement learning algorithms, propose a parallel architecture for solving the discrete space path planning problem, and present several new variants of asynchronous reinforcement learning algorithms. We apply these algorithms to standard reinforcement learning environments, and the experimental results show that they solve discrete space path planning problems efficiently. One of these algorithms, Asynchronous Phased Dyna-Q, surpasses existing asynchronous reinforcement learning algorithms and balances exploration and exploitation well.
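
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea it describes: several workers running tabular Q-learning in parallel and updating one shared table. It is not the authors' architecture and does not implement Asynchronous Phased Dyna-Q. The 4x4 grid world, the reward of -1 per move, the hyperparameters, and all function names are assumptions made for illustration.

```python
import random
import threading

SIZE = 4                                        # hypothetical 4x4 grid world
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.95, 0.1          # assumed hyperparameters


def step(state, action):
    """Move one cell (staying in place at a wall); every move costs -1."""
    row = min(max(state[0] + ACTIONS[action][0], 0), SIZE - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), SIZE - 1)
    next_state = (row, col)
    return next_state, -1.0, next_state == GOAL


def greedy(q, state):
    """Index of the highest-valued action in the shared table."""
    values = q[state]
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def worker(q, lock, episodes, seed):
    """One agent: epsilon-greedy Q-learning against the shared table."""
    rng = random.Random(seed)                   # each worker explores differently
    for _ in range(episodes):
        state = START
        for _ in range(200):                    # cap on episode length
            if rng.random() < EPSILON:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            next_state, reward, done = step(state, action)
            # The lock keeps each read-modify-write consistent; workers
            # otherwise never wait for one another's episodes.
            with lock:
                target = reward if done else reward + GAMMA * max(q[next_state])
                q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break


def train(n_workers=4, episodes=500):
    """Run the workers concurrently and return the shared Q-table."""
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, lock, episodes, seed))
        for seed in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from START, stopping at GOAL or the step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, _, _ = step(state, greedy(q, state))
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Because every worker writes to the same table, an update made by one worker is immediately visible to the others, which is one plausible reading of how a parallel architecture could speed up convergence relative to a single agent. Thread scheduling makes the intermediate values nondeterministic, so different runs may print different shortest paths.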

Acknowledgments

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).

Author information

Authors and Affiliations

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
Xingyu Zhao, Shifei Ding & Yuexuan An
School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
Weikuan Jia

Corresponding author

Correspondence to Shifei Ding.

About this article

Cite this article

Zhao, X., Ding, S., An, Y. et al. Asynchronous reinforcement learning algorithms for solving discrete space path planning problems. Appl Intell 48, 4889–4904 (2018). https://doi.org/10.1007/s10489-018-1241-z

Published: 04 August 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10489-018-1241-z
