Abstract
Reinforcement learning is an important machine learning method that plays an increasingly prominent role in practical applications. In recent years, many researchers have studied parallel reinforcement learning algorithms and achieved remarkable results in a variety of applications. However, because each agent searches only a limited region of the problem space, existing parallel reinforcement learning methods often fail to reduce the number of episodes an algorithm must run. In addition, traditional model-free reinforcement learning algorithms do not necessarily converge to the optimal solution, which can waste resources in practical applications. To address these problems, we incorporate the particle swarm optimization (PSO) algorithm into asynchronous reinforcement learning to search for the optimal solution. First, we propose a new asynchronous variant of the PSO algorithm. We then apply it to asynchronous reinforcement learning and propose a new asynchronous reinforcement learning algorithm, the Sarsa algorithm based on backward Q-learning and asynchronous particle swarm optimization (APSO-BQSA). Finally, we verify the effectiveness of the proposed asynchronous PSO and APSO-BQSA algorithms through experiments.
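For context on the PSO building block mentioned above, the following is a minimal sketch of the canonical synchronous PSO update (Kennedy and Eberhart, 1995) with the common inertia-weight modification (Shi and Eberhart, 1998), applied to a toy objective. The function name, hyperparameters, and benchmark are illustrative assumptions; this is background only and does not implement the asynchronous PSO variant or the APSO-BQSA algorithm proposed in the paper.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Canonical (synchronous) PSO with an inertia weight; illustrative sketch only."""
    lo, hi = bounds
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))                   # particle velocities
    pbest_x = x.copy()                                 # personal best positions
    pbest_f = np.array([f(p) for p in x])              # personal best values
    gbest_x = pbest_x[np.argmin(pbest_f)]              # global best position

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # velocity update: inertia + cognitive (pbest) + social (gbest) terms
        v = w * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest_x[better] = x[better]
        pbest_f[better] = fx[better]
        gbest_x = pbest_x[np.argmin(pbest_f)]

    return gbest_x, float(pbest_f.min())

# Example: minimize the 5-dimensional sphere function
best_x, best_val = pso_minimize(lambda p: float(np.sum(p ** 2)), dim=5)
print(best_val)  # should be close to 0
```

Each particle keeps its personal best position and is attracted both to it and to the swarm's global best. The asynchronous variant studied in the paper changes how and when this best-so-far information is shared among parallel learners; those details are given in the main text.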
Acknowledgments
This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).
Cite this article
Ding, S., Du, W., Zhao, X. et al. A new asynchronous reinforcement learning algorithm based on improved parallel PSO. Appl Intell 49, 4211–4222 (2019). https://doi.org/10.1007/s10489-019-01487-4