Abstract
This paper proposes an Advanced Actor-Critic algorithm, built upon the conventional Actor-Critic algorithm, to train agents to play the complex strategy game StarCraft II. The algorithm incorporates a series of advanced features: distributional advantage estimation, information entropy-based uncertainty estimation, self-confidence-based exploration, and a normal constraint-based update strategy. A case study spanning seven StarCraft II mini-games is conducted to evaluate the effectiveness of the proposed approach, with the well-known A3C algorithm adopted as the comparative baseline. The results confirm the superiority of the improved algorithm in accuracy and training efficiency in complex environments with high-dimensional, hybrid state and action spaces.
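For readers unfamiliar with the Actor-Critic family, the minimal sketch below shows a plain one-step advantage actor-critic loss with an entropy bonus, i.e., the baseline mechanism that the proposed algorithm extends. This is an illustrative sketch only, not the authors' implementation: the names (ActorCritic, a2c_loss), network size, and coefficients are hypothetical assumptions, and the paper's distributional advantage estimation, self-confidence-based exploration, and normal constraint-based update are not reproduced here.

```python
# Minimal advantage actor-critic sketch with an entropy exploration bonus.
# All names, sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns,
             entropy_coef: float = 0.01, value_coef: float = 0.5):
    """One-step advantage actor-critic loss with an entropy regularizer.

    The entropy term rewards high-entropy (uncertain) policies, which
    encourages exploration; the paper's entropy-based uncertainty and
    self-confidence-based exploration are more elaborate variants of
    this general idea.
    """
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()      # critic evaluates, actor follows
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)    # regress critic toward returns
    entropy = dist.entropy().mean()             # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In a training loop, `returns` would be computed from sampled rollouts (for example, bootstrapped n-step returns), and the resulting scalar loss passed to a standard optimizer step.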
Notes
The adopted A3C baseline implementation follows the project at https://github.com/xhujoy/pysc2-agents.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [Grant Nos. 61803162, 61873319, and 61903146].
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment."
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zha, Z., Wang, B. & Tang, X. Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment. Neural Comput & Applic 35, 12271–12282 (2023). https://doi.org/10.1007/s00521-020-05663-3