Abstract
This paper proposes an Advanced Actor-Critic algorithm, built upon the conventional Actor-Critic algorithm, to train agents to play the complex strategy game StarCraft II. The algorithm incorporates a series of advanced features: distributional advantage estimation, information entropy-based uncertainty estimation, self-confidence-based exploration, and a normal constraint-based update strategy. A case study spanning seven StarCraft II mini-games is conducted to evaluate the effectiveness of the proposed approach, with the well-known A3C algorithm adopted as the comparative baseline. The results confirm the superiority of the improved algorithm in accuracy and training efficiency in complex environments with high-dimensional, hybrid state and action spaces.
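For readers unfamiliar with the Actor-Critic family, the minimal sketch below shows a plain one-step advantage actor-critic loss with an entropy bonus, i.e., the baseline mechanism that the proposed algorithm extends. This is an illustrative sketch only, not the authors' implementation: the names (ActorCritic, a2c_loss), network size, and coefficients are hypothetical assumptions, and the paper's distributional advantage estimation, self-confidence-based exploration, and normal constraint-based update are not reproduced here.

```python
# Minimal advantage actor-critic sketch with an entropy exploration bonus.
# All names, sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a2c_loss(model, obs, actions, returns,
             entropy_coef: float = 0.01, value_coef: float = 0.5):
    """One-step advantage actor-critic loss with an entropy regularizer.

    The entropy term rewards high-entropy (uncertain) policies, which
    encourages exploration; the paper's entropy-based uncertainty and
    self-confidence-based exploration are more elaborate variants of
    this general idea.
    """
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()      # critic evaluates, actor follows
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)    # regress critic toward returns
    entropy = dist.entropy().mean()             # exploration bonus
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In a training loop, `returns` would be computed from sampled rollouts (for example, bootstrapped n-step returns), and the resulting scalar loss passed to a standard optimizer step.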
Notes
The adopted A3C baseline implementation follows the project at https://github.com/xhujoy/pysc2-agents.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [Grant Nos. 61803162, 61873319, and 61903146].
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment."
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zha, Z., Wang, B. & Tang, X. Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment. Neural Comput & Applic 35, 12271–12282 (2023). https://doi.org/10.1007/s00521-020-05663-3