Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment

  • S.I.: New Trends of Neural Computing for Advanced Applications
  • Published in Neural Computing and Applications

Abstract

This paper proposes an Advanced Actor-Critic algorithm, improved upon the conventional Actor-Critic algorithm, to train an agent to play the complex strategy game StarCraft II. A series of advanced features is incorporated, including distributional advantage estimation, information entropy-based uncertainty estimation, self-confidence-based exploration, and a normal constraint-based update strategy. A case study covering seven StarCraft II mini-games is investigated to demonstrate the effectiveness of the proposed approach, with the well-known A3C algorithm adopted as the baseline. The results verify the superiority of the improved algorithm in accuracy and training efficiency in complex environments with high-dimensional, hybrid state and action spaces.
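For context, the conventional Actor-Critic baseline that the proposed algorithm improves upon combines a policy-gradient actor, weighted by the advantage A(s, a) = R − V(s), with a value-regression critic and a policy-entropy bonus that encourages exploration. Below is a minimal PyTorch sketch of that baseline update; the network shape, loss coefficients, and names (ActorCritic, a2c_loss) are illustrative assumptions, not the authors' implementation of the distributional and entropy-based extensions described above.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Tiny shared-trunk actor-critic for a discrete action space (illustrative)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value V(s)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a2c_loss(model, obs, acts, returns, value_coef=0.5, entropy_coef=0.01):
    """One synchronous advantage actor-critic update.

    `returns` holds (bootstrapped) discounted returns; the advantage
    R - V(s) weights the policy gradient, and the policy entropy is
    added as a bonus -- the standard baseline that the paper's
    entropy-based uncertainty estimation refines.
    """
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.detach()          # no gradient through the critic here
    policy_loss = -(dist.log_prob(acts) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()   # critic regression to returns
    entropy = dist.entropy().mean()                 # high entropy = uncertain policy
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```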



Notes

  1. https://github.com/deepmind/pysc2, DeepMind's StarCraft II Learning Environment; a minimal setup sketch follows these notes.

  2. The adopted A3C network refers to the project at https://github.com/xhujoy/pysc2-agents.
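For readers unfamiliar with the environment, the following is a minimal sketch of instantiating one of the pysc2 mini-games (note 1). It assumes a recent pysc2 release with StarCraft II installed; the map name, feature-layer sizes, and step multiplier are illustrative rather than taken from the paper.

```python
from pysc2.env import sc2_env
from pysc2.lib import actions, features

def make_env(map_name: str = "MoveToBeacon") -> sc2_env.SC2Env:
    # One agent-controlled player, 64x64 screen/minimap feature layers,
    # and 8 game steps per agent step (values are illustrative).
    return sc2_env.SC2Env(
        map_name=map_name,
        players=[sc2_env.Agent(sc2_env.Race.terran)],
        agent_interface_format=features.AgentInterfaceFormat(
            feature_dimensions=features.Dimensions(screen=64, minimap=64)),
        step_mul=8,
        visualize=False)

with make_env() as env:
    timesteps = env.reset()
    while not timesteps[0].last():
        # A trained actor-critic agent would choose a FunctionCall here;
        # the no-op stands in for that choice.
        timesteps = env.step(
            [actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])])
```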

References

  1. Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: Proceedings of the 34th international conference on machine learning, vol 70, pp 449–458. JMLR.org

  2. Burda Y, Edwards H, Pathak D, Storkey A, Darrell T, Efros AA (2018) Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355

  3. Fortunato M, Azar MG, Piot B, Menick J, Osband I, Graves A, Mnih V, Munos R, Hassabis D, Pietquin O et al (2017) Noisy networks for exploration. arXiv preprint arXiv:1706.10295

  4. Grześ M, Kudenko D (2010) Online learning of shaping rewards in reinforcement learning. Neural Netw 23(4):541–550

  5. Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar M, Silver D (2018) Rainbow: combining improvements in deep reinforcement learning. In: Thirty-second AAAI conference on artificial intelligence

  6. Hou Y, Liu L, Wei Q, Xu X, Chen C (2017) A novel DDPG method with prioritized experience replay. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC), pp 316–321. IEEE

  7. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  8. Konda VR, Borkar VS (1999) Actor-critic-type learning algorithms for Markov decision processes. SIAM J Control Optim

  9. Leo R, Milton R, Sibi S (2014) Reinforcement learning for optimal energy management of a solar microgrid. In: 2014 IEEE global humanitarian technology conference-South Asia satellite (GHTC-SAS), pp 183–188. IEEE

  10. Li J, Monroe W, Ritter A, Galley M, Gao J, Jurafsky D (2016) Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541

  11. Mahadevan S (1994) To discount or not to discount in reinforcement learning: a case study comparing R-learning and Q-learning. In: Machine learning proceedings 1994, pp 164–172. Elsevier

  12. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning, pp 1928–1937

  13. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

  14. Ontanón S, Synnaeve G, Uriarte A, Richoux F, Churchill D, Preuss M (2013) A survey of real-time strategy game AI research and competition in StarCraft. IEEE Trans Comput Intell AI Games 5(4):293–311

  15. Pathak D, Agrawal P, Efros AA, Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 16–17

  16. Prasad N et al (2020) Methods for reinforcement learning in clinical decision support

  17. Sallab AE, Abdou M, Perot E, Yogamani S (2017) Deep reinforcement learning framework for autonomous driving. Electron Imaging 19:70–76

  18. Santoro A, Raposo D, Barrett DG, Malinowski M, Pascanu R, Battaglia P, Lillicrap T (2017) A simple neural network module for relational reasoning. In: Advances in neural information processing systems, pp 4967–4976

  19. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  20. Shantia A, Begue E, Wiering M (2011) Connectionist reinforcement learning for intelligent unit micro management in StarCraft. In: The 2011 international joint conference on neural networks, pp 1794–1801. IEEE

  21. Shao K, Zhu Y, Zhao D (2018) StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Trans Emerg Topics Comput Intell 3(1):73–84

  22. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419):1140–1144

  23. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of Go without human knowledge. Nature 550(7676):354–359

  24. Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Thirtieth AAAI conference on artificial intelligence

  25. Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354

  26. Vinyals O, Ewalds T, Bartunov S, Georgiev P, Vezhnevets AS, Yeo M, Makhzani A, Küttler H, Agapiou J, Schrittwieser J et al (2017) StarCraft II: a new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782

  27. Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning, pp 1995–2003

  28. Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292

  29. Wu Y, Mansimov E, Grosse RB, Liao S, Ba J (2017) Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In: Advances in neural information processing systems, pp 5279–5288

  30. Xingjian S, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in neural information processing systems, pp 802–810

  31. Zhelo O, Zhang J, Tai L, Liu M, Burgard W (2018) Curiosity-driven exploration for mapless navigation with deep reinforcement learning. arXiv preprint arXiv:1804.00456

  32. Zhu Y, Mottaghi R, Kolve E, Lim JJ, Gupta A, Fei-Fei L, Farhadi A (2017) Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: 2017 IEEE international conference on robotics and automation (ICRA), pp 3357–3364. IEEE


Acknowledgements

This work was supported by the National Nature Science Foundation of China [Grant Nos. 61803162, 61873319, and 61903146].

Author information


Corresponding author

Correspondence to Bo Wang.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment."

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zha, Z., Wang, B. & Tang, X. Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment. Neural Comput & Applic 35, 12271–12282 (2023). https://doi.org/10.1007/s00521-020-05663-3

