Abstract
Action-value functions have been widely used in multi-agent reinforcement learning. However, they are hard to adapt to scenarios such as real-time strategy games, where the number of agents varies over time. In this paper, we explore ways of avoiding action-value functions in order to make multi-agent architectures more scalable. We present a general architecture for real-time strategy games and design a global reward function that fits into it. Within this architecture, we also propose an algorithm that requires no human knowledge and works for semi-Markov decision processes, in which rewards are not received until an action has run to completion. To evaluate the approach, we carry out micromanagement experiments on a simplified real-time strategy game called MicroRTS. The results show that the trained AI is highly competitive against strong baseline bots.
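The abstract gives no implementation details, but the core idea it describes is a value-free policy-gradient update for a semi-MDP, where each macro-action lasts several ticks and its reward is only observed when it finishes. The following is a minimal sketch of that kind of update, assuming a REINFORCE-style rule with a mean-return baseline and a linear-softmax policy; the specific names, features, and durations are illustrative and are not taken from the paper.

```python
import numpy as np

# Sketch of a REINFORCE-style update with no action-value function, under a
# semi-MDP assumption: each macro-action lasts `duration` ticks, and its reward
# is observed only when the action completes. Names here are illustrative.

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def smdp_returns(rewards, durations, gamma=0.99):
    """Discounted returns where each step is discounted by gamma**duration,
    reflecting that the next reward arrives only after the macro-action ends."""
    G, out = 0.0, []
    for r, d in zip(reversed(rewards), reversed(durations)):
        G = r + (gamma ** d) * G
        out.append(G)
    return list(reversed(out))

def reinforce_update(theta, episode, gamma=0.99, lr=1e-2):
    """One policy-gradient update from a single episode.
    episode: list of (features, action, reward, duration) tuples."""
    feats, acts, rews, durs = zip(*episode)
    returns = smdp_returns(rews, durs, gamma)
    baseline = np.mean(returns)              # simple baseline instead of a learned critic
    for x, a, G in zip(feats, acts, returns):
        probs = softmax(theta @ x)           # linear-softmax policy over discrete actions
        grad_logp = np.outer(-probs, x)      # d log pi(a|x) / d theta
        grad_logp[a] += x
        theta += lr * (G - baseline) * grad_logp
    return theta

# Toy usage: 3 macro-actions, 4-dimensional features, random episode of length 10.
theta = np.zeros((3, 4))
episode = [(np.random.rand(4), np.random.randint(3),
            np.random.randn(), np.random.randint(1, 5)) for _ in range(10)]
theta = reinforce_update(theta, episode)
```

Because the update uses sampled semi-MDP returns rather than a learned action-value, it does not depend on a fixed joint-action space and can, in principle, be applied per unit as the number of agents changes.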
Acknowledgement
This work is supported by the National Key Research and Development Program of China under Grant No. 2019YFB2102200, the National Natural Science Foundation of China under Grants No. 61672154 and No. 61972086, and the Fund of ZTE Corporation under Grant No. HC-CN-20190826009.