Abstract
It is nearly a consensus that off-policy algorithms dominate research benchmarks in multi-agent reinforcement learning (MARL), yet recent work [34] demonstrates that an on-policy MARL algorithm, Multi-Agent Proximal Policy Optimization (MAPPO), can attain comparable performance. In this paper, we propose async-MAPPO, a training framework based on MAPPO that supports scalable asynchronous training. We then re-examine async-MAPPO in the StarCraft II micromanagement domain and obtain state-of-the-art performance on several hard and super-hard maps. Finally, we analyze three experimental phenomena and offer hypotheses to explain the performance improvement of async-MAPPO.
Notes
- 1.
- 2. Referred to as ppo_epoch in [34].
References
Baker, B., et al.: Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528 (2019)
Bard, N., et al.: The Hanabi challenge: a new frontier for AI research. Artif. Intell. 280, 103216 (2020)
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
Espeholt, L., Marinier, R., Stanczyk, P., Wang, K., Michalski, M.: Seed rl: Scalable and efficient deep-rl with accelerated central inference. arXiv preprint arXiv:1910.06591 (2019)
Espeholt, L., et al.: Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In: International Conference on Machine Learning, pp. 1407–1416. PMLR (2018)
Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
Foerster, J.N., Assael, Y.M., De Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. arXiv preprint arXiv:1605.06676 (2016)
Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.) AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 66–83. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71682-4_5
Harris, C.R., et al.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)
Hessel, M., Soyer, H., Espeholt, L., Czarnecki, W., Schmitt, S., van Hasselt, H.: Multi-task deep reinforcement learning with popart. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3796–3803 (2019)
Horgan, D., et al.: Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933 (2018)
Hu, H., Foerster, J.N.: Simplified action decoder for deep multi-agent reinforcement learning. arXiv preprint arXiv:1912.02288 (2019)
Kapturowski, S., Ostrovski, G., Quan, J., Munos, R., Dabney, W.: Recurrent experience replay in distributed reinforcement learning. In: International Conference on Learning Representations (2018)
Leibo, J.Z., Zambaldi, V., Lanctot, M., Marecki, J., Graepel, T.: Multi-agent reinforcement learning in sequential social dilemmas. arXiv preprint arXiv:1702.03037 (2017)
Li, S., et al.: PyTorch distributed: Experiences on accelerating data parallel training. arXiv preprint arXiv:2006.15704 (2020)
Li, Y., Schuurmans, D.: MapReduce for parallel reinforcement learning. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS (LNAI), vol. 7188, pp. 309–320. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29946-9_30
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. In: Neural Information Processing Systems (NIPS) (2017)
Mordatch, I., Abbeel, P.: Emergence of grounded compositional language in multi-agent populations. arXiv preprint arXiv:1703.04908 (2017)
Oliehoek, F.A., Amato, C.: A Concise Introduction to Decentralized POMDPs. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-28929-8
OpenAI: OpenAI Five. https://blog.openai.com/openai-five/ (2018)
Paszke, A., et al.: PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703 (2019)
Petrenko, A., Huang, Z., Kumar, T., Sukhatme, G., Koltun, V.: Sample factory: Egocentric 3d control from pixels at 100000 fps with asynchronous reinforcement learning. In: International Conference on Machine Learning, pp. 7652–7662. PMLR (2020)
Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., Whiteson, S.: Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 4295–4304. PMLR (2018)
Samvelyan, M., et al.: The StarCraft Multi-Agent Challenge. CoRR abs/1902.04043 (2019)
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)
Shalev-Shwartz, S., Shammah, S., Shashua, A.: Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295 (2016)
Son, K., Kim, D., Kang, W.J., Hostallero, D.E., Yi, Y.: Qtran: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 5887–5896. PMLR (2019)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An introduction. MIT Press, Cambridge (2018)
Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
Wang, J., Ren, Z., Liu, T., Yu, Y., Zhang, C.: Qplex: Duplex dueling multi-agent q-learning. arXiv preprint arXiv:2008.01062 (2020)
Wang, T., Dong, H., Lesser, V., Zhang, C.: Roma: multi-agent reinforcement learning with emergent roles. In: Proceedings of the 37th International Conference on Machine Learning (2020)
Wang, T., Gupta, T., Mahajan, A., Peng, B., Whiteson, S., Zhang, C.: Rode: Learning roles to decompose multi-agent tasks. arXiv preprint arXiv:2010.01523 (2020)
Ye, D., et al.: Towards playing full moba games with deep reinforcement learning. arXiv preprint arXiv:2011.12692 (2020)
Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A., Wu, Y.: The surprising effectiveness of mappo in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, W., Yu, C., Li, Y., Wu, Y. (2021). Unlocking the Potential of MAPPO with Asynchronous Optimization. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13070. Springer, Cham. https://doi.org/10.1007/978-3-030-93049-3_33
DOI: https://doi.org/10.1007/978-3-030-93049-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93048-6
Online ISBN: 978-3-030-93049-3
eBook Packages: Computer Science (R0)