ABSTRACT
Recent advances in reinforcement learning have produced remarkable achievements, from game playing to industrial applications. Of particular interest is multi-agent reinforcement learning (MARL), which holds significant potential for real-world scenarios. However, typical MARL methods can handle at most tens of agents, leaving scenarios with hundreds or even thousands of agents almost unexplored. Scaling up the number of agents raises two primary challenges: (1) agent-agent interactions are crucial in multi-agent systems, yet the number of interactions grows quadratically with the number of agents, incurring substantial computational cost and making strategy learning difficult; (2) interaction strengths vary both across agent pairs and over time, making such interactions hard to model precisely. In this paper, we propose a novel approach named Graph Attention Mean Field (GAT-MF). By converting agent-agent interactions into interactions between each agent and a weighted mean field, we achieve a substantial reduction in computational complexity. The proposed method models interaction dynamics precisely, with mathematical proofs of its correctness. Additionally, we design a graph attention mechanism to automatically capture the diverse and time-varying interaction strengths, ensuring an accurate representation of agent interactions. Through extensive experiments in both manually designed and real-world scenarios involving over 3000 agents, we validate the efficacy of our method. The results demonstrate that our method outperforms the best baseline by 42.7%, while reducing training time by 86.4% and GPU memory usage by 19.2% relative to that baseline. For reproducibility, our source code and data are available at https://github.com/tsinghua-fib-lab/Large-Scale-MARL-GATMF.
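To illustrate the core idea, the following is a minimal sketch (not the authors' implementation) of attention-weighted mean-field aggregation: each agent's neighbors are collapsed into a single attention-weighted vector, so each agent conditions on one aggregated field rather than on N-1 individual peers. All names (`weighted_mean_field`, the projection matrices `Wq`, `Wk`, the adjacency mask `adj`) are illustrative assumptions, not identifiers from the paper's codebase.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_mean_field(obs, Wq, Wk, adj):
    """Collapse each agent's neighbors into one attention-weighted vector.

    obs : (N, d) per-agent observation/feature matrix
    Wq, Wk : (d, h) query/key projection matrices (learned in practice)
    adj : (N, N) 0/1 adjacency mask defining who interacts with whom
    Returns an (N, d) matrix: one mean-field vector per agent.
    """
    q = obs @ Wq                               # (N, h) queries
    k = obs @ Wk                               # (N, h) keys
    scores = q @ k.T / np.sqrt(q.shape[1])     # (N, N) pairwise scores
    scores = np.where(adj > 0, scores, -1e9)   # mask out non-neighbors
    attn = softmax(scores, axis=1)             # each row sums to 1
    return attn @ obs                          # convex combination of neighbor features

rng = np.random.default_rng(0)
N, d, h = 5, 4, 8
obs = rng.standard_normal((N, d))
Wq = rng.standard_normal((d, h))
Wk = rng.standard_normal((d, h))
adj = np.ones((N, N)) - np.eye(N)              # fully connected, no self-loops
mf = weighted_mean_field(obs, Wq, Wk, adj)
print(mf.shape)                                # (5, 4)
```

The payoff is the shape of the result: each agent's policy and critic can take `(obs[i], mf[i])` as input, a fixed-size pair regardless of how many agents exist, which is what makes the quadratic interaction count tractable at the scale of thousands of agents.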