ABSTRACT
Recent advances in reinforcement learning have produced remarkable achievements, from game playing to industrial applications. Of particular interest is multi-agent reinforcement learning (MARL), which holds significant potential for real-world scenarios. However, typical MARL methods can handle at most tens of agents, leaving scenarios with hundreds or even thousands of agents almost unexplored. Scaling up the number of agents raises two primary challenges: (1) agent-agent interactions are crucial in multi-agent systems, yet the number of interactions grows quadratically with the number of agents, incurring substantial computational cost and making strategy learning difficult; (2) interaction strengths vary both across agent pairs and over time, making such interactions hard to model precisely. In this paper, we propose a novel approach named Graph Attention Mean Field (GAT-MF). By converting agent-agent interactions into interactions between each agent and a weighted mean field, we achieve a substantial reduction in computational complexity. The proposed method models interaction dynamics precisely, with mathematical proofs of its correctness. Additionally, we design a graph attention mechanism to automatically capture the diverse and time-varying interaction strengths, ensuring an accurate representation of agent interactions. Through extensive experiments in both manually designed and real-world scenarios involving over 3000 agents, we validate the efficacy of our method. The results demonstrate that our method outperforms the best baseline by 42.7%, while reducing training time by 86.4% and GPU memory usage by 19.2% relative to that baseline. For reproducibility, our source code and data are available at https://github.com/tsinghua-fib-lab/Large-Scale-MARL-GATMF.
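To illustrate the core idea, the following is a minimal sketch (not the authors' implementation) of attention-weighted mean-field aggregation: each agent's neighbors are collapsed into a single attention-weighted vector, so each agent conditions on one aggregated field rather than on N-1 individual peers. All names (`weighted_mean_field`, the projection matrices `Wq`, `Wk`, the adjacency mask `adj`) are illustrative assumptions, not identifiers from the paper's codebase.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_mean_field(obs, Wq, Wk, adj):
    """Collapse each agent's neighbors into one attention-weighted vector.

    obs : (N, d) per-agent observation/feature matrix
    Wq, Wk : (d, h) query/key projection matrices (learned in practice)
    adj : (N, N) 0/1 adjacency mask defining who interacts with whom
    Returns an (N, d) matrix: one mean-field vector per agent.
    """
    q = obs @ Wq                               # (N, h) queries
    k = obs @ Wk                               # (N, h) keys
    scores = q @ k.T / np.sqrt(q.shape[1])     # (N, N) pairwise scores
    scores = np.where(adj > 0, scores, -1e9)   # mask out non-neighbors
    attn = softmax(scores, axis=1)             # each row sums to 1
    return attn @ obs                          # convex combination of neighbor features

rng = np.random.default_rng(0)
N, d, h = 5, 4, 8
obs = rng.standard_normal((N, d))
Wq = rng.standard_normal((d, h))
Wk = rng.standard_normal((d, h))
adj = np.ones((N, N)) - np.eye(N)              # fully connected, no self-loops
mf = weighted_mean_field(obs, Wq, Wk, adj)
print(mf.shape)                                # (5, 4)
```

The payoff is the shape of the result: each agent's policy and critic can take `(obs[i], mf[i])` as input, a fixed-size pair regardless of how many agents exist, which is what makes the quadratic interaction count tractable at the scale of thousands of agents.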