Abstract
Multi-agent reinforcement learning (MARL) is an important approach to multi-agent cooperation, but challenges such as limited scalability and environmental uncertainty still restrict its application. In this paper, we address these problems with a graph network and an attention mechanism, extending an existing algorithm to obtain a new algorithm called GAMA. Through the graph network, environment information is shared among agents, while the attention mechanism filters out unimportant information, which improves communication efficiency. As a result, GAMA achieved the highest mean episode rewards among the compared baselines and showed excellent scalability. We chose the graph network because understanding the relationships among agents plays a key role in solving multi-agent problems, and graph networks are well suited to expressing relational inductive bias. Our experiments showed that, when the graph network is integrated with the attention mechanism, agents could work out their relationships and focus on the influential environmental factors.
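To make the idea of attention-filtered information sharing over an agent graph concrete, the following is a minimal sketch in plain Python. It is not the GAMA implementation: the scaled dot-product scoring, the function names (`attend`, `softmax`, `dot`), and the toy features and adjacency lists are all assumptions chosen for illustration. Each agent scores its neighbors' feature vectors against its own, normalizes the scores with a softmax, and aggregates neighbor features by those weights, so that less relevant neighbors contribute less.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(features, adjacency):
    """Aggregate neighbor features for each agent using attention weights.

    features:  one feature vector per agent (hypothetical observation embeddings)
    adjacency: adjacency[i] lists the neighbor indices of agent i
    Returns (aggregated, weights); weights[i][k] is the attention that
    agent i pays to neighbor adjacency[i][k].
    """
    d = len(features[0])
    aggregated, weights = [], []
    for i, neighbors in enumerate(adjacency):
        if not neighbors:
            # An isolated agent receives no messages.
            aggregated.append([0.0] * d)
            weights.append([])
            continue
        # Assumed scoring rule: scaled dot product between agent i and neighbor j.
        scores = [dot(features[i], features[j]) / math.sqrt(d) for j in neighbors]
        alpha = softmax(scores)
        # Attention-weighted sum of neighbor features.
        agg = [sum(a * features[j][k] for a, j in zip(alpha, neighbors))
               for k in range(d)]
        aggregated.append(agg)
        weights.append(alpha)
    return aggregated, weights


# Toy example: agents 0 and 1 have identical features, agent 2 differs.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[1, 2], [0, 2], [0, 1]]
aggregated, weights = attend(features, adjacency)
```

In this toy setup, agent 0 assigns more weight to agent 1 (whose features match its own) than to agent 2, which is the filtering effect the abstract describes. A learned model would replace the raw dot product with trainable projections.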
Cite this article
Chen, H., Liu, Y., Zhou, Z. et al. GAMA: Graph Attention Multi-agent reinforcement learning algorithm for cooperation. Appl Intell 50, 4195–4205 (2020). https://doi.org/10.1007/s10489-020-01755-8
DOI: https://doi.org/10.1007/s10489-020-01755-8