Multi-agent deep reinforcement learning with type-based hierarchical group communication

Jiang, Hao; Shi, Dianxi; Xue, Chao; Wang, Yajie; Wang, Gongju; Zhang, Yongjun

doi:10.1007/s10489-020-02065-9

Multi-agent deep reinforcement learning with type-based hierarchical group communication

Published: 16 January 2021

Volume 51, pages 5793–5808, (2021)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Hao Jiang¹,
Dianxi Shi^2,3,
Chao Xue^2,3,
Yajie Wang¹,
Gongju Wang² &
…
Yongjun Zhang²

1149 Accesses
10 Citations
1 Altmetric
Explore all metrics

Abstract

Real-world multi-agent tasks often involve varying types and quantities of agents. These agents connected by complex interaction relationships causes great difficulty for policy learning because they need to learn various interaction types to complete a given task. Therefore, simplifying the learning process is an important issue. In multi-agent systems, agents with a similar type often interact more with each other and exhibit behaviors more similar. That means there are stronger collaborations between these agents. Most existing multi-agent reinforcement learning (MARL) algorithms expect to learn the collaborative strategies of all agents directly in order to maximize the common rewards. This causes the difficulty of policy learning to increase exponentially as the number and types of agents increase. To address this problem, we propose a type-based hierarchical group communication (THGC) model. This model uses prior domain knowledge or predefine rule to group agents, and maintains the group’s cognitive consistency through knowledge sharing. Subsequently, we introduce a group communication and value decomposition method to ensure cooperation between the various groups. Experiments demonstrate that our model outperforms state-of-the-art MARL methods on the widely adopted StarCraft II benchmarks across different scenarios, and also possesses potential value for large-scale real-world applications.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 4

Multi-agent deep reinforcement learning: a survey

Article Open access 15 April 2021

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Deep learning: systematic review, models, challenges, and research directions

Article Open access 07 September 2023

References

Bear A, Kagan A, Rand DG (2017) Co-evolution of cooperation and cognition: the impact of imperfect deliberation and context-sensitive intuition. Proc Royal Soc B Biol Sci 284(1851):20162326
Article Google Scholar
Bresciani PG, Giunchiglia P, Mylopoulos F, Perini J, TROPOS A (2004) An agent oriented software development methodology. Journal of autonomous agents and multiagent systems, Kluwer Academic Publishers
Butler E (2012) The condensed wealth of nations. Centre for Independent Studies
Carion N, Usunier N, Synnaeve G, Lazaric A (2019) A structured prediction approach for generalization in cooperative multi-agent reinforcement learning. In: Advances in neural information processing systems, pp 8130–8140
Chen Y, Zhou M, Wen Y, Yang Y, Su Y, Zhang W, Zhang D, Wang J, Liu H (2018) Factorized q-learning for large-scale multi-agent systems. arXiv:1809.03738
Chuang L, Chao X, Jie H, Wenzhuo L, et al. (2017) Hierarchical architecture design of computer system. Chinese J Comput 40(09):1996–2017
MathSciNet Google Scholar
Clevert DA, Unterthiner T, Hochreiter S (2015) Fast and accurate deep network learning by exponential linear units (elus). arXiv:1511.07289
Cossentino M, Gaglio S, Sabatucci L, Seidita V (2005) The passi and agile passi mas meta-models compared with a unifying proposal. In: International central and eastern european conference on multi-agent systems, pp 183–192. Springer
Cossentino M, Hilaire V, Molesini A, Seidita V (2014) Handbook on agent-oriented design processes. Springer, Berlin
Book Google Scholar
Das A, Gervet T, Romoff J, Batra D, Parikh D, Rabbat M, Pineau J (2018) Tarmac: Targeted multi-agent communication. arXiv:1810.11187
Dugas C, Bengio Y, Bélisle F., Nadeau C, Garcia R (2009) Incorporating functional knowledge in neural networks. J Mach Learn Res 10(Jun):1239–1262
MathSciNet MATH Google Scholar
Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S (2018) Counterfactual multi-agent policy gradients. In: Thirty-second AAAI conference on artificial intelligence
Gordon DM (1996) The organization of work in social insect colonies. Nature 380(6570):121–124
Article Google Scholar
Ha D, Dai A, Le QV (2016) Hypernetworks. arXiv:1609.09106
Henriques R, Madeira SC (2016) Bicnet: Flexible module discovery in large-scale biological networks using biclustering. Algorithms Mol Biol 11(1):14
Article Google Scholar
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computat 9(8):1735–1780
Article Google Scholar
Iqbal S, Sha F (2018) Actor-attention-critic for multi-agent reinforcement learning. arXiv:1810.02912
Jeanson R, Kukuk PF, Fewell JH (2005) Emergence of division of labour in halictine bees: contributions of social interactions and behavioural variance. Anim Behav 70(5):1183–1193
Article Google Scholar
Jiang J, Dun C, Lu Z (2018) Graph convolutional reinforcement learning for multi-agent cooperation. arXiv:1810.09202,2(3)
Jiang J, Lu Z (2018) Learning attentional communication for multi-agent cooperation. In: Advances in neural information processing systems, pp 7254–7264
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
Liu Y, Hu Y, Gao Y, Chen Y, Fan C (2019) Value function transfer for deep multi-agent reinforcement learning based on n-step returns. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence, pp 457–463
Liu Y, Wang W, Hu Y, Hao J, Chen X, Gao Y (2019) Multi-agent game abstraction via graph attention neural network. arXiv:1911.10715
Long Q, Zhou Z, Gupta A, Fang F, Wu Y, Wang X (2020) Evolutionary population curriculum for scaling multi-agent reinforcement learning. arXiv:2003.10423
Lowe R, Wu YI, Tamar A, Harb J, Abbeel OP, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, pp 6379–6390
Mao H, Liu W, Hao J, Luo J, Li D, Zhang Z, Wang J, Xiao Z (2019) Neighborhood cognition consistent multi-agent reinforcement learning. arXiv:1912.01160
Melo FS, Veloso M (2011) Decentralized mdps with sparse interactions. Artif Intell 175 (11):1757–1789
Article MathSciNet Google Scholar
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Article Google Scholar
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: ICML
Oliehoek FA, Amato C, et al. (2016) A concise introduction to decentralized POMDPs, vol 1. Springer, Berlin
Book Google Scholar
OroojlooyJadid A, Hajinezhad D (2019) A review of cooperative multi-agent deep reinforcement learning. arXiv:1908.03963
Pal SK, Mitra S (1992) Multilayer perceptron, fuzzy sets classifiaction
Ryu H, Shin H, Park J (2020) Multi-agent actor-critic with hierarchical graph attention network. In: AAAI, pp 7236–7243
Samvelyan M, Rashid T, de Witt CS, Farquhar G, Nardelli N, Rudner TG, Hung CM, Torr PH, Foerster J, Whiteson S (2019) The starcraft multi-agent challenge. arXiv:1902.04043
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347
Singh A, Jain T, Sukhbaatar S (2018) Learning when to communicate at scale in multiagent cooperative and competitive tasks. arXiv:1812.09755
Son K, Kim D, Kang WJ, Hostallero DE, Yi Y (2019) Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv:1905.05408
Stone P, Veloso M (2000) Multiagent systems: a survey from a machine learning perspective. Auton Robot 8(3):345–383
Article Google Scholar
Sukhbaatar S, Fergus R, et al. (2016) Learning multiagent communication with backpropagation. In: Advances in neural information processing systems, pp 2244–2252
Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K et al (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv:1706.05296
Sutton RS, McAllester DA, Singh SP, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, pp 1057–1063
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903
Wang W, Yang T, Liu Y, Hao J, Hao X, Hu Y, Chen Y, Fan C, Gao Y (2020) From few to more: large-scale dynamic multiagent curriculum learning. In: AAAI, pp 7293–7300
Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78 (10):1550–1560
Article Google Scholar
Whiteson S (2018) Qmix: Monotonic value function factorisation for deep multi- agent reinforcement learning
Wooldridge M, Jennings NR, Kinny D (2000) The gaia methodology for agent-oriented analysis and design. Auton Agents Multi-Agent Syst 3(3):285–312
Article Google Scholar
Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. arXiv:1802.05438
Yu C, Zhang M, Ren F, Tan G (2015) Multiagent learning of coordination in loosely coupled multiagent systems. IEEE Trans Cybern 45(12):2853–2867
Article Google Scholar
Zhang Z, Yang J, Zha H (2019) Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization. arXiv:1909.10651

Download references

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China under Grant No.2017YFB1001901, in part by the Key Program of Tianjin Science and Technology Development Plan under Grant No.18ZXZNGX00120 and in part by the China Postdoctoral Science Foundation under Grant No.2018M643900.

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Hao Jiang & Yajie Wang
Artificial Intelligence Research Center, National Innovation Institute of Defense Technology, Beijing, China
Dianxi Shi, Chao Xue, Gongju Wang & Yongjun Zhang
Tianjin Artificial Intelligence Innovation Center, Tianjin, China
Dianxi Shi & Chao Xue

Authors

Hao Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Dianxi Shi
View author publications
You can also search for this author in PubMed Google Scholar
Chao Xue
View author publications
You can also search for this author in PubMed Google Scholar
Yajie Wang
View author publications
You can also search for this author in PubMed Google Scholar
Gongju Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yongjun Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dianxi Shi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Environment details

We follow the settings of SMAC [34], which could be referred in the SMAC paper. For clarity and completeness, we state these environment details again.

1.1 A.1 States and observations

At each time step, agents receive local observations within their field of view. This encompasses information about the map within a circular area around each unit with a radius equal to the sight range, which is set to 9. The sight range makes the environment partially observable for agents. An agent can only observe others if they are both alive and located within its sight range. Hence, there is no way for agents to distinguish whether their teammates are far away or dead. If one unit (both for allies and enemies) is dead or unseen from another agent’s observation, then its unit feature vector is reset to all zeros. The feature vector observed by each agent contains the following attributes for both allied and enemy units within the sight range: distance, relative x, relative y, health, shield, and unit type. If agents are homogeneous, the unit type feature will be omitted. All Protos units have shields, which serve as a source of protection to offset the damage and can regenerate if no new damage is received. Lastly, agents can observe the terrain features surrounding them, in particular, the values of eight points at a fixed radius indicating height and walkability.

The global state is composed of the joint unit features of both ally and enemy soldiers. Specifically, the state vector includes the coordinates of all agents relative to the center of the map, together with unit features present in the observations. Additionally, the state stores the energy/cooldown of the allied units based on the unit property, which represents the minimum delay between attacks/healing. All features, both in the global state and in individual observations of agents, are normalized by their maximum values

1.2 A.2 Action space

The discrete set of actions which agents are allowed to take consists of move[direction], attack[enemy id], stop and no-op. Dead agents can only take no-op action while live agents cannot. Agents can only move with a fixed movement amount 2 in four directions: north, south, east, or west. To ensure decentralization of the task, agents are restricted to use the attack[enemy id] action only towards enemies in their shooting range. This additionally constrains the ability of the units to use the built-in attack-move micro-actions on the enemies that are far away. The shooting range is set to be 6 for all agents. Having a larger sight range than a shooting range allows agents to make use of the move commands before starting to fire. The unit behavior of automatically responding to enemy fire without being explicitly ordered is also disabled. As healer units, Medivacs use heal[agent id] actions instead of attack[enemy id].

1.3 A.3 Rewards

At each time step, the agents receive a joint reward equal to the total damage dealt on the enemy units. In addition, agents receive a bonus of 10 points after killing each opponent, and 200 points after killing all opponents for winning the battle. The rewards are scaled so that the maximum cumulative reward achievable in each scenario is around 20.

Appendix B: Training configurations

The training time is about 14 hours to 24 hours on these maps (Intel (R) Core (TM) i7-8700 CPU @ 3.20GHz, 32 GB RAM, Nvidia GTX 1050 GPU), which is ranging based on the agent numbers and map features of each map. The number of the total training steps is about 2 million and every 10 thousand steps we train and test the model. When training, a batch of 32 epochs are retrieved from the replay buffer which contains the most recent 1000 epochs. We use 𝜖-greedy policy for exploration. The starting exploration rate is set to 1 and the end exploration rate is 0.05. Exploration rate decays linearly at the first 50 thousand steps. We keep the default configurations of environment parameters. Hyperparameters were based on the PyMARL [34] implementation of QMIX and are listed in Table 3. All hyperparameters are the same in StarCraft II.

Table 3 Hyperparameter settings across all runs and algorithms/baselines

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jiang, H., Shi, D., Xue, C. et al. Multi-agent deep reinforcement learning with type-based hierarchical group communication. Appl Intell 51, 5793–5808 (2021). https://doi.org/10.1007/s10489-020-02065-9

Download citation

Accepted: 04 November 2020
Published: 16 January 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10489-020-02065-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-agent deep reinforcement learning with type-based hierarchical group communication

Abstract

Access this article

Similar content being viewed by others

Multi-agent deep reinforcement learning: a survey

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Deep learning: systematic review, models, challenges, and research directions

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Appendices

Appendix A: Environment details

1.1 A.1 States and observations

1.2 A.2 Action space

1.3 A.3 Rewards

Appendix B: Training configurations

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Multi-agent deep reinforcement learning with type-based hierarchical group communication

Abstract

Access this article

Similar content being viewed by others

Multi-agent deep reinforcement learning: a survey

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Deep learning: systematic review, models, challenges, and research directions

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Appendices

Appendix A: Environment details

1.1 A.1 States and observations

1.2 A.2 Action space

1.3 A.3 Rewards

Appendix B: Training configurations

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation