Abstract
Multi-agent reinforcement learning (MARL) often faces the problem of policy learning in large action spaces. The action space is complex for two reasons: first, the decision space of a single agent in a multi-agent system is already large; second, the joint action space formed by combining the action spaces of the individual agents grows exponentially with the number of agents. Learning a robust policy in cooperative multi-agent scenarios therefore remains a challenge. To address this challenge we propose an algorithm called Bidirectionally-Coordinated Deep Deterministic Policy Gradient (BiC-DDPG). BiC-DDPG incorporates three mechanisms designed from our insights into the challenge: a centralized-training, decentralized-execution architecture that preserves the Markov property and thus the convergence of the algorithm; bidirectional RNN structures that enable information exchange among cooperating agents; and a mapping method that projects the continuous joint-action output onto the discrete joint action space, addressing the problem of agents' decision-making over large joint action spaces. We evaluate BiC-DDPG with a series of fine-grained experiments covering scenarios with both cooperative and adversarial relationships between homogeneous agents. The results show that our algorithm outperforms the baselines.
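The abstract does not spell out the form of the continuous-to-discrete mapping. A minimal sketch of one common realization of this idea (a nearest-neighbour lookup from the actor's continuous "proto-action" to an embedding of each discrete joint action; the function and variable names below are illustrative, not the paper's):

```python
import numpy as np

def nearest_discrete_action(proto_action, action_embeddings):
    """Return the index of the discrete action whose embedding is
    closest (Euclidean distance) to the continuous proto-action."""
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    return int(np.argmin(dists))

# Toy example: 4 discrete joint actions embedded in 2-D.
embeddings = np.array([[0.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 0.0],
                       [1.0, 1.0]])
proto = np.array([0.9, 0.2])  # continuous output of the actor network
idx = nearest_discrete_action(proto, embeddings)  # -> 2
```

At execution time the environment receives the discrete action `idx`, while gradients flow through the continuous proto-action as in standard DDPG; this is a sketch under those assumptions, not the paper's exact procedure.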
This work was supported in part by the Key Program of Tianjin Science and Technology Development Plan under Grant No. 18ZXZNGX00120 and in part by the China Postdoctoral Science Foundation under Grant No. 2018M643900.
Copyright information
© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Wang, G., Shi, D., Xue, C., Jiang, H., Wang, Y. (2021). BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-67540-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67539-4
Online ISBN: 978-3-030-67540-0