BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning

  • Conference paper

Abstract

Multi-agent reinforcement learning (MARL) often faces the problem of policy learning under a large action space. The action space is complex for two reasons: first, the decision space of a single agent in a multi-agent system is itself huge; second, the joint action space formed by combining the action spaces of individual agents grows exponentially with the number of agents. Learning a robust policy in multi-agent cooperative scenarios is therefore a challenge. To address it we propose an algorithm called bidirectionally-coordinated Deep Deterministic Policy Gradient (BiC-DDPG). BiC-DDPG incorporates three mechanisms designed from our insights into this challenge: a centralized-training, decentralized-execution architecture that preserves the Markov property and thereby the convergence of the algorithm; bidirectional RNN structures that enable information exchange among cooperating agents; and a mapping method that projects the continuous joint-action output onto the discrete joint action space, addressing agents' decision-making over large joint action spaces. We designed a series of fine-grained experiments, including scenarios with both cooperative and adversarial relationships between homogeneous agents, to evaluate our algorithm. The results show that our algorithm outperforms the baselines.
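To make the second and third mechanisms concrete, below is a minimal PyTorch sketch of an actor that passes information between agents through a bidirectional GRU and then snaps each agent's continuous output onto the nearest discrete action. It is an illustration under stated assumptions, not the authors' implementation: the module name BiCActor, the learned action-embedding table, and all layer sizes are hypothetical.

import torch
import torch.nn as nn

class BiCActor(nn.Module):
    def __init__(self, obs_dim, hidden_dim, act_dim, n_discrete):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden_dim)
        # Bidirectional GRU over the *agent* axis: each agent's hidden state
        # aggregates messages from the agents before and after it.
        self.comm = nn.GRU(hidden_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.decode = nn.Linear(2 * hidden_dim, act_dim)
        # One embedding per discrete action; continuous outputs are mapped to
        # their nearest neighbour in this table (a hypothetical stand-in for
        # the paper's continuous-to-discrete mapping).
        self.action_embeddings = nn.Parameter(torch.randn(n_discrete, act_dim))

    def forward(self, obs):                     # obs: (batch, n_agents, obs_dim)
        h = torch.tanh(self.encode(obs))
        h, _ = self.comm(h)                     # two-way agent communication
        proto = torch.tanh(self.decode(h))      # continuous joint action
        # Snap each agent's continuous action to the closest discrete one.
        table = self.action_embeddings.unsqueeze(0).expand(obs.size(0), -1, -1)
        discrete_ids = torch.cdist(proto, table).argmin(dim=-1)
        return proto, discrete_ids              # proto keeps the DDPG gradient path

actor = BiCActor(obs_dim=12, hidden_dim=32, act_dim=8, n_discrete=20)
proto, ids = actor(torch.randn(4, 3, 12))       # batch of 4 rollouts, 3 agents
print(ids.shape)                                # torch.Size([4, 3])

Treating the agent dimension as the recurrent axis is what lets a single forward pass implement bidirectional communication; the discrete indices are used for execution, while gradients flow through the continuous proto-actions during training.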

This work was supported in part by the Key Program of Tianjin Science and Technology Development Plan under Grant No. 18ZXZNGX00120 and in part by the China Postdoctoral Science Foundation under Grant No. 2018M643900.



Author information


Correspondence to Dianxi Shi or Chao Xue.



Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Wang, G., Shi, D., Xue, C., Jiang, H., Wang, Y. (2021). BiC-DDPG: Bidirectionally-Coordinated Nets for Deep Multi-agent Reinforcement Learning. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-67540-0_20

  • DOI: https://doi.org/10.1007/978-3-030-67540-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67539-4

  • Online ISBN: 978-3-030-67540-0

  • eBook Packages: Computer Science (R0)
