
Centralized reinforcement learning for multi-agent cooperative environments

  • Special Issue
Evolutionary Intelligence

Abstract

We study reinforcement learning methods for multi-agent domains in which a central controller collects all information and selects an action for every agent. In this setting, multi-agent reinforcement learning (MARL) suffers from a combinatorial explosion of the action space: the number of joint actions grows exponentially with the number of agents. In this work, we propose an improved proximal policy optimization (PPO) algorithm whose neural network is based on an attention mechanism to address this issue. Our model outputs a joint action instead of distributed per-agent actions. Because the attention parameters are shared, the size of the neural network grows linearly with the length of a single agent's local observation, regardless of the number of agents. In addition, multi-agent credit assignment is handled naturally by gradient ascent through the attention layer. Experimental results show that our method outperforms independent PPO and centralized PPO with other network architectures.
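
The two scaling claims in the abstract can be made concrete with a small sketch. The pure-Python code below is not the authors' network; it is a minimal illustration under stated assumptions. It assumes a single self-attention layer over per-agent observation embeddings, with every weight matrix (the hypothetical names `w_embed`, `w_q`, `w_k`, `w_v`, `w_out`) shared by all agents, and it assumes the joint action factorizes into one categorical distribution per agent, so the joint log-probability is a sum of per-agent terms. Under these assumptions the parameter count depends on the observation length, the model width and the per-agent action count, but not on the number of agents, whereas a flat output layer over joint actions would need |A|^n outputs. `ppo_clip_objective` is the standard PPO clipped surrogate, applied here to the joint log-probability.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedAttentionPolicy:
    """Hypothetical centralized policy: one set of weights reused for every agent."""

    def __init__(self, obs_dim, d_model, n_actions, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.w_embed = rand_matrix(obs_dim, d_model, rng)  # per-agent observation embedding
        self.w_q = rand_matrix(d_model, d_model, rng)      # attention query projection
        self.w_k = rand_matrix(d_model, d_model, rng)      # attention key projection
        self.w_v = rand_matrix(d_model, d_model, rng)      # attention value projection
        self.w_out = rand_matrix(d_model, n_actions, rng)  # per-agent action logits

    def num_params(self):
        mats = [self.w_embed, self.w_q, self.w_k, self.w_v, self.w_out]
        return sum(len(m) * len(m[0]) for m in mats)  # no term depends on the agent count

    def action_probs(self, observations):
        """observations: one list of length obs_dim per agent, for any number of agents."""
        h = matmul(observations, self.w_embed)  # (n_agents, d_model)
        q, k, v = matmul(h, self.w_q), matmul(h, self.w_k), matmul(h, self.w_v)
        scale = 1.0 / math.sqrt(self.d_model)
        mixed = []
        for q_i in q:
            # Scaled dot-product attention: agent i weighs the values of all agents.
            weights = softmax([scale * sum(x * y for x, y in zip(q_i, k_j)) for k_j in k])
            mixed.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                          for c in range(self.d_model)])
        logits = matmul(mixed, self.w_out)  # (n_agents, n_actions)
        return [softmax(row) for row in logits]

    def joint_action(self, observations):
        """Greedy joint action and its log-probability, ASSUMING a per-agent factorization."""
        probs = self.action_probs(observations)
        actions = [max(range(len(p)), key=p.__getitem__) for p in probs]
        log_prob = sum(math.log(p[a]) for p, a in zip(probs, actions))
        return actions, log_prob


def ppo_clip_objective(log_prob_new, log_prob_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    policy = SharedAttentionPolicy(obs_dim=8, d_model=16, n_actions=5)
    for n_agents in (2, 10):
        obs = [[0.1 * (i + j) for j in range(8)] for i in range(n_agents)]
        actions, log_prob = policy.joint_action(obs)
        # Same weight count for any n_agents; a flat joint-action head would need 5 ** n_agents outputs.
        print(n_agents, actions, policy.num_params(), 5 ** n_agents)
```

With `obs_dim=8`, `d_model=16` and 5 actions per agent, the sketch has 976 weights for both 2 and 10 agents, while the number of joint actions grows from 25 to 9,765,625. Actions are chosen greedily here only to keep the example deterministic; during PPO training they would be sampled from the per-agent probabilities.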


Figs. 1–7 (images not reproduced)



Author information

Corresponding author

Correspondence to Chongxiao Qu.


About this article


Cite this article

Lu, C., Bao, Q., Xia, S. et al. Centralized reinforcement learning for multi-agent cooperative environments. Evol. Intel. 17, 267–273 (2024). https://doi.org/10.1007/s12065-022-00703-4



  • DOI: https://doi.org/10.1007/s12065-022-00703-4
