Centralized reinforcement learning for multi-agent cooperative environments

Lu, Chengxuan; Bao, Qihao; Xia, Shaojie; Qu, Chongxiao

doi:10.1007/s12065-022-00703-4

Special Issue
Published: 20 June 2022

Volume 17, pages 267–273, (2024)

Evolutionary Intelligence

Chengxuan Lu¹ (ORCID: orcid.org/0000-0001-9091-7952), Qihao Bao¹, Shaojie Xia¹ & Chongxiao Qu¹

Abstract

We study reinforcement learning methods in multi-agent domains where a central controller collects all information and decides an action for every agent. In this setting, however, multi-agent reinforcement learning (MARL) suffers from a combinatorial explosion of the action space. In this work, we propose an improved proximal policy optimization (PPO) algorithm whose neural network is based on the attention mechanism, to address this combinatorial explosion. Our model outputs a joint action rather than distributed actions. Because the attention mechanism shares parameters across agents, the size of the neural network grows linearly with the length of a single agent's local observation, regardless of the number of agents. In addition, multi-agent credit assignment is addressed naturally by gradient ascent in the attention layer. Experimental results demonstrate that our method outperforms independent PPO and centralized PPO with other networks.
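The two scaling claims in the abstract can be illustrated with a small NumPy sketch. It is not the authors' network: the single-head attention layer, the per-agent softmax heads, and all names (`joint_action_count`, `init_params`, `joint_policy`) and sizes are assumptions chosen for illustration. The sketch contrasts the number of flat joint actions, which grows exponentially with the number of agents, with the parameter count of a weight-shared attention policy, which depends only on the per-agent observation length.

```python
import numpy as np


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Number of distinct joint actions if each agent picks one of n_actions."""
    return n_actions ** n_agents


def init_params(obs_dim: int, d_model: int, n_actions: int, seed: int = 0) -> dict:
    """Weights shared by all agents (hypothetical layer sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "Wq": rng.normal(0.0, 0.1, (obs_dim, d_model)),
        "Wk": rng.normal(0.0, 0.1, (obs_dim, d_model)),
        "Wv": rng.normal(0.0, 0.1, (obs_dim, d_model)),
        "Wo": rng.normal(0.0, 0.1, (d_model, n_actions)),
    }


def param_count(params: dict) -> int:
    """Total number of scalar weights; independent of the number of agents."""
    return sum(w.size for w in params.values())


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_policy(params: dict, obs: np.ndarray) -> np.ndarray:
    """Map observations (n_agents, obs_dim) to action probabilities (n_agents, n_actions).

    Single-head self-attention lets every agent's output depend on all
    agents' observations while reusing the same weights for each agent.
    """
    q = obs @ params["Wq"]
    k = obs @ params["Wk"]
    v = obs @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_agents, n_agents)
    hidden = attn @ v                               # (n_agents, d_model)
    return softmax(hidden @ params["Wo"])           # one distribution per agent


if __name__ == "__main__":
    obs_dim, n_actions = 8, 5
    params = init_params(obs_dim=obs_dim, d_model=16, n_actions=n_actions)
    rng = np.random.default_rng(1)
    for n_agents in (2, 4, 8):
        probs = joint_policy(params, rng.normal(size=(n_agents, obs_dim)))
        print(n_agents, probs.shape, param_count(params),
              joint_action_count(n_agents, n_actions))
```

The same `params` dictionary serves 2, 4, or 8 agents, whereas a single softmax over flat joint actions would need an output unit for every combination of per-agent actions.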

Author information

Authors and Affiliations

The 52nd Research Institute of China Electronics Technology Group Corporation, No. 9, Wenfu Road, Yuhang, Hangzhou, 311100, China
Chengxuan Lu, Qihao Bao, Shaojie Xia & Chongxiao Qu

Corresponding author

Correspondence to Chongxiao Qu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, C., Bao, Q., Xia, S. et al. Centralized reinforcement learning for multi-agent cooperative environments. Evol. Intel. 17, 267–273 (2024). https://doi.org/10.1007/s12065-022-00703-4

Received: 09 May 2021
Revised: 06 January 2022
Accepted: 04 February 2022
Published: 20 June 2022
Issue Date: February 2024
DOI: https://doi.org/10.1007/s12065-022-00703-4
