
Learning controlled and targeted communication with the centralized critic for the multi-agent system

Published in Applied Intelligence 53, 14819–14837 (2023)

Abstract

Multi-agent deep reinforcement learning (MDRL) has attracted attention for solving complex cooperative tasks. Two of its main challenges are non-stationarity and partial observability from the perspective of individual agents, both of which hinder the learning of cooperative policies. In this study, Controlled and Targeted Communication with the Centralized Critic (COTAC) is proposed, establishing a paradigm of centralized learning and decentralized execution with partial communication. COTAC decouples how the multi-agent system obtains environmental information during training from how it does so during execution: the environment faced by the agents is kept stationary in the training phase, while learned partial communication overcomes the limitation of partial observability in the execution phase. On this basis, decentralized actors learn controlled and targeted communication together with policies that are optimized by centralized critics during training. As a result, agents learn both when to communicate as senders and how to aggregate information in a targeted manner as receivers. COTAC is evaluated on two multi-agent scenarios with continuous spaces. Experimental results demonstrate that only agents holding important information choose to send messages, and receivers aggregate what they receive in a targeted way by identifying the relevant important information, yielding better cooperation performance while reducing the communication traffic of the system.
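
To make the approach concrete, the sketch below illustrates the two communication ideas described in the abstract: a learned gate that decides when an agent sends a message (controlled communication) and attention-based aggregation over received messages that decides which messages matter (targeted communication). This is a minimal, hypothetical sketch, not the authors' implementation; the class name CommActor, the layer sizes, the straight-through gate, and the scaled dot-product attention are all illustrative assumptions.

```python
# Hypothetical sketch of controlled (gated sending) and targeted (attention-based
# receiving) communication for decentralized actors. Names and sizes are illustrative;
# this is not the COTAC authors' code.
import torch
import torch.nn as nn


class CommActor(nn.Module):
    def __init__(self, obs_dim: int, msg_dim: int, act_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.msg_head = nn.Linear(64, msg_dim)   # message content
        self.gate_head = nn.Linear(64, 1)        # send / don't send
        self.query = nn.Linear(64, msg_dim)      # attention query for receiving
        self.policy = nn.Linear(64 + msg_dim, act_dim)

    def send(self, obs: torch.Tensor):
        """Encode a local observation and decide whether to broadcast a message."""
        h = self.encoder(obs)
        msg = self.msg_head(h)
        p = torch.sigmoid(self.gate_head(h))
        # Hard 0/1 gate with a straight-through estimator so it stays trainable.
        gate = (p > 0.5).float() + p - p.detach()
        return h, gate * msg, gate

    def act(self, h: torch.Tensor, msgs: torch.Tensor, gates: torch.Tensor):
        """Attend over the messages of agents that chose to send, then act."""
        q = self.query(h)                                  # (msg_dim,)
        scores = msgs @ q / msgs.shape[-1] ** 0.5          # (n_senders,)
        scores = scores.masked_fill(gates.squeeze(-1) < 0.5, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        attn = torch.nan_to_num(attn)                      # all gates closed -> zero message
        aggregated = attn @ msgs                           # (msg_dim,)
        return self.policy(torch.cat([h, aggregated], dim=-1))


# Toy usage with three agents sharing one actor (parameter sharing is an assumption).
actor = CommActor(obs_dim=8, msg_dim=16, act_dim=4)
observations = torch.randn(3, 8)
hs, msgs, gates = zip(*(actor.send(o) for o in observations))
actions = [actor.act(h, torch.stack(msgs), torch.stack(gates)) for h in hs]
```

During training, such actors would be optimized together with centralized critics that see joint information, matching the centralized-learning, decentralized-execution paradigm the abstract describes; only the decentralized execution path is sketched here.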


Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61876151 and 62032018) and the Fundamental Research Funds for the Central Universities (No. 3102019DX1005).

Author information

Corresponding author

Correspondence to Yuan Yao.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, Q., Yao, Y., Yi, P. et al. Learning controlled and targeted communication with the centralized critic for the multi-agent system. Appl Intell 53, 14819–14837 (2023). https://doi.org/10.1007/s10489-022-04225-5

