Abstract
Multi-agent deep reinforcement learning (MDRL) has attracted attention as a way to solve complex tasks. From the perspective of individual agents, two main challenges in MDRL are non-stationarity and partial observability, and both make it harder for agents to learn cooperative policies. In this study, Controlled and Targeted Communication with the Centralized Critic (COTAC) is proposed, which establishes a paradigm of centralized learning and decentralized execution with partial communication. COTAC decouples how the multi-agent system (MAS) obtains environmental information during training from how it obtains that information during execution. Specifically, COTAC makes the environment faced by each agent stationary in the training phase and learns partial communication to overcome the limitation of partial observability in the execution phase. On this basis, during training, decentralized actors learn controlled and targeted communication, together with policies optimized by centralized critics. As a result, agents learn both when to communicate on the sending side and how to aggregate information in a targeted way on the receiving side. COTAC is evaluated on two multi-agent scenarios with continuous spaces. Experimental results show that only some agents, namely those holding important information, choose to send messages, and that receivers aggregate incoming messages in a targeted way by identifying the relevant important information. The system still achieves better cooperative performance while reducing its communication traffic.
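The abstract does not give COTAC's equations, so what follows is only a minimal NumPy sketch, under stated assumptions, of the two communication mechanisms it names. A sending gate decides *when* an agent broadcasts, in the spirit of IC3Net's gating. Scaled dot-product attention over the received messages decides *how* a receiver aggregates them, in the spirit of TarMAC's targeted attention. A helper also builds the joint observation-action input that a MADDPG-style centralized critic would see during training. Every name, the sizes, the hard 0.5 threshold, and the choice to exclude an agent's own message are hypothetical and are not the authors' design.

```python
import numpy as np

# Hypothetical sizes for illustration only.
N_AGENTS, HID, KEY, VAL = 4, 8, 4, 6

rng = np.random.default_rng(0)
# Hypothetical parameters, randomly initialised here; in practice they
# would be trained jointly with the policies.
w_gate = rng.normal(size=HID)        # sending-gate weights
b_gate = 0.0                         # sending-gate bias
W_q = rng.normal(size=(HID, KEY))    # receiver query projection
W_k = rng.normal(size=(HID, KEY))    # message key (signature) projection
W_v = rng.normal(size=(HID, VAL))    # message content projection


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def send_gate(h):
    """Controlled communication: decide which agents send a message.

    h has shape (N, HID). Returns a boolean mask of shape (N,). The hard
    0.5 threshold mimics execution time; training would need a
    differentiable or policy-gradient treatment of this decision.
    """
    return sigmoid(h @ w_gate + b_gate) > 0.5


def targeted_aggregate(h, senders):
    """Targeted aggregation: each receiver attends only over senders.

    Returns (messages, weights): messages has shape (N, VAL) and weights
    (N, N), where weights[j, i] is how much receiver j attends to sender i.
    A receiver with no eligible sender gets an all-zero row in both.
    """
    n = h.shape[0]
    q, k, v = h @ W_q, h @ W_k, h @ W_v
    scores = q @ k.T / np.sqrt(k.shape[1])  # scaled dot-product scores
    # Agent i's message reaches j only if i sends and i != j (assumption).
    allowed = senders[None, :] & ~np.eye(n, dtype=bool)
    weights = np.zeros((n, n))
    for j in range(n):
        if allowed[j].any():
            s = np.where(allowed[j], scores[j], -np.inf)
            e = np.exp(s - s[allowed[j]].max())  # blocked senders -> 0
            weights[j] = e / e.sum()             # softmax over senders
    return weights @ v, weights


def centralized_critic_input(obs, actions):
    """Joint input for a centralized critic used only during training:
    all agents' observations and actions, flattened (MADDPG-style)."""
    return np.concatenate([obs.ravel(), actions.ravel()])


# Usage with random hidden states standing in for each agent's encoder output.
h = rng.normal(size=(N_AGENTS, HID))
senders = send_gate(h)
messages, weights = targeted_aggregate(h, senders)
print("senders:", senders.astype(int), "traffic:", int(senders.sum()))
print("aggregated message shape:", messages.shape)
```

Counting the senders gives a simple proxy for the communication traffic that the abstract reports reducing. Because the hard gate is not differentiable, one option for training it is a REINFORCE-style policy-gradient estimator, as IC3Net does; the abstract does not say how COTAC trains its gate.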

Data Availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61876151 and 62032018) and the Fundamental Research Funds for the Central Universities (No. 3102019DX1005).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Q., Yao, Y., Yi, P. et al. Learning controlled and targeted communication with the centralized critic for the multi-agent system. Appl Intell 53, 14819–14837 (2023). https://doi.org/10.1007/s10489-022-04225-5
DOI: https://doi.org/10.1007/s10489-022-04225-5