Abstract
Multi-agent reinforcement learning is effective for tasks that require cooperation among different individuals, and communication plays an important role in enhancing cooperation among agents in scalable and unstable environments. However, challenges remain because some communicated information may fail to facilitate cooperation or may even have a negative effect. Thus, how to extract information that is useful for cooperation among agents is a critical issue. In this paper, we propose a multi-agent reinforcement learning algorithm with cognition differences and consistent representation (CDCR). Criteria for cognition differences are formulated to explore the information possessed by different agents, helping each agent better understand the others. We further train a cognition encoding network to obtain a globally consistent cognition representation for each agent, which is then used to realize cognitive consistency of the agents with respect to the environment. To validate the effectiveness of CDCR, we carry out experiments in the Predator-Prey and StarCraft II environments. The results in Predator-Prey demonstrate that the proposed cognition differences achieve effective communication among agents; the results in StarCraft II demonstrate that considering both cognition differences and consistent representation increases the test win rate of the baseline algorithm by 29% in the best case, and ablation studies further demonstrate the positive roles played by the proposed strategies.
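The paper defines the precise criteria and network architecture; purely as an illustration of the two ideas in the abstract, the sketch below measures pairwise "cognition differences" between agents' latent cognition vectors and averages the cognitions of sufficiently different agents into a consistent representation. The function names, the L2 distance, and the thresholding rule are all assumptions for this sketch, not the authors' actual formulation.

```python
import numpy as np

def cognition_differences(cognitions, threshold=0.5):
    """Pairwise L2 distances between agents' cognition vectors.

    cognitions: (n_agents, dim) array of per-agent latent cognitions.
    Returns the distance matrix and a mask marking agent pairs whose
    difference exceeds the threshold (i.e. pairs worth communicating).
    """
    diffs = np.linalg.norm(
        cognitions[:, None, :] - cognitions[None, :, :], axis=-1
    )
    comm_mask = diffs > threshold
    np.fill_diagonal(comm_mask, False)  # an agent never messages itself
    return diffs, comm_mask

def consistent_representation(cognitions, comm_mask):
    """Average each agent's cognition with those it communicates with,
    pulling the group toward a shared, consistent view."""
    reps = []
    for i in range(len(cognitions)):
        group = np.vstack([cognitions[i : i + 1], cognitions[comm_mask[i]]])
        reps.append(group.mean(axis=0))
    return np.stack(reps)
```

In the paper this aggregation is learned (a cognition encoding network) rather than a fixed mean; the sketch only shows the data flow from per-agent cognitions through difference-gated communication to a shared representation.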
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2018YFB1600600), the National Natural Science Foundation of China (Grants 61976034, 61572104, and U1808206), and the Dalian Science and Technology Innovation Fund (2019J12GX035).
Cite this article
Ge, H., Ge, Z., Sun, L. et al. Enhancing cooperation by cognition differences and consistent representation in multi-agent reinforcement learning. Appl Intell 52, 9701–9716 (2022). https://doi.org/10.1007/s10489-021-02873-7