Enhancing cooperation by cognition differences and consistent representation in multi-agent reinforcement learning

Abstract

Multi-agent reinforcement learning is effective for tasks that require cooperation among multiple individuals, and communication plays an important role in enhancing cooperation in scalable and unstable environments. However, challenges remain because some of the communicated information may fail to facilitate cooperation or may even hinder it. How to extract information that is useful for cooperation is therefore a critical open problem. In this paper, we propose a multi-agent reinforcement learning algorithm with cognition differences and consistent representation (CDCR). Criteria for cognition differences are formulated to identify the information held by different agents, helping each agent better understand the others. We further train a cognition encoding network to obtain a globally consistent cognition representation for each agent; this representation is then used to achieve cognitive consistency across the agents' views of the environment. To validate the effectiveness of CDCR, we carry out experiments in the Predator-Prey and StarCraft II environments. The Predator-Prey results demonstrate that the proposed cognition differences enable effective communication among agents; the StarCraft II results demonstrate that combining cognition differences with consistent representation increases the test win rate of the baseline algorithm by 29% in the best case, and ablation studies further confirm the positive contribution of each proposed strategy.
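
The abstract names two mechanisms: a cognition-difference criterion that decides which agents should exchange information, and a cognition encoding network trained toward a globally consistent representation. The sketch below illustrates one plausible way to realize these ideas; it is not the authors' implementation. The module names, layer sizes, the L2 distance criterion, and the mean-pooled consistency loss are all illustrative assumptions.

```python
# Illustrative sketch only: a hypothetical cognition encoder, a pairwise
# cognition-difference criterion for gating communication, and a simple
# consistency loss toward a shared "global cognition". Not the CDCR code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CognitionEncoder(nn.Module):
    """Maps a local observation to a cognition vector (hypothetical sizes)."""

    def __init__(self, obs_dim: int, cog_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, cog_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def cognition_differences(cogs: torch.Tensor) -> torch.Tensor:
    """Pairwise L2 distances between agents' cognition vectors.

    cogs: (n_agents, cog_dim) -> (n_agents, n_agents) difference matrix.
    """
    return torch.cdist(cogs, cogs, p=2)


def communication_mask(cogs: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Agents whose cognitions differ by more than a threshold exchange messages."""
    diff = cognition_differences(cogs)
    mask = diff > threshold
    mask.fill_diagonal_(False)  # no self-communication
    return mask


def consistency_loss(cogs: torch.Tensor) -> torch.Tensor:
    """Pull each agent's cognition toward the team mean, one possible surrogate
    for a globally consistent cognition representation."""
    global_cog = cogs.mean(dim=0, keepdim=True)
    return F.mse_loss(cogs, global_cog.expand_as(cogs))


if __name__ == "__main__":
    n_agents, obs_dim = 4, 16
    encoder = CognitionEncoder(obs_dim)
    obs = torch.randn(n_agents, obs_dim)   # one local observation per agent
    cogs = encoder(obs)                    # (4, 32) cognition vectors
    print(communication_mask(cogs, threshold=0.5))
    print(consistency_loss(cogs).item())
```

In this toy version the consistency term would be added to the usual reinforcement-learning loss, so the encoder is trained both to support the policy and to keep agents' cognitions aligned; the actual criterion and training objective used by CDCR may differ.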

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2018YFB1600600), the National Natural Science Foundation of China (61976034, 61572104, U1808206), and the Dalian Science and Technology Innovation Fund (2019J12GX035).

Author information

Corresponding author

Correspondence to Liang Sun.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ge, H., Ge, Z., Sun, L. et al. Enhancing cooperation by cognition differences and consistent representation in multi-agent reinforcement learning. Appl Intell 52, 9701–9716 (2022). https://doi.org/10.1007/s10489-021-02873-7

