BRGR: Multi-agent cooperative reinforcement learning with bidirectional real-time gain representation

He, Xin; Ge, Hongwei; Sun, Liang; Li, Qifeng; Hou, Yaqing

doi:10.1007/s10489-022-04426-y

Published: 17 February 2023

Applied Intelligence, Volume 53, pages 19044–19059 (2023)

Xin He¹, Hongwei Ge¹ (ORCID: orcid.org/0000-0002-8937-1515), Liang Sun¹, Qifeng Li¹ & Yaqing Hou¹

Abstract

In multi-agent cooperative decision-making, an agent must learn cooperatively with its neighbors to obtain the optimal strategy. An agent's actions can be classified into independent actions and interactive actions, and the overall information and the neighborhood information available to the agent play different roles in guiding the selection of each type. In general, real-time interaction between the representations of overall information and neighborhood information facilitates cooperative decision-making. This paper therefore proposes a bidirectional real-time gain representation (BRGR) mechanism that explicitly enables such real-time interaction. On the one hand, the real-time representation of effective neighborhood information is incorporated into the overall information representation via an attention module, yielding a gain in the overall information. This gain provides a better understanding and utilization of neighborhood information and guides the agents in selecting independent actions. On the other hand, the real-time overall information representation is integrated into the neighborhood information representation, yielding a gain in the neighborhood information, which ensures that interactive actions are based on the current state of the agent. This gain helps the agents select proper interactive actions. The proposed BRGR mechanism thus enables the agents to learn optimal cooperative strategies effectively. BRGR is applied to state-of-the-art multi-agent reinforcement learning algorithms. The experimental results show that BRGR significantly improves the performance of the base algorithms and is more advantageous in complex environments.
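
The two directions of gain described in the abstract can be illustrated with a minimal sketch. The abstract does not give the network details, so everything below is an assumption made for illustration: the function names (overall_gain, neighborhood_gain), the use of unparameterized scaled dot-product attention, and the residual addition used to merge representations are hypothetical choices, not the authors' architecture.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def overall_gain(h_overall, h_neighbors):
    # Direction 1 (assumed form): the overall representation queries the
    # neighbor representations via scaled dot-product attention, and the
    # attended context is added to it. The result would feed the
    # independent-action head.
    scale = math.sqrt(len(h_overall))
    weights = softmax([dot(h_overall, h_j) / scale for h_j in h_neighbors])
    context = [
        sum(w * h_j[k] for w, h_j in zip(weights, h_neighbors))
        for k in range(len(h_overall))
    ]
    gained = [a + c for a, c in zip(h_overall, context)]
    return gained, weights

def neighborhood_gain(h_overall, h_neighbors):
    # Direction 2 (assumed form): the overall representation is merged into
    # each neighbor representation, so an interactive action aimed at
    # neighbor j is conditioned on the agent's own current state.
    return [[a + b for a, b in zip(h_j, h_overall)] for h_j in h_neighbors]

# Toy 2-dimensional embeddings for one agent with two neighbors.
h_overall = [1.0, 0.0]
h_neighbors = [[1.0, 0.0], [0.0, 1.0]]

gained_overall, attn = overall_gain(h_overall, h_neighbors)
gained_neighbors = neighborhood_gain(h_overall, h_neighbors)
```

In this toy setting the neighbor whose embedding aligns with the overall embedding receives the larger attention weight. A learned implementation would normally add trainable projections before the attention and the merge, which are omitted here to keep the sketch short.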

Notes

StarCraft II is a trademark of Blizzard Entertainment™.
https://github.com/google-research/football
https://github.com/facebookresearch/CollaQ
https://github.com/MAS-anony/ASN

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61976034, U1808206), the Dalian Science and Technology Innovation Fund (2022JJ12GX013), the Natural Science Foundation of Liaoning Province (2022YGJC20), and the Fundamental Research Funds for the Central Universities (DUT21YG106).

Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 160023, Liaoning, China
Xin He, Hongwei Ge, Liang Sun, Qifeng Li & Yaqing Hou

Corresponding author

Correspondence to Hongwei Ge.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, X., Ge, H., Sun, L. et al. BRGR: Multi-agent cooperative reinforcement learning with bidirectional real-time gain representation. Appl Intell 53, 19044–19059 (2023). https://doi.org/10.1007/s10489-022-04426-y

Accepted: 23 December 2022
Published: 17 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10489-022-04426-y
