Abstract
Motivated by the centralised training with decentralised execution (CTDE) paradigm, multi-agent reinforcement learning (MARL) algorithms have made significant strides in cooperative tasks. However, sparse environmental rewards and limited scalability have impeded further progress in MARL. In response, MRRC, a novel actor-critic approach, is proposed. MRRC tackles the sparse-reward problem by equipping each agent with both an individual policy and a cooperative policy, combining the individual policy's rapid convergence with the cooperative policy's global optimality. To enhance scalability, MRRC employs a monotonic mixing network to rectify each agent's state-action value function \(Q\), yielding the joint value function \(Q_{tot}\) that drives global updates of the entire critic network. Additionally, the Gumbel-Softmax technique is introduced to rectify discrete actions, enabling MRRC to handle discrete tasks effectively. Comparisons with strong baseline algorithms in the “Predator-Prey” and challenging “SMAC” environments, together with ablation experiments, demonstrate the superior performance of MRRC. The experimental results confirm MRRC's efficacy in reward-sparse environments and its ability to scale well as the number of agents grows.
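To make the two rectification mechanisms named above concrete, the sketch below is a minimal, illustrative PyTorch rendering: a QMIX-style monotonic mixing network that combines per-agent Q-values into \(Q_{tot}\) using non-negative mixing weights produced by state-conditioned hypernetworks, plus a Gumbel-Softmax step that turns policy logits into approximately differentiable discrete actions. The module names, layer sizes, and hypernetwork layout are assumptions for illustration, not the paper's exact architecture.

```python
# Illustrative sketch only: layer sizes and names are assumptions,
# not the architecture reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot. Taking abs() of the hypernetwork
    outputs keeps dQ_tot/dQ_i >= 0, so maximising Q_tot is consistent with
    each agent greedily maximising its own Q_i at execution time."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: the global state generates the mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot: (batch, 1)

def gumbel_softmax_action(logits, tau=1.0):
    """Differentiable surrogate for discrete action sampling (Jang et al.):
    hard=True returns a one-hot action in the forward pass while gradients
    flow through the soft sample (straight-through estimator)."""
    return F.gumbel_softmax(logits, tau=tau, hard=True)

# Tiny usage check: 3 agents, batch of 4, hypothetical 10-dim global state.
mixer = MonotonicMixer(n_agents=3, state_dim=10)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 10))
action = gumbel_softmax_action(torch.randn(4, 5))  # 5 discrete actions
```

Under this monotonic construction, the critic can be updated centrally through \(Q_{tot}\) while each agent still acts greedily on its own value at execution time, which is what keeps decentralised execution consistent with centralised training.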
Acknowledgements
This work is sponsored by the Equipment Advance Research Fund (No. 61406190118).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yu, S., Zhu, W., Liu, S., Gong, Z., Chen, H. (2024). MRRC: Multi-agent Reinforcement Learning with Rectification Capability in Cooperative Tasks. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_16
DOI: https://doi.org/10.1007/978-981-99-8082-6_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6