Abstract
Inspired by the centralised training with decentralised execution (CTDE) paradigm, the field of multi-agent reinforcement learning (MARL) has made significant progress in tackling cooperative problems with discrete action spaces. Nevertheless, many existing algorithms suffer from severe performance degradation as the number of agents grows or tasks become more challenging. Moreover, certain scenarios, such as cooperative environments with penalties, pose particular difficulties: algorithms often fail to learn sufficiently cooperative behaviour to converge. To address these issues, this study proposes PRACM, a new approach based on the Actor-Critic framework. PRACM employs a monotonic mixing function to generate a global action-value function, \(Q_{tot}\), which is used to compute the loss for updating the critic network. To handle discrete action spaces, PRACM uses Gumbel-Softmax, and to promote cooperation among agents and adapt to cooperative environments with penalties, it introduces predictive rewards. PRACM was evaluated against several baseline algorithms in the "Cooperative Predator-Prey" and the challenging "SMAC" scenarios. The results show that PRACM scales well as the number of agents and task difficulty increase, and performs better in cooperative tasks with penalties, demonstrating its usefulness in promoting collaboration among agents.
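To make the two components named in the abstract concrete, below is a minimal sketch, not the authors' code: a QMIX-style monotonic mixing network that combines per-agent Q-values into \(Q_{tot}\), and Gumbel-Softmax sampling for discrete actions. Layer sizes, tensor shapes, and all names (e.g. MonotonicMixer, gumbel_softmax_action) are illustrative assumptions, since the paper's exact architecture is not given here.

```python
# Sketch only: assumed shapes and hyperparameters, not PRACM's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gumbel_softmax_action(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable one-hot action sample from categorical logits
    (Jang et al., 2016); hard=True applies the straight-through estimator."""
    return F.gumbel_softmax(logits, tau=tau, hard=True)

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot. Taking the absolute value of the
    hypernetwork weights enforces dQ_tot/dQ_i >= 0, so maximising each
    agent's Q_i also maximises Q_tot (the QMIX monotonicity constraint)."""
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks conditioned on the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # Q_tot
```

Under this reading, the critic loss would be a TD error computed on \(Q_{tot}\), while Gumbel-Softmax keeps the actors' discrete action choices differentiable for the policy-gradient update; the predictive-reward term the abstract mentions would be added to the environment reward as an intrinsic bonus.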
Acknowledgements
This work is sponsored by the Equipment Advance Research Fund (No. 61406190118).