Action Prediction for Cooperative Exploration in Multi-agent Reinforcement Learning

Zhang, Yanqiang; Feng, Dawei; Ding, Bo

doi:10.1007/978-981-99-8082-6_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14448)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Multi-agent reinforcement learning (MARL) methods have made significant progress, but they still explore poorly in complex and challenging environments. Several exploration-enhanced MARL methods have been introduced to address this issue, yet they still suffer from inefficient exploration and low performance on challenging tasks that require complex cooperation among agents. This paper proposes the prediction-action Qmix (PQmix) method, which constructs a multi-agent intrinsic reward from action prediction. PQmix uses the agents' joint local observation and the next joint local observation, obtained after the actions are executed, to predict the joint action the agents actually took. The action prediction error serves as the intrinsic reward: it measures the novelty of the joint state and encourages agents to actively explore the action and state spaces of the environment. We validate PQmix by comparing it with strong baselines on a MARL benchmark. The experimental results show that PQmix outperforms state-of-the-art algorithms on the StarCraft Multi-Agent Challenge (SMAC). Further experiments verify the stability of the method.
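The idea of using an action prediction error as an intrinsic reward can be illustrated with a minimal sketch. The abstract does not specify the form of the error, the predictor architecture, or how the intrinsic reward is combined with the environment reward, so everything below is an assumption made for illustration: the error is taken to be the sum of per-agent cross-entropies, the per-agent logits are assumed to come from some predictor fed with the current and next joint local observations (not implemented here), and the names `action_prediction_error`, `shaped_reward`, and the weight `beta` are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list:
    """Numerically stable log-softmax over one agent's action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def action_prediction_error(
    agent_logits: Sequence[Sequence[float]],
    joint_action: Sequence[int],
) -> float:
    """Sum of per-agent cross-entropies between predicted and taken actions.

    agent_logits[i] holds the predictor's logits for agent i; they are
    assumed (hypothetically) to be produced from the joint local
    observation and the next joint local observation.
    joint_action[i] is the index of the action agent i actually took.
    """
    if len(agent_logits) != len(joint_action):
        raise ValueError("need one action index per agent")
    return sum(
        -log_softmax(logits)[action]
        for logits, action in zip(agent_logits, joint_action)
    )


def shaped_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """Combine rewards; the additive form and beta are assumptions."""
    return extrinsic + beta * intrinsic


# A transition the predictor already explains well yields a small bonus,
# while a poorly predicted (novel) transition yields a large one.
familiar = action_prediction_error([[8.0, 0.0, 0.0], [0.0, 8.0, 0.0]], [0, 1])
novel = action_prediction_error([[8.0, 0.0, 0.0], [0.0, 8.0, 0.0]], [2, 0])
```

In this sketch, a transition whose joint action is easy to predict earns little extra reward, so the bonus shrinks for familiar joint states as the predictor is trained and remains large for rarely visited ones.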

Acknowledgements

This work is partially supported by the major Science and Technology Innovation 2030 “New Generation Artificial Intelligence” project 2020AAA0104803.

Author information

Authors and Affiliations

National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, 410073, China
Yanqiang Zhang, Dawei Feng & Bo Ding

Corresponding author

Correspondence to Dawei Feng.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Biao Luo
Chinese Academy of Sciences, Beijing, China
Long Cheng
Zhejiang University, Hangzhou, China
Zheng-Guang Wu
Guangdong University of Technology, Guangzhou, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Zhang, Y., Feng, D., Ding, B. (2024). Action Prediction for Cooperative Exploration in Multi-agent Reinforcement Learning. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_28

DOI: https://doi.org/10.1007/978-981-99-8082-6_28
Published: 15 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science, Computer Science (R0)
