Abstract
In multiagent reinforcement learning (MARL), independent cooperative learners face numerous pathologies when learning the optimal joint policy, such as non-stationarity, stochasticity, and relative over-generalization. To achieve multiagent coordination and collaboration, a number of works have designed heuristic experience replay mechanisms based on the ‘optimistic’ principle. However, it is difficult to evaluate the quality of an experience effectively, so different treatments of experience may lead to overfitting and convergence to sub-optimal policies. In this paper, we propose a new method named optimistic exploration categorical DQN (OE-CDQN), which applies the ‘optimistic’ principle to the action exploration process rather than to network training, biasing the probability of selecting an action by the frequency with which that action has received the maximum reward. OE-CDQN combines the ‘optimistic’ principle with the categorical DQN (CDQN), applying an ‘optimistic’ re-weighting function to the distributional value output of the CDQN network. The effectiveness of OE-CDQN is demonstrated experimentally on two well-designed games, the CMOTP game and a cooperative version of the boat problem, which together confront independent learners (ILs) with all of the pathologies mentioned above. Experimental results show that OE-CDQN outperforms state-of-the-art independent cooperative methods in terms of both learned return and algorithm robustness.
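To make the described mechanism concrete, the sketch below illustrates one way an ‘optimistic’ re-weighting of a categorical return distribution could bias action selection. It is a minimal, hypothetical illustration, not the paper’s actual re-weight function (which is not specified in the abstract): the function name `optimistic_scores`, the exponential weighting, and the parameter `beta` are all assumptions made for this example.

```python
import numpy as np

# Illustrative sketch only. The abstract describes re-weighting CDQN's
# distributional output so that actions which frequently achieve high
# returns are preferred during exploration. The exponential weighting
# below is an assumed stand-in for the paper's re-weight function.

def optimistic_scores(probs, atoms, beta=1.0):
    """Re-weight each action's return distribution toward its upper tail.

    probs: (num_actions, num_atoms) categorical probabilities from CDQN.
    atoms: (num_atoms,) fixed support z_1 < ... < z_N of the distribution.
    beta:  optimism strength; beta = 0 recovers the ordinary mean Q-value.
    """
    w = probs * np.exp(beta * atoms)       # up-weight high-return atoms
    w /= w.sum(axis=1, keepdims=True)      # renormalize per action
    return (w * atoms).sum(axis=1)         # optimistic expected return

# Example: two actions over a 5-atom support. Both have mean return 0,
# but the second places frequent mass on the maximum return.
atoms = np.linspace(-1.0, 1.0, 5)
probs = np.array([
    [0.1, 0.2, 0.4, 0.2, 0.1],   # unimodal, symmetric around 0
    [0.4, 0.1, 0.0, 0.1, 0.4],   # bimodal, heavy mass on the top atom
])
scores = optimistic_scores(probs, atoms, beta=2.0)
action = int(np.argmax(scores))  # the bimodal action wins under optimism
```

Under a plain expected-value rule the two actions tie; the optimistic re-weighting breaks the tie in favor of the action that has more often attained the maximum return, which is the kind of bias the abstract attributes to OE-CDQN’s exploration.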
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 61906027).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tian, Y., et al. (2023). Optimistic Exploration Based on Categorical-DQN for Cooperative Markov Games. In: Yokoo, M., Qiao, H., Vorobeychik, Y., Hao, J. (eds.) Distributed Artificial Intelligence. DAI 2022. Lecture Notes in Computer Science, vol. 13824. Springer, Cham. https://doi.org/10.1007/978-3-031-25549-6_5
Print ISBN: 978-3-031-25548-9
Online ISBN: 978-3-031-25549-6