Abstract
In multiagent reinforcement learning (MARL), independent cooperative learners face numerous pathologies when learning the optimal joint policy, such as non-stationarity, stochasticity, and relative over-generalization. To achieve multiagent coordination and collaboration, a number of works have designed heuristic experience replay mechanisms based on the ‘optimistic’ principle. However, it is difficult to evaluate the quality of an experience effectively, so different treatments of experience may lead to overfitting and convergence to sub-optimal policies. In this paper, we propose a new method named optimistic exploration categorical DQN (OE-CDQN), which applies the ‘optimistic’ principle to the action exploration process rather than to network training, biasing the probability of selecting an action by the frequency with which that action has received the maximum reward. OE-CDQN combines the ‘optimistic’ principle with the categorical DQN (CDQN), applying an ‘optimistic’ re-weighting function to the distributional value output of the CDQN network. The effectiveness of OE-CDQN is demonstrated experimentally on two well-designed games, the CMOTP game and a cooperative version of the boat problem, which together confront independent learners (ILs) with all of the pathologies mentioned above. Experimental results show that OE-CDQN outperforms state-of-the-art independent cooperative methods in terms of both learned return and algorithm robustness.
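To make the described mechanism concrete, the sketch below illustrates one way an ‘optimistic’ re-weighting of a categorical return distribution could bias action selection. It is a minimal, hypothetical illustration, not the paper’s actual re-weight function (which is not specified in the abstract): the function name `optimistic_scores`, the exponential weighting, and the parameter `beta` are all assumptions made for this example.

```python
import numpy as np

# Illustrative sketch only. The abstract describes re-weighting CDQN's
# distributional output so that actions which frequently achieve high
# returns are preferred during exploration. The exponential weighting
# below is an assumed stand-in for the paper's re-weight function.

def optimistic_scores(probs, atoms, beta=1.0):
    """Re-weight each action's return distribution toward its upper tail.

    probs: (num_actions, num_atoms) categorical probabilities from CDQN.
    atoms: (num_atoms,) fixed support z_1 < ... < z_N of the distribution.
    beta:  optimism strength; beta = 0 recovers the ordinary mean Q-value.
    """
    w = probs * np.exp(beta * atoms)       # up-weight high-return atoms
    w /= w.sum(axis=1, keepdims=True)      # renormalize per action
    return (w * atoms).sum(axis=1)         # optimistic expected return

# Example: two actions over a 5-atom support. Both have mean return 0,
# but the second places frequent mass on the maximum return.
atoms = np.linspace(-1.0, 1.0, 5)
probs = np.array([
    [0.1, 0.2, 0.4, 0.2, 0.1],   # unimodal, symmetric around 0
    [0.4, 0.1, 0.0, 0.1, 0.4],   # bimodal, heavy mass on the top atom
])
scores = optimistic_scores(probs, atoms, beta=2.0)
action = int(np.argmax(scores))  # the bimodal action wins under optimism
```

Under a plain expected-value rule the two actions tie; the optimistic re-weighting breaks the tie in favor of the action that has more often attained the maximum return, which is the kind of bias the abstract attributes to OE-CDQN’s exploration.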
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 61906027).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tian, Y., et al. (2023). Optimistic Exploration Based on Categorical-DQN for Cooperative Markov Games. In: Yokoo, M., Qiao, H., Vorobeychik, Y., Hao, J. (eds.) Distributed Artificial Intelligence. DAI 2022. Lecture Notes in Computer Science, vol. 13824. Springer, Cham. https://doi.org/10.1007/978-3-031-25549-6_5
Print ISBN: 978-3-031-25548-9
Online ISBN: 978-3-031-25549-6