Optimistic Exploration Based on Categorical-DQN for Cooperative Markov Games

  • Conference paper
  • In: Distributed Artificial Intelligence (DAI 2022)

Abstract

In multiagent reinforcement learning (MARL), independent cooperative learners face numerous challenges when learning the optimal joint policy, such as non-stationarity, stochasticity, and relative over-generalization. To achieve multiagent coordination and collaboration, a number of works have designed heuristic experience replay mechanisms based on the 'optimistic' principle. However, it is difficult to evaluate the quality of an experience effectively, and different treatments of experience may lead to overfitting and convergence to sub-optimal policies. In this paper, we propose a new method named optimistic exploration categorical DQN (OE-CDQN), which applies the 'optimistic' principle to the action exploration process rather than to the network training process, biasing the probability of choosing an action by the frequency with which that action receives the maximum reward. OE-CDQN combines the 'optimistic' principle with CDQN by applying an 'optimistic' re-weight function to the distributional value output of the CDQN network. The effectiveness of OE-CDQN is demonstrated experimentally on two well-designed games, i.e., the CMOTP game and a cooperative version of the boat problem, which confront independent learners (ILs) with all the pathologies mentioned above. Experimental results show that OE-CDQN outperforms state-of-the-art independent cooperative methods in terms of both learned return and algorithm robustness.
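
As a rough illustration of this idea, the sketch below shows how a categorical return distribution per action (as in C51/CDQN) could be re-weighted toward its high-return atoms before greedy action selection. The abstract does not specify the paper's exact re-weight function, so the exponential upper-tail weighting, the `optimism` parameter, and the function name `optimistic_action` are assumptions for illustration only.

```python
# Minimal sketch: optimistic action selection over a categorical (C51-style)
# value distribution. The exact OE-CDQN re-weight function is not given in
# the abstract; the weighting scheme below is an illustrative assumption.
import numpy as np

def optimistic_action(atom_probs: np.ndarray, atoms: np.ndarray,
                      optimism: float = 2.0) -> int:
    """Select an action from per-action categorical return distributions.

    atom_probs: shape (num_actions, num_atoms), each row sums to 1.
    atoms:      shape (num_atoms,), the fixed support z_1 < ... < z_N.
    optimism:   > 0 emphasises probability mass on high-return atoms
                (a hypothetical stand-in for the paper's re-weight function).
    """
    # Rank each atom by its position on the support, scaled to [0, 1].
    span = atoms.max() - atoms.min() + 1e-8
    rank = (atoms - atoms.min()) / span
    # Exponential upper-tail weighting: higher-return atoms count more.
    weights = np.exp(optimism * rank)
    reweighted = atom_probs * weights
    reweighted /= reweighted.sum(axis=1, keepdims=True)
    # Optimistic value estimate per action under the re-weighted distribution.
    q_optimistic = (reweighted * atoms).sum(axis=1)
    return int(np.argmax(q_optimistic))

# Toy example: two actions over a 5-atom support; action 1 puts more mass on
# the highest atom and is therefore preferred by the optimistic criterion.
atoms = np.linspace(-10.0, 10.0, 5)
probs = np.array([[0.1, 0.2, 0.4, 0.2, 0.1],
                  [0.2, 0.2, 0.2, 0.1, 0.3]])
print(optimistic_action(probs, atoms))  # -> 1
```

Under this kind of re-weighting, an action whose distribution places more mass on the highest-return atoms is chosen more often, which matches the abstract's description of biasing exploration by how frequently an action attains the maximum reward.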

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant No. 61906027).

Author information

Corresponding author

Correspondence to Chengwei Zhang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tian, Y. et al. (2023). Optimistic Exploration Based on Categorical-DQN for Cooperative Markov Games. In: Yokoo, M., Qiao, H., Vorobeychik, Y., Hao, J. (eds) Distributed Artificial Intelligence. DAI 2022. Lecture Notes in Computer Science, vol. 13824. Springer, Cham. https://doi.org/10.1007/978-3-031-25549-6_5

  • DOI: https://doi.org/10.1007/978-3-031-25549-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25548-9

  • Online ISBN: 978-3-031-25549-6

  • eBook Packages: Computer Science, Computer Science (R0)
