
Enhanced decision framework for two-player zero-sum Markov games with diverse opponent policies


Abstract

This paper considers a general two-player zero-sum Markov game in which our agent faces opponents of multiple types, each with diverse policies. To enhance our agent's return against these diverse policies, a novel Decision-making Framework based on Opponent Distinguishing and Policy Judgment (DF-ODPJ) is proposed. Building on pre-trained Nash equilibrium strategies, DF-ODPJ distinguishes the opponent's type by sampling from the interaction trajectory. A fast criterion is then proposed to judge the opponent's policy; the criterion is proven to minimize the misjudgment probability, and its optimal threshold is calculated. According to the identification results, appropriate response policies are generated to enhance the return. DF-ODPJ is flexible because it is orthogonal to existing Nash equilibrium algorithms and single-agent reinforcement learning algorithms. Experimental results on grid world, video game, and UAV aerial combat environments illustrate the effectiveness of DF-ODPJ. The code is available at https://github.com/ChenXJ295/DF-ODPJ.
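As a rough illustration of the policy-judgment step described above, the sketch below casts it as a log-likelihood-ratio test over opponent actions sampled from the interaction trajectory, decided against a threshold. This is only a minimal sketch of one plausible reading of the abstract, not the paper's algorithm; the function name judge_opponent_policy, the dictionary-based policy representation, and the default threshold of 0.0 are all illustrative assumptions.

```python
import math

def judge_opponent_policy(observed, policy_a, policy_b, threshold=0.0):
    """Guess which of two candidate opponent policies produced the observed
    (state, action) samples, using a log-likelihood-ratio test.

    observed  : list of (state, action) pairs sampled from the interaction trajectory
    policy_a  : dict mapping state -> {action: probability} for candidate type A
    policy_b  : same structure for candidate type B
    threshold : decision threshold; 0.0 is a placeholder where an optimal
                threshold (as derived in the paper) would be substituted
    """
    eps = 1e-12  # avoids log(0) when a policy assigns zero probability
    llr = 0.0
    for state, action in observed:
        p_a = policy_a.get(state, {}).get(action, eps)
        p_b = policy_b.get(state, {}).get(action, eps)
        llr += math.log(p_a) - math.log(p_b)
    # llr >= threshold favours candidate A, otherwise candidate B.
    return "A" if llr >= threshold else "B"

# Toy usage: two states, two actions, opponent behaving mostly like type A.
policy_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}}
policy_b = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}
trajectory = [(0, 0), (1, 0), (0, 0), (1, 1)]
print(judge_opponent_policy(trajectory, policy_a, policy_b))  # prints "A"
```

In this reading, the identified type/policy would then index into a library of pre-trained response policies; the actual identification statistic and response generation used by DF-ODPJ are specified in the paper itself.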




Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by the Anhui Provincial Natural Science Foundation under Grant 2008085MF198.

Author information

Authors and Affiliations

Authors

Contributions

Jin Zhu: conceptualization, methodology, investigation and writing; Xuan Wang: data curation, software, validation; Geir E Dullerud: conceptualization, supervision.

Corresponding author

Correspondence to Jin Zhu.

Ethics declarations

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, J., Wang, X. & Dullerud, G.E. Enhanced decision framework for two-player zero-sum Markov games with diverse opponent policies. Appl Intell 55, 449 (2025). https://doi.org/10.1007/s10489-025-06344-1


  • DOI: https://doi.org/10.1007/s10489-025-06344-1
