
Enhanced decision framework for two-player zero-sum Markov games with diverse opponent policies


Abstract

This paper considers a general two-player zero-sum Markov game in which our agent faces opponents of multiple types, each with diverse policies. To enhance our agent's return against these diverse policies, a novel Decision-making Framework based on Opponent Distinguishing and Policy Judgment (DF-ODPJ) is proposed. Building on pre-trained Nash equilibrium strategies, DF-ODPJ distinguishes the opponent's type by sampling from the interaction trajectory. A fast criterion is then proposed to judge the opponent's policy; the criterion is proven to minimize the misjudgment probability, and its optimal threshold is calculated. According to the identification results, appropriate response policies are generated to enhance the return. DF-ODPJ is flexible because it is orthogonal to existing Nash equilibrium algorithms and single-agent reinforcement learning algorithms. Experimental results on grid world, video game, and UAV aerial combat environments illustrate the effectiveness of DF-ODPJ. The code is available at https://github.com/ChenXJ295/DF-ODPJ.
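As a rough illustration of the policy-judgment step described above, the sketch below casts it as a log-likelihood-ratio test over opponent actions sampled from the interaction trajectory, decided against a threshold. This is only a minimal sketch of one plausible reading of the abstract, not the paper's algorithm; the function name judge_opponent_policy, the dictionary-based policy representation, and the default threshold of 0.0 are all illustrative assumptions.

```python
import math

def judge_opponent_policy(observed, policy_a, policy_b, threshold=0.0):
    """Guess which of two candidate opponent policies produced the observed
    (state, action) samples, using a log-likelihood-ratio test.

    observed  : list of (state, action) pairs sampled from the interaction trajectory
    policy_a  : dict mapping state -> {action: probability} for candidate type A
    policy_b  : same structure for candidate type B
    threshold : decision threshold; 0.0 is a placeholder where an optimal
                threshold (as derived in the paper) would be substituted
    """
    eps = 1e-12  # avoids log(0) when a policy assigns zero probability
    llr = 0.0
    for state, action in observed:
        p_a = policy_a.get(state, {}).get(action, eps)
        p_b = policy_b.get(state, {}).get(action, eps)
        llr += math.log(p_a) - math.log(p_b)
    # llr >= threshold favours candidate A, otherwise candidate B.
    return "A" if llr >= threshold else "B"

# Toy usage: two states, two actions, opponent behaving mostly like type A.
policy_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}}
policy_b = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}
trajectory = [(0, 0), (1, 0), (0, 0), (1, 1)]
print(judge_opponent_policy(trajectory, policy_a, policy_b))  # prints "A"
```

In this reading, the identified type/policy would then index into a library of pre-trained response policies; the actual identification statistic and response generation used by DF-ODPJ are specified in the paper itself.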




Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Acknowledgements

This work was supported by the Anhui Provincial Natural Science Foundation under Grant 2008085MF198.

Author information

Authors and Affiliations

Authors

Contributions

Jin Zhu: conceptualization, methodology, investigation and writing; Xuan Wang: data curation, software, validation; Geir E Dullerud: conceptualization, supervision.

Corresponding author

Correspondence to Jin Zhu.

Ethics declarations

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, J., Wang, X. & Dullerud, G.E. Enhanced decision framework for two-player zero-sum Markov games with diverse opponent policies. Appl Intell 55, 449 (2025). https://doi.org/10.1007/s10489-025-06344-1


  • DOI: https://doi.org/10.1007/s10489-025-06344-1
