
Faster Optimistic Online Mirror Descent for Extensive-Form Games

  • Conference paper
PRICAI 2022: Trends in Artificial Intelligence (PRICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13629)


Abstract

Online Mirror Descent (OMD) is a family of regret-minimization algorithms for Online Convex Optimization (OCO). Recently, these algorithms have been applied to approximate Nash equilibria in Extensive-Form Games (EFGs). In particular, optimistic variants of OMD have been developed that enjoy a better theoretical convergence rate than common regret-minimization algorithms for EFGs, e.g., Counterfactual Regret Minimization (CFR). Despite this theoretical advantage, however, existing OMD algorithms and their optimistic variants have been shown to converge to a Nash equilibrium more slowly than the state-of-the-art (SOTA) CFR variants in practice. A likely reason for the inferior performance is that they typically use constant regularizers whose parameters must be chosen in advance. Inspired by the adaptive nature of CFRs, this paper presents an adaptive method to speed up the optimistic variants of OMD, and on this basis proposes Adaptive Optimistic OMD (Ada-OOMD) for EFGs. In this algorithm the regularizers adapt to real-time regrets, so the algorithm may converge faster in practice. Experimental results show that Ada-OOMD is at least two orders of magnitude faster than existing optimistic OMD algorithms. In some extensive-form games, such as Kuhn poker and Goofspiel, the convergence speed of Ada-OOMD even exceeds that of the SOTA CFRs. Code is available at https://github.com/github-jhc/ada-oomd
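To illustrate the ingredients the abstract refers to, the sketch below runs entropy-regularized optimistic OMD (i.e., optimistic multiplicative weights) with a simple adaptive step size on rock-paper-scissors. This is only a hypothetical stand-in, not the paper's Ada-OOMD: the AdaGrad-style rule that scales the step size by accumulated gradient variation is an assumption chosen for illustration, and the game, initial strategies, and iteration count are likewise arbitrary.

```python
import numpy as np

# Payoff matrix for player 1 in rock-paper-scissors (zero-sum).
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def simplex_exp_update(x, grad, eta):
    """Entropy-regularized OMD step: multiplicative weights on the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

T = 5000
x = np.array([0.5, 0.3, 0.2])   # player 1 strategy (minimizes -x^T A y)
y = np.array([0.2, 0.5, 0.3])   # player 2 strategy (minimizes  x^T A y)
gx_prev = np.zeros(3)
gy_prev = np.zeros(3)
vx = vy = 1e-8                  # accumulated squared gradient variation
avg_x = np.zeros(3)
avg_y = np.zeros(3)

for t in range(T):
    gx = -A @ y
    gy = A.T @ x
    # Adaptive step size: shrink as observed gradient variation accumulates
    # (an AdaGrad-style heuristic, not the paper's adaptation rule).
    vx += np.sum((gx - gx_prev) ** 2)
    vy += np.sum((gy - gy_prev) ** 2)
    # Optimistic gradient: use the last gradient as a prediction of the next.
    x = simplex_exp_update(x, 2 * gx - gx_prev, 1.0 / np.sqrt(vx))
    y = simplex_exp_update(y, 2 * gy - gy_prev, 1.0 / np.sqrt(vy))
    gx_prev, gy_prev = gx, gy
    avg_x += x
    avg_y += y

avg_x /= T
avg_y /= T
# For rock-paper-scissors the unique Nash equilibrium is uniform play,
# so both average strategies should approach (1/3, 1/3, 1/3).
print(np.round(avg_x, 3), np.round(avg_y, 3))
```

The average strategy profile is what approximates the Nash equilibrium here; the adaptive step size plays the role the abstract ascribes to adaptive regularizers, removing the need to tune a constant learning rate in advance.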

The work is supported by the National Natural Science Foundation of China under Grants No. U19B2044 and No. 61836011.




Author information


Corresponding author

Correspondence to Bin Li.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jiang, H., Liu, W., Li, B. (2022). Faster Optimistic Online Mirror Descent for Extensive-Form Games. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_7


  • DOI: https://doi.org/10.1007/978-3-031-20862-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20861-4

  • Online ISBN: 978-3-031-20862-1

  • eBook Packages: Computer Science (R0)
