Abstract
Online Mirror Descent (OMD) is a family of regret minimization algorithms for Online Convex Optimization (OCO) that has recently been applied to approximating Nash equilibria in Extensive-Form Games (EFGs). In particular, optimistic variants of OMD have been developed whose theoretical convergence rates are better than those of common regret minimization algorithms for EFGs, such as Counterfactual Regret Minimization (CFR). Despite this theoretical advantage, however, existing OMD algorithms and their optimistic variants have been shown to converge to a Nash equilibrium more slowly in practice than the state-of-the-art (SOTA) CFR variants. A likely reason for this inferior performance is that they usually employ constant regularizers whose parameters must be fixed before the run begins. Inspired by the adaptive nature of CFRs, this paper presents an adaptive method to speed up the optimistic variants of OMD and, based on it, proposes Adaptive Optimistic OMD (Ada-OOMD) for EFGs. In this algorithm the regularizers adapt to real-time regrets, so the algorithm may converge faster in practice. Experimental results show that Ada-OOMD is at least two orders of magnitude faster than existing optimistic OMD algorithms. In some extensive-form games, such as Kuhn poker and Goofspiel, its convergence speed even exceeds that of the SOTA CFRs. Code is available at https://github.com/github-jhc/ada-oomd.
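To make the abstract's setup concrete, the following is a minimal, generic sketch of optimistic OMD with an entropy regularizer on the probability simplex and an AdaGrad-style adaptive step size, run in self-play on rock-paper-scissors. This is a textbook illustration of the ingredients the paper combines (optimism plus adaptivity), not the paper's Ada-OOMD algorithm; the matrix `A`, the step-size schedule `eta0 / sqrt(...)`, and the function name `run_self_play` are all assumptions made for this example.

```python
import numpy as np

# Loss matrix of rock-paper-scissors for the row player (zero-sum game).
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def run_self_play(iters=5000, eta0=0.5):
    """Optimistic OMD (entropy mirror map, multiplicative-weights form)
    in self-play; the prediction for the next gradient is the last one."""
    dim = 3
    # Start away from the uniform equilibrium so the dynamics are non-trivial.
    x_hat = np.array([0.5, 0.3, 0.2])   # row player's lazy iterate
    y_hat = np.array([0.2, 0.5, 0.3])   # column player's lazy iterate
    mx = np.zeros(dim)                  # predicted gradients (optimism)
    my = np.zeros(dim)
    sx = sy = 1.0                       # accumulators for adaptive step sizes
    avg_x = np.zeros(dim)
    avg_y = np.zeros(dim)
    for _ in range(iters):
        ex = eta0 / np.sqrt(sx)         # adaptive learning rates
        ey = eta0 / np.sqrt(sy)
        # Optimistic step: play against the predicted gradient.
        x = x_hat * np.exp(-ex * mx); x /= x.sum()
        y = y_hat * np.exp(-ey * my); y /= y.sum()
        # Observe the true loss gradients.
        gx = A @ y                      # row player's loss gradient
        gy = -A.T @ x                   # column player's loss gradient
        # Correction step: advance the lazy iterates with the true gradients.
        x_hat = x_hat * np.exp(-ex * gx); x_hat /= x_hat.sum()
        y_hat = y_hat * np.exp(-ey * gy); y_hat /= y_hat.sum()
        mx, my = gx, gy                 # next round's predictions
        sx += gx @ gx                   # grow accumulators, shrinking the step
        sy += gy @ gy
        avg_x += x
        avg_y += y
    return avg_x / iters, avg_y / iters
```

In self-play the average strategies approach the game's unique Nash equilibrium (uniform play in rock-paper-scissors); the adaptive step size removes the need to tune a fixed learning rate against an a-priori bound, which mirrors the motivation the abstract gives for replacing constant regularizers.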
This work was supported by the National Natural Science Foundation of China under Grants No. U19B2044 and No. 61836011.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, H., Liu, W., Li, B. (2022). Faster Optimistic Online Mirror Descent for Extensive-Form Games. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_7
DOI: https://doi.org/10.1007/978-3-031-20862-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20861-4
Online ISBN: 978-3-031-20862-1