Abstract
Online Mirror Descent (OMD) is a family of regret minimization algorithms for Online Convex Optimization (OCO) that has recently been applied to approximating Nash equilibria in Extensive-Form Games (EFGs). In particular, optimistic variants of OMD have been developed whose theoretical convergence rates are better than those of common regret minimization algorithms for EFGs, such as Counterfactual Regret Minimization (CFR). Despite this theoretical advantage, however, existing OMD algorithms and their optimistic variants have been shown to converge to a Nash equilibrium more slowly in practice than the state-of-the-art (SOTA) CFR variants. A likely reason for this inferior performance is that they usually employ constant regularizers whose parameters must be fixed before the run begins. Inspired by the adaptive nature of CFRs, this paper presents an adaptive method to speed up the optimistic variants of OMD and, based on it, proposes Adaptive Optimistic OMD (Ada-OOMD) for EFGs. In this algorithm the regularizers adapt to real-time regrets, so the algorithm may converge faster in practice. Experimental results show that Ada-OOMD is at least two orders of magnitude faster than existing optimistic OMD algorithms. In some extensive-form games, such as Kuhn poker and Goofspiel, its convergence speed even exceeds that of the SOTA CFRs. Code is available at https://github.com/github-jhc/ada-oomd.
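To make the abstract's setup concrete, the following is a minimal, generic sketch of optimistic OMD with an entropy regularizer on the probability simplex and an AdaGrad-style adaptive step size, run in self-play on rock-paper-scissors. This is a textbook illustration of the ingredients the paper combines (optimism plus adaptivity), not the paper's Ada-OOMD algorithm; the matrix `A`, the step-size schedule `eta0 / sqrt(...)`, and the function name `run_self_play` are all assumptions made for this example.

```python
import numpy as np

# Loss matrix of rock-paper-scissors for the row player (zero-sum game).
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def run_self_play(iters=5000, eta0=0.5):
    """Optimistic OMD (entropy mirror map, multiplicative-weights form)
    in self-play; the prediction for the next gradient is the last one."""
    dim = 3
    # Start away from the uniform equilibrium so the dynamics are non-trivial.
    x_hat = np.array([0.5, 0.3, 0.2])   # row player's lazy iterate
    y_hat = np.array([0.2, 0.5, 0.3])   # column player's lazy iterate
    mx = np.zeros(dim)                  # predicted gradients (optimism)
    my = np.zeros(dim)
    sx = sy = 1.0                       # accumulators for adaptive step sizes
    avg_x = np.zeros(dim)
    avg_y = np.zeros(dim)
    for _ in range(iters):
        ex = eta0 / np.sqrt(sx)         # adaptive learning rates
        ey = eta0 / np.sqrt(sy)
        # Optimistic step: play against the predicted gradient.
        x = x_hat * np.exp(-ex * mx); x /= x.sum()
        y = y_hat * np.exp(-ey * my); y /= y.sum()
        # Observe the true loss gradients.
        gx = A @ y                      # row player's loss gradient
        gy = -A.T @ x                   # column player's loss gradient
        # Correction step: advance the lazy iterates with the true gradients.
        x_hat = x_hat * np.exp(-ex * gx); x_hat /= x_hat.sum()
        y_hat = y_hat * np.exp(-ey * gy); y_hat /= y_hat.sum()
        mx, my = gx, gy                 # next round's predictions
        sx += gx @ gx                   # grow accumulators, shrinking the step
        sy += gy @ gy
        avg_x += x
        avg_y += y
    return avg_x / iters, avg_y / iters
```

In self-play the average strategies approach the game's unique Nash equilibrium (uniform play in rock-paper-scissors); the adaptive step size removes the need to tune a fixed learning rate against an a-priori bound, which mirrors the motivation the abstract gives for replacing constant regularizers.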
This work was supported by the National Natural Science Foundation of China under Grants No. U19B2044 and No. 61836011.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, H., Liu, W., Li, B. (2022). Faster Optimistic Online Mirror Descent for Extensive-Form Games. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_7
DOI: https://doi.org/10.1007/978-3-031-20862-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20861-4
Online ISBN: 978-3-031-20862-1