
A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

  • Research Article
  • Published in: Frontiers of Computer Science

Abstract

Solving an optimization problem to approach a Nash Equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm, called MC-NFSP, that combines Monte Carlo tree search with NFSP to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large search depth, where NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP). It uses an asynchronous, parallel architecture to collect game experience and improves both training efficiency and policy quality. Experiments with games with hidden state information (Texas Hold’em) and with FPS (first-person shooter) games demonstrate the effectiveness of our algorithms.
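To make the setting concrete, below is a minimal illustrative Python sketch (not the authors' implementation) of the NFSP-style agent structure that MC-NFSP builds on: the agent mixes a best-response policy (which MC-NFSP would obtain from Monte Carlo tree search guided by a neural network; here a random stand-in keeps the sketch runnable) with an average policy trained by supervised learning on the agent's own best-response actions, kept in a reservoir buffer. The class and method names, the anticipatory parameter eta, and the tabular average policy are assumptions made for illustration only.

```python
# Sketch of an NFSP-style self-play agent (illustrative, not the paper's code).
import random
from collections import defaultdict, deque

class SketchNFSPAgent:
    def __init__(self, actions, eta=0.1, rl_capacity=10000, sl_capacity=10000):
        self.actions = actions
        self.eta = eta                                # anticipatory parameter: P(play best response)
        self.rl_memory = deque(maxlen=rl_capacity)    # transitions for best-response (RL) training
        self.sl_memory = []                           # reservoir of (state, action) for the average policy
        self.sl_capacity = sl_capacity
        self.sl_seen = 0
        self.avg_policy = defaultdict(lambda: defaultdict(int))  # state -> action counts

    def best_response(self, state):
        # In MC-NFSP this decision would come from Monte Carlo tree search guided
        # by a policy/value network; a random choice is used here as a stand-in.
        return random.choice(self.actions)

    def average_action(self, state):
        # Sample an action in proportion to how often the best response chose it.
        counts = self.avg_policy[state]
        if not counts:
            return random.choice(self.actions)
        total = sum(counts.values())
        r, acc = random.uniform(0, total), 0.0
        for a, c in counts.items():
            acc += c
            if r <= acc:
                return a
        return a

    def act(self, state):
        # Mix the two policies: with probability eta follow the best response and
        # record the chosen action as a supervised-learning target.
        if random.random() < self.eta:
            a = self.best_response(state)
            self._reservoir_add((state, a))
        else:
            a = self.average_action(state)
        return a

    def observe(self, state, action, reward, next_state, done):
        self.rl_memory.append((state, action, reward, next_state, done))

    def _reservoir_add(self, item):
        # Reservoir sampling keeps an unbiased sample of past best-response behaviour.
        self.sl_seen += 1
        if len(self.sl_memory) < self.sl_capacity:
            self.sl_memory.append(item)
        else:
            j = random.randrange(self.sl_seen)
            if j < self.sl_capacity:
                self.sl_memory[j] = item

    def train_average_policy(self):
        # Rebuild the (tabular) average policy from the reservoir; with neural
        # networks this would be a supervised gradient step instead.
        self.avg_policy = defaultdict(lambda: defaultdict(int))
        for state, action in self.sl_memory:
            self.avg_policy[state][action] += 1
```

An asynchronous variant in the spirit of ANFSP would run several such agents in parallel self-play and merge their collected memories before each training step.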



Acknowledgements

This work was supported by the National Key Research and Development Program of China (2017YFB1002503) and the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of China (2018AAA0100902).

Author information


Corresponding author

Correspondence to Shijian Li.

Additional information

Li Zhang received the BEng and PhD degrees from Zhejiang University, China in 2007 and 2013, respectively. He is currently an assistant researcher in the Department of Computer Science, Zhejiang University, China. In 2009, he was a visiting scholar at the University of Hong Kong, China. From 2013 to 2017, he was a researcher at Works Applications Co., Ltd. His current interests include deep learning, game theory, human-machine hybrid computing, and pervasive computing. He has authored over ten refereed papers and patents.

Yuxuan Chen is a PhD student of computer science in the Pervasive and Cyborg Technology laboratory of Zhejiang University, China. His research focuses on game algorithms for incomplete-information games and Nash Equilibrium problems. He has done research on reinforcement learning and implemented several related algorithms. He is also interested in human-computer interaction and contributes to open-source gaming algorithms.

Wei Wang is currently an algorithm engineer at Alibaba. She received her MS in computer science from Zhejiang University, China and her BS in computer science from the University of Electronic Science and Technology of China. During her studies she did research on reinforcement learning with her lecturer, and she now works on recommendation systems.

Ziliang Han is currently pursuing a master’s degree at the Pervasive and Cyborg Technology laboratory of Zhejiang University, China. He holds a bachelor’s degree in computer science from Jilin University, China. He has done research on reinforcement learning and implemented several related algorithms, and he is building a computational platform for large-scale game-algorithm training and evaluation. He also contributes to open-source gaming algorithms.

Shijian Li received the PhD degree from Zhejiang University, China in 2006. In 2010, he was a visiting scholar with the Institute Telecom SudParis, France. He is currently with the College of Computer Science and Technology, Zhejiang University, China. He has published over 40 papers. His research interests include sensor networks, ubiquitous computing, and social computing. He serves as an Editor of the International Journal of Distributed Sensor Networks and as a reviewer or PC member of more than ten conferences.

Zhijie Pan, PhD, is a professor and the director of the Intelligent Vehicle Research Center of Zhejiang University, China. He is mainly engaged in cross-disciplinary research between computer science and automotive engineering, including AI, autonomous vehicle technology, automatic driving, and ITS. He has published more than 90 papers at international conferences and in international journals (IEEE, SAE, JSAE, etc.) and holds more than 100 patents.

Gang Pan received the BEng and PhD degrees from Zhejiang University, China in 1998 and 2004, respectively. He is currently a professor in the Department of Computer Science and deputy director of the State Key Lab of CAD&CG, Zhejiang University, China. His current interests include artificial intelligence, pervasive computing, brain-inspired computing, and brain-machine interfaces. He has authored over 100 refereed papers and holds 35 granted patents. He received three best paper awards (e.g., ACM UbiComp’16) and three nominations from premier international conferences. He is the recipient of the IEEE TCSC Award for Excellence (Middle Career Researcher), the CCF-IEEE CS Young Computer Scientist Award, and the State Scientific and Technological Progress Award. He serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Systems Journal, and Pervasive and Mobile Computing.

Electronic supplementary material

11704_2020_9307_MOESM1_ESM.pdf



About this article


Cite this article

Zhang, L., Chen, Y., Wang, W. et al. A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games. Front. Comput. Sci. 15, 155334 (2021). https://doi.org/10.1007/s11704-020-9307-6

