Abstract
Artificial intelligence analyzes, studies, and optimizes human strategies in challenging domains. Unlike perfect-information problems, imperfect-information problems are usually more complex because the accuracy of state estimation cannot be effectively guaranteed. As a result, imperfect-information problems demand far more training data or much longer training under supervised and unsupervised learning systems. This paper presents and evaluates a novel algorithm that uses Monte Carlo sampling to estimate terminal states in a reinforcement learning system. In each iteration, the learning system computes an adjusted result with the new algorithm to smooth the fluctuations inherent in imperfect-information conditions. We apply the new algorithm to build a deep neural network (DNN) learning system in our Texas Hold'em poker program. The baseline poker program ranked third in the Annual Computer Poker Competition 2017 (ACPC 2017), and the system with the new approach shows better performance while converging much faster.
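The core idea described above can be sketched in a few lines: average random rollouts to estimate a terminal-state value, then exponentially smooth successive estimates across iterations so a single noisy sample cannot swing the learning target. This is a minimal illustration, not the paper's algorithm; `simulate` and `rollout_policy` are hypothetical stand-ins for a game engine, and the smoothing rule is one common choice among many.

```python
def monte_carlo_value(state, rollout_policy, simulate, n_samples=100):
    """Estimate a terminal-state value by averaging `n_samples` rollouts.

    `simulate(state, policy)` is assumed to play one game to completion
    from `state` and return the final payoff (a float).
    """
    total = sum(simulate(state, rollout_policy) for _ in range(n_samples))
    return total / n_samples


def smoothed_update(prev_estimate, new_sample, alpha=0.1):
    """Exponential smoothing of successive Monte Carlo estimates.

    A small `alpha` damps the fluctuation caused by sampling hidden
    information, at the cost of slower adaptation.
    """
    return (1.0 - alpha) * prev_estimate + alpha * new_sample
```

In an imperfect-information game such as poker, each rollout would also sample the opponent's hidden cards, which is exactly why the raw estimates fluctuate and benefit from smoothing.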
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Zhang, J., Liu, H. (2018). Reinforcement Learning with Monte Carlo Sampling in Imperfect Information Problems. In: Xiao, J., Mao, ZH., Suzumura, T., Zhang, LJ. (eds) Cognitive Computing – ICCC 2018. ICCC 2018. Lecture Notes in Computer Science(), vol 10971. Springer, Cham. https://doi.org/10.1007/978-3-319-94307-7_5
Print ISBN: 978-3-319-94306-0
Online ISBN: 978-3-319-94307-7