Abstract
Artificial intelligence analyzes, studies, and optimizes human strategies in challenging domains. Unlike perfect-information problems, imperfect-information problems are usually more complex because the accuracy of state estimation cannot be effectively guaranteed. As a result, imperfect-information problems demand far more training data or much longer training under supervised and unsupervised learning systems. This paper presents and evaluates a novel algorithm that uses Monte Carlo sampling to estimate terminal states in a reinforcement learning system. In each iteration, the learning system computes an adjusted result with the new algorithm to smooth the fluctuations inherent in imperfect-information conditions. We apply the new algorithm to build a deep neural network (DNN) learning system in our Texas Hold'em poker program. The baseline poker program ranked third in the Annual Computer Poker Competition 2017 (ACPC 2017), and the system with the new approach shows better performance while converging much faster.
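The core idea described above can be sketched in a few lines: average random rollouts to estimate a terminal-state value, then exponentially smooth successive estimates across iterations so a single noisy sample cannot swing the learning target. This is a minimal illustration, not the paper's algorithm; `simulate` and `rollout_policy` are hypothetical stand-ins for a game engine, and the smoothing rule is one common choice among many.

```python
def monte_carlo_value(state, rollout_policy, simulate, n_samples=100):
    """Estimate a terminal-state value by averaging `n_samples` rollouts.

    `simulate(state, policy)` is assumed to play one game to completion
    from `state` and return the final payoff (a float).
    """
    total = sum(simulate(state, rollout_policy) for _ in range(n_samples))
    return total / n_samples


def smoothed_update(prev_estimate, new_sample, alpha=0.1):
    """Exponential smoothing of successive Monte Carlo estimates.

    A small `alpha` damps the fluctuation caused by sampling hidden
    information, at the cost of slower adaptation.
    """
    return (1.0 - alpha) * prev_estimate + alpha * new_sample
```

In an imperfect-information game such as poker, each rollout would also sample the opponent's hidden cards, which is exactly why the raw estimates fluctuate and benefit from smoothing.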
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Zhang, J., Liu, H. (2018). Reinforcement Learning with Monte Carlo Sampling in Imperfect Information Problems. In: Xiao, J., Mao, ZH., Suzumura, T., Zhang, LJ. (eds) Cognitive Computing – ICCC 2018. ICCC 2018. Lecture Notes in Computer Science(), vol 10971. Springer, Cham. https://doi.org/10.1007/978-3-319-94307-7_5
Print ISBN: 978-3-319-94306-0
Online ISBN: 978-3-319-94307-7