
Reinforcement Learning with Monte Carlo Sampling in Imperfect Information Problems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10971))

Abstract

Artificial intelligence analyzes, studies, and optimizes human strategies in challenging domains. Unlike perfect information problems, imperfect information problems are usually more complex because the accuracy of state estimation cannot be effectively guaranteed. Thus, imperfect information problems require much more training data or a much longer learning process when supervised or unsupervised learning systems are used. This paper presents and evaluates a novel algorithm based on Monte Carlo sampling as a terminal-state estimation method in reinforcement learning systems. The learning system computes an adjusted result with the novel algorithm in each iteration to smooth the fluctuation caused by imperfect information. We apply the new algorithm to build a deep neural network (DNN) learning system in our Texas Hold'em poker program. The comparison poker program ranked third in the Annual Computer Poker Competition 2017 (ACPC 2017), and the system with the new approach shows better performance while converging much faster.
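The abstract describes estimating terminal states by Monte Carlo sampling over the unobserved information and smoothing the per-iteration result. The paper's exact algorithm is not reproduced here; the following is only a minimal sketch of that general idea, in which the function names, the uniform sampling over hidden outcomes, and the exponential smoothing rule are all illustrative assumptions rather than the authors' method:

```python
import random

def mc_terminal_estimate(hidden_outcomes, payoff, n_samples=1000, rng=None):
    """Estimate the expected terminal payoff of an imperfect-information
    state by sampling completions of the hidden part (e.g. opponent cards)."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(n_samples):
        outcome = rng.choice(hidden_outcomes)  # sample one hidden completion
        total += payoff(outcome)               # payoff of the completed state
    return total / n_samples

def smoothed_update(prev_estimate, new_estimate, alpha=0.1):
    """Exponentially smooth per-iteration estimates, damping the
    fluctuation that imperfect information introduces."""
    return (1 - alpha) * prev_estimate + alpha * new_estimate

# Toy example: the hidden outcome is a win (+1) with probability 0.6
# or a loss (-1) with probability 0.4; the true expected value is 0.2.
outcomes = [+1] * 6 + [-1] * 4
est = mc_terminal_estimate(outcomes, payoff=lambda o: o,
                           n_samples=10_000, rng=random.Random(0))
```

With enough samples the estimate concentrates near the true expectation, and feeding successive estimates through `smoothed_update` gives the kind of adjusted, lower-variance training signal the abstract alludes to.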



Author information

Corresponding author: Jiajia Zhang.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Zhang, J., Liu, H. (2018). Reinforcement Learning with Monte Carlo Sampling in Imperfect Information Problems. In: Xiao, J., Mao, ZH., Suzumura, T., Zhang, LJ. (eds) Cognitive Computing – ICCC 2018. ICCC 2018. Lecture Notes in Computer Science(), vol 10971. Springer, Cham. https://doi.org/10.1007/978-3-319-94307-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94306-0

  • Online ISBN: 978-3-319-94307-7

  • eBook Packages: Computer Science (R0)
