Abstract
We compare classic scalar temporal difference learning with three new distributional algorithms for playing the game of 5-in-a-row using deep neural networks: distributional temporal difference learning with a constant learning rate, and two distributional temporal difference algorithms with adaptive learning rates. All these algorithms are applicable to any two-player, deterministic, zero-sum game and can likely be generalized to other settings.
All algorithms in our study performed well and developed strong strategies. The algorithms implementing the adaptive methods learned more quickly at first, but in the long run they were outperformed by the algorithms using a constant learning rate, which, without any prior knowledge, learned to play the game at a very high level after 200,000 games of self-play.
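To make the comparison concrete, the following is a minimal sketch, not the authors' implementation, of a one-step scalar TD(0) value update for a two-player zero-sum game learned through self-play. The helper `value_fn` (a stand-in for the neural network's value estimate), the zero-sum sign flip, and the fixed step size `alpha` are illustrative assumptions; the distributional variants studied in the paper replace the scalar value with a learned distribution over game outcomes.

```python
# A minimal sketch (not the authors' implementation) of the scalar TD(0)
# update used as the baseline in the comparison, written for a two-player
# zero-sum game learned through self-play.
from typing import Callable, Hashable, Tuple

State = Hashable  # any hashable board representation

def td0_target(value_fn: Callable[[State], float],
               next_state: State,
               reward: float,
               terminal: bool,
               gamma: float = 1.0) -> float:
    """One-step TD target: the observed reward at terminal states, otherwise
    the bootstrapped value of the successor, negated because the opponent
    moves next in a zero-sum game."""
    if terminal:
        return reward
    return -gamma * value_fn(next_state)

def td0_update(value_fn: Callable[[State], float],
               state: State,
               next_state: State,
               reward: float,
               terminal: bool,
               alpha: float = 0.01) -> Tuple[State, float]:
    """Move the current estimate a step `alpha` toward the TD target and
    return the (state, value) pair used as a training example.  A constant
    `alpha` corresponds to the fixed-learning-rate variant; an adaptive
    method would instead shrink the step size as the estimate stabilises."""
    v_old = value_fn(state)
    target = td0_target(value_fn, next_state, reward, terminal)
    v_new = v_old + alpha * (target - v_old)
    return state, v_new
```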
Notes
- 1.
The arrow (\(\leftarrow \)) is pseudocode notation for assigning a new value to the function. In our implementation, the new value is used immediately to create new values for preceding states, and the input/output pair is used as training data for the neural network at the end of the training iteration.
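The sweep described in this note can be sketched as follows, assuming the `td0_update` helper from the earlier sketch. Names such as `episode`, `network.predict`, `network.fit`, and `overrides` are illustrative assumptions rather than the paper's actual interface.

```python
# A minimal sketch, assuming the td0_update helper above, of how the arrow
# assignment can be realised: each freshly assigned value is used
# immediately when updating the preceding state (by walking the game
# backwards), and the (state, value) pairs only train the network at the
# end of the iteration.
def backward_value_updates(network, episode, final_reward, alpha=0.01):
    """`episode` is the list of states visited in one game of self-play,
    ending in a terminal position worth `final_reward` to the player to move."""
    overrides = {}  # values already reassigned during this iteration

    def value_fn(state):
        # Prefer the value assigned earlier in this sweep (the arrow update),
        # falling back to the network's current prediction.
        return overrides.get(state, network.predict(state))

    training_pairs = []
    # Walk the game backwards so each new value is available to its predecessor.
    for i in reversed(range(len(episode) - 1)):
        state, next_state = episode[i], episode[i + 1]
        terminal = (i == len(episode) - 2)
        reward = final_reward if terminal else 0.0
        s, v_new = td0_update(value_fn, state, next_state, reward, terminal, alpha)
        overrides[s] = v_new             # the arrow: new value takes effect immediately
        training_pairs.append((s, v_new))

    # The input/output pairs are used to train the network only after the sweep.
    network.fit(training_pairs)
    return training_pairs
```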