Abstract
The puzzle game 2048 is a single-player stochastic game played on a \(4\times 4\) grid, and it is the most popular among the similar slide-and-merge games. Since the game appeared, several researchers have developed computer players for 2048 based on reinforcement learning with N-tuple networks. The state-of-the-art player by Jaśkowski combines several techniques, as the title of that paper indicates. In this paper, we show that backward learning is very effective for 2048, because a single game consists of a long sequence of moves. We also present a restart strategy that improves learning by focusing on the later stages of the game. The resulting player achieved better average scores than the existing players that use the same set of N-tuple networks.
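The backward-learning idea in the abstract can be illustrated with a minimal sketch: after a game ends, the stored states are updated from the last move backward, so information from the end of the long move sequence propagates through the whole episode in a single pass. The names below (`V`, `backward_td_update`) are illustrative, not the paper's implementation.

```python
def backward_td_update(episode, V, alpha=0.1):
    """Sketch of a backward TD(0)-style sweep.

    episode: list of (state, reward) pairs in play order.
    V: dict mapping state -> value estimate, updated in place.
    """
    target = 0.0  # value after the terminal state is zero
    for state, reward in reversed(episode):
        v = V.get(state, 0.0)
        # Update toward reward + value of the successor state; the
        # successor's value is already refreshed because we sweep backward.
        V[state] = v + alpha * (reward + target - v)
        target = V[state]

# Toy usage: a three-step episode with rewards 4, 8, 16.
V = {}
backward_td_update([("s0", 4.0), ("s1", 8.0), ("s2", 16.0)], V)
```

In a forward sweep, the update of an early state would still see the stale (zero-initialized) value of its successor; the backward order lets each update use a value that was just refreshed, which is why it helps in games with long episodes.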
Notes
- 1.
2048 is a derivative of the games Threes and 1024.
- 2.
- 3.
- 4.
In the implementation, \(V_i[s]\) is represented by a 32-bit fixed-point number (with 10 bits below the binary point), and \(E_i[s]\) and \(A_i[s]\) by 32-bit floating-point numbers.
- 5.
The number of games played differed among the players. With \(10^{10}\) actions, the best player played \(1.3 \times 10^6\) games, while the worst player played \(2.2 \times 10^6\) games.
- 6.
We can improve the learning time with the delayed learning technique in [3].
- 7.
We also tested the multi-staging strategy proposed by Jaśkowski [3] but the simple one achieved better scores.
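The 32-bit fixed-point representation of \(V_i[s]\) described in note 4 can be sketched as follows. This is an illustrative encoding with 10 fractional bits (resolution \(2^{-10} \approx 0.001\)), not the paper's actual code.

```python
# Fixed-point encoding with 10 bits below the binary point, as in note 4:
# a value x is stored as the integer round(x * 2^10).
FRAC_BITS = 10
SCALE = 1 << FRAC_BITS  # 1024

def to_fixed(x: float) -> int:
    """Encode a float as a fixed-point integer."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Decode a fixed-point integer back to a float."""
    return q / SCALE

q = to_fixed(3.5)   # stored as 3584
assert to_float(q) == 3.5
```

Compared with a 32-bit float, this gives uniform precision across the value range and allows value-table updates with cheap integer arithmetic.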
References
Beal, D.F., Smith, M.C.: Temporal coherence and prediction decay in TD learning. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, vol. 1, pp. 564–569 (1999)
Cirulli, G.: 2048 (2014). http://gabrielecirulli.github.io/2048/
Jaśkowski, W.: Mastering 2048 with delayed temporal coherence learning, multi-stage weight promotion, redundant encoding and carousel shaping. In: IEEE Transactions on Computational Intelligence and AI in Games (2017, accepted for publication)
Matsuzaki, K.: Systematic selection of N-tuple networks with consideration of interinfluence for game 2048. In: Technologies and Applications of Artificial Intelligence (TAAI 2016), pp. 186–193 (2016)
Oka, K., Matsuzaki, K.: Systematic selection of N-tuple networks for 2048. In: Plaat, A., Kosters, W., van den Herik, J. (eds.) CG 2016. LNCS, vol. 10068, pp. 81–92. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50935-8_8
van der Ree, M., Wiering, M.: Reinforcement learning in the game of Othello: learning against a fixed opponent and learning from self-play. In: IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), pp. 108–115 (2013)
Rodgers, P., Levine, J.: An investigation into 2048 AI strategies. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–2 (2014)
Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 44, 206–227 (1959)
Schraudolph, N.N., Dayan, P., Sejnowski, T.J.: Learning to evaluate Go positions via temporal difference methods. In: Baba, N., Jain, L.C. (eds.) Computational Intelligence in Games. Studies in Fuzziness and Soft Computing, pp. 77–98. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-7908-1833-8_4
Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988)
Szubert, M., Jaśkowski, W.: Temporal difference learning of N-tuple networks for the game 2048. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–8. IEEE (2014)
Tesauro, G.: TD-gammon, a self-teaching Backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994)
Wu, I.C., Yeh, K.H., Liang, C.C., Chiang, H.: Multi-stage temporal difference learning for 2048. In: Cheng, S.M., Day, M.Y. (eds.) Technologies and Applications of Artificial Intelligence. LNCS, pp. 366–378. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13987-6_34
Xiao, R., Vermaelen, W., Morávek, P.: AI for the 2048 game (2015). https://github.com/nneonneo/2048-ai
Yeh, K., Wu, I., Hsueh, C., Chang, C., Liang, C., Chiang, H.: Multi-stage temporal difference learning for 2048-like games. In: IEEE Transactions on Computational Intelligence and AI in Games (2016, accepted for publication)
Zaky, A.: Minimax and expectimax algorithm to solve 2048 (2014). http://informatika.stei.itb.ac.id/rinaldi.munir/Stmik/2013-2014-genap/Makalah2014/MakalahIF2211-2014-037.pdf
Acknowledgments
Most of the experiments in this paper were conducted on the IACP cluster of the Kochi University of Technology.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Matsuzaki, K. (2017). Developing a 2048 Player with Backward Temporal Coherence Learning and Restart. In: Winands, M., van den Herik, H., Kosters, W. (eds) Advances in Computer Games. ACG 2017. Lecture Notes in Computer Science(), vol 10664. Springer, Cham. https://doi.org/10.1007/978-3-319-71649-7_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-71649-7_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71648-0
Online ISBN: 978-3-319-71649-7
eBook Packages: Computer Science, Computer Science (R0)