Abstract
The puzzle game 2048 is a single-player stochastic game played on a \(4\times 4\) grid, and it is the most popular among the similar slide-and-merge games. Since the game appeared, several researchers have developed computer players for 2048 based on reinforcement learning with N-tuple networks. The state-of-the-art player by Jaśkowski combines several techniques, as the title of that paper indicates. In this paper, we show that backward learning is very effective for 2048, because a single game consists of a long sequence of moves. We also present a restart strategy that improves learning by focusing on the later stages of the game. The resulting player achieved better average scores than the existing players that use the same set of N-tuple networks.
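The backward-learning idea in the abstract can be illustrated with a minimal sketch: after a game ends, the stored states are updated from the last move backward, so information from the end of the long move sequence propagates through the whole episode in a single pass. The names below (`V`, `backward_td_update`) are illustrative, not the paper's implementation.

```python
def backward_td_update(episode, V, alpha=0.1):
    """Sketch of a backward TD(0)-style sweep.

    episode: list of (state, reward) pairs in play order.
    V: dict mapping state -> value estimate, updated in place.
    """
    target = 0.0  # value after the terminal state is zero
    for state, reward in reversed(episode):
        v = V.get(state, 0.0)
        # Update toward reward + value of the successor state; the
        # successor's value is already refreshed because we sweep backward.
        V[state] = v + alpha * (reward + target - v)
        target = V[state]

# Toy usage: a three-step episode with rewards 4, 8, 16.
V = {}
backward_td_update([("s0", 4.0), ("s1", 8.0), ("s2", 16.0)], V)
```

In a forward sweep, the update of an early state would still see the stale (zero-initialized) value of its successor; the backward order lets each update use a value that was just refreshed, which is why it helps in games with long episodes.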
Notes
- 1.
2048 is a derivative of the games Threes and 1024.
- 2.
- 3.
- 4.
In the implementation, \(V_i[s]\) is represented by a 32-bit fixed-point number (with 10 bits below the binary point), and \(E_i[s]\) and \(A_i[s]\) by 32-bit floating-point numbers.
- 5.
The number of games played differed among the players. With \(10^{10}\) actions, the best player played \(1.3 \times 10^6\) games, while the worst player played \(2.2 \times 10^6\) games.
- 6.
We can improve the learning time with the delayed learning technique in [3].
- 7.
We also tested the multi-staging strategy proposed by Jaśkowski [3] but the simple one achieved better scores.
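The 32-bit fixed-point representation of \(V_i[s]\) described in note 4 can be sketched as follows. This is an illustrative encoding with 10 fractional bits (resolution \(2^{-10} \approx 0.001\)), not the paper's actual code.

```python
# Fixed-point encoding with 10 bits below the binary point, as in note 4:
# a value x is stored as the integer round(x * 2^10).
FRAC_BITS = 10
SCALE = 1 << FRAC_BITS  # 1024

def to_fixed(x: float) -> int:
    """Encode a float as a fixed-point integer."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Decode a fixed-point integer back to a float."""
    return q / SCALE

q = to_fixed(3.5)   # stored as 3584
assert to_float(q) == 3.5
```

Compared with a 32-bit float, this gives uniform precision across the value range and allows value-table updates with cheap integer arithmetic.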
References
Beal, D.F., Smith, M.C.: Temporal coherence and prediction decay in TD learning. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, vol. 1, pp. 564–569 (1999)
Cirulli, G.: 2048 (2014). http://gabrielecirulli.github.io/2048/
Jaśkowski, W.: Mastering 2048 with delayed temporal coherence learning, multi-stage weight promotion, redundant encoding and carousel shaping. In: IEEE Transactions on Computational Intelligence and AI in Games (2017, accepted for publication)
Matsuzaki, K.: Systematic selection of N-tuple networks with consideration of interinfluence for game 2048. In: Technologies and Applications of Artificial Intelligence (TAAI 2016), pp. 186–193 (2016)
Oka, K., Matsuzaki, K.: Systematic selection of N-tuple networks for 2048. In: Plaat, A., Kosters, W., van den Herik, J. (eds.) CG 2016. LNCS, vol. 10068, pp. 81–92. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50935-8_8
van der Ree, M., Wiering, M.: Reinforcement learning in the game of Othello: learning against a fixed opponent and learning from self-play. In: IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), pp. 108–115 (2013)
Rodgers, P., Levine, J.: An investigation into 2048 AI strategies. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–2 (2014)
Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 44, 206–227 (1959)
Schraudolph, N.N., Dayan, P., Sejnowski, T.J.: Learning to evaluate Go positions via temporal difference methods. In: Baba, N., Jain, L.C. (eds.) Computational Intelligence in Games. Studies in Fuzziness and Soft Computing, pp. 77–98. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-7908-1833-8_4
Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988)
Szubert, M., Jaśkowski, W.: Temporal difference learning of N-tuple networks for the game 2048. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–8. IEEE (2014)
Tesauro, G.: TD-gammon, a self-teaching Backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994)
Wu, I.C., Yeh, K.H., Liang, C.C., Chiang, H.: Multi-stage temporal difference learning for 2048. In: Cheng, S.M., Day, M.Y. (eds.) Technologies and Applications of Artificial Intelligence. LNCS, pp. 366–378. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13987-6_34
Xiao, R., Vermaelen, W., Morávek, P.: AI for the 2048 game (2015). https://github.com/nneonneo/2048-ai
Yeh, K., Wu, I., Hsueh, C., Chang, C., Liang, C., Chiang, H.: Multi-stage temporal difference learning for 2048-like games. In: IEEE Transactions on Computational Intelligence and AI in Games (2016, accepted for publication)
Zaky, A.: Minimax and expectimax algorithm to solve 2048 (2014). http://informatika.stei.itb.ac.id/rinaldi.munir/Stmik/2013-2014-genap/Makalah2014/MakalahIF2211-2014-037.pdf
Acknowledgments
Most of the experiments in this paper were conducted on the IACP cluster of the Kochi University of Technology.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Matsuzaki, K. (2017). Developing a 2048 Player with Backward Temporal Coherence Learning and Restart. In: Winands, M., van den Herik, H., Kosters, W. (eds) Advances in Computer Games. ACG 2017. Lecture Notes in Computer Science(), vol 10664. Springer, Cham. https://doi.org/10.1007/978-3-319-71649-7_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-71649-7_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71648-0
Online ISBN: 978-3-319-71649-7
eBook Packages: Computer Science, Computer Science (R0)