
Developing a 2048 Player with Backward Temporal Coherence Learning and Restart

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10664)

Abstract

The puzzle game 2048 is a single-player stochastic game played on a \(4\times 4\) grid. It is the most popular of several similar slide-and-merge games. Since the game appeared, several researchers have developed computer players for 2048 based on reinforcement learning methods with N-tuple networks. The state-of-the-art player, developed by Jaśkowski, combines several techniques, as the title of his paper implies. In this paper, we show that backward learning, in which value updates are applied from the end of a play back to its beginning, is very effective for 2048, since a single play consists of quite a long sequence of moves. We also present a restart strategy that improves learning by focusing on the later stages of the game. The learned player achieved better average scores than the existing players using the same set of N-tuple networks.
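To make the core idea concrete, below is a minimal Python sketch of backward temporal coherence (TC) learning over an N-tuple-style lookup table. It is an illustration under assumptions, not the paper's implementation: feature extraction is abstracted into opaque keys, and the names (tc_rate, tc_update, learn_backward) are invented for this sketch. The per-weight statistics V, E, and A play the roles described in the notes below.

    from collections import defaultdict

    V = defaultdict(float)  # V[w]: weight value
    E = defaultdict(float)  # E[w]: accumulated signed error (TC statistic)
    A = defaultdict(float)  # A[w]: accumulated absolute error (TC statistic)

    def tc_rate(w):
        # Per-weight learning rate |E|/A: close to 1 while the errors agree
        # in sign, decaying toward 0 once they start to cancel out
        # (temporal coherence, Beal and Smith [1]).
        return abs(E[w]) / A[w] if A[w] > 0 else 1.0

    def value(features):
        # Value of a position = sum of the weights its N-tuples index.
        return sum(V[w] for w in features)

    def tc_update(features, delta):
        for w in features:
            V[w] += tc_rate(w) * delta / len(features)
            E[w] += delta
            A[w] += abs(delta)

    def learn_backward(episode):
        # episode: [(features, reward, next_features), ...] in play order;
        # an empty next_features marks the terminal position (value 0).
        # Replaying the episode backward lets late-game values propagate
        # through the whole (long) move sequence in a single pass.
        for features, reward, next_features in reversed(episode):
            target = reward + value(next_features)
            tc_update(features, target - value(features))

The restart strategy is complementary to this sketch: it focuses learning on the later stages of the game. One plausible realization (an assumption here, not a statement of the paper's method) is to restart some training games from later-stage positions of earlier plays so that late-game states receive more updates.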


Notes

  1. 2048 is a derivative of the games Threes and 1024.

  2. Reports can be found at https://icga.leidenuniv.nl/wp-content/uploads/2015/04/2048-bot-tournament-report-1104.pdf and at http://www.cs.put.poznan.pl/wjaskowski/pub/2015-GECCO-2048-Competition/GECCO-2015-2048-Competition-Results.pdf.

  3. The author could not determine the algorithms used in [13, 15].

  4. In the implementation, \(V_i[s]\) is represented by a 32-bit fixed-point number (with 10 bits below the binary point), and \(E_i[s]\) and \(A_i[s]\) by 32-bit floating-point numbers (a sketch of the fixed-point encoding follows these notes).

  5. The number of games played differed among players. With \(10^{10}\) actions in total, the best player played \(1.3 \times 10^6\) games (roughly 7,700 actions per game on average) while the worst player played \(2.2 \times 10^6\) games (roughly 4,500).

  6. We can improve the learning time with the delayed learning technique in [3].

  7. We also tested the multi-staging strategy proposed by Jaśkowski [3], but the simple strategy achieved better scores.

References

  1. Beal, D.F., Smith, M.C.: Temporal coherence and prediction decay in TD learning. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, vol. 1, pp. 564–569 (1999)


  2. Cirulli, G.: 2048 (2014). http://gabrielecirulli.github.io/2048/

  3. Jaśkowski, W.: Mastering 2048 with delayed temporal coherence learning, multi-stage weight promotion, redundant encoding and carousel shaping. IEEE Trans. Comput. Intell. AI Games (2017, accepted for publication)


  4. Matsuzaki, K.: Systematic selection of N-tuple networks with consideration of interinfluence for game 2048. In: Technologies and Applications of Artificial Intelligence (TAAI 2016), pp. 186–193 (2016)


  5. Oka, K., Matsuzaki, K.: Systematic selection of N-tuple networks for 2048. In: Plaat, A., Kosters, W., van den Herik, J. (eds.) CG 2016. LNCS, vol. 10068, pp. 81–92. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-50935-8_8


  6. van der Ree, M., Wiering, M.: Reinforcement learning in the game of Othello: learning against a fixed opponent and learning from self-play. In: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 108–115 (2013)


  7. Rodgers, P., Levine, J.: An investigation into 2048 AI strategies. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–2 (2014)


  8. Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3(3), 210–229 (1959)


  9. Schraudolph, N.N., Dayan, P., Sejnowski, T.J.: Learning to evaluate Go positions via temporal difference methods. In: Baba, N., Jain, L.C. (eds.) Computational Intelligence in Games. Studies in Fuzziness and Soft Computing, pp. 77–98. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-7908-1833-8_4


  10. Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988)


  11. Szubert, M., Jaśkowski, W.: Temporal difference learning of N-tuple networks for the game 2048. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–8. IEEE (2014)


  12. Tesauro, G.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994)


  13. Wu, I.C., Yeh, K.H., Liang, C.C., Chiang, H.: Multi-stage temporal difference learning for 2048. In: Cheng, S.M., Day, M.Y. (eds.) Technologies and Applications of Artificial Intelligence. LNCS, pp. 366–378. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13987-6_34


  14. Xiao, R., Vermaelen, W., Morávek, P.: AI for the 2048 game (2015). https://github.com/nneonneo/2048-ai

  15. Yeh, K., Wu, I., Hsueh, C., Chang, C., Liang, C., Chiang, H.: Multi-stage temporal difference learning for 2048-like games. IEEE Trans. Comput. Intell. AI Games (2016, accepted for publication)


  16. Zaky, A.: Minimax and expectimax algorithm to solve 2048 (2014). http://informatika.stei.itb.ac.id/rinaldi.munir/Stmik/2013-2014-genap/Makalah2014/MakalahIF2211-2014-037.pdf


Acknowledgments

Most of the experiments in this paper were conducted on the IACP cluster of the Kochi University of Technology.

Author information

Corresponding author

Correspondence to Kiminori Matsuzaki.


Copyright information

© 2017 Springer International Publishing AG

About this paper


Cite this paper

Matsuzaki, K. (2017). Developing a 2048 Player with Backward Temporal Coherence Learning and Restart. In: Winands, M., van den Herik, H., Kosters, W. (eds.) Advances in Computer Games. ACG 2017. Lecture Notes in Computer Science, vol. 10664. Springer, Cham. https://doi.org/10.1007/978-3-319-71649-7_15


  • DOI: https://doi.org/10.1007/978-3-319-71649-7_15


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71648-0

  • Online ISBN: 978-3-319-71649-7

  • eBook Packages: Computer Science, Computer Science (R0)
