Multi-Stage Temporal Difference Learning for 2048

  • Conference paper
Technologies and Applications of Artificial Intelligence (TAAI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8916)

Abstract

Recently, Szubert and Jaskowski successfully used TD learning together with n-tuple networks to play the game 2048. In this paper, we first improve their result by modifying the n-tuple networks. However, we observe that programs based on TD learning still rarely reach large tiles, such as 32768-tiles (tiles with value 32768). We therefore propose a new learning method, named multi-stage TD learning, which effectively improves performance, especially the maximum score and the rate of reaching 32768-tiles. After incorporating shallow expectimax search, our 2048 program reaches 32768-tiles with probability 10.9%, and obtains a maximum score of 605752 and an average score of 328946. To the best of our knowledge, our program outperforms all 2048 programs known to date, except the program by the developers nicknamed nneonneo and xificurk, which relies heavily on deep search with manually tuned heuristics. Their program reaches 32768-tiles with probability 32%, but ours runs about 100 times faster. Interestingly, our new learning method can also be easily applied to other 2048-like games, such as Threes. Our program for Threes outperforms all Threes programs known to date.
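As an illustration of the ideas named in the abstract, the following is a minimal sketch of a TD(0) update on an n-tuple network with one weight set per stage. The abstract does not specify the tuple shapes, the stage boundaries, or the learning rate, so everything here is an assumption made for illustration: the tuple layout (four rows and four columns), the names `TUPLES`, `STAGE_SPLIT_EXPONENT`, `stage_of`, and `NTupleNetwork`, the stage threshold, and the use of plain state values with no discounting. It is not the authors' implementation.

```python
from collections import defaultdict

# A board is a tuple of 16 tile exponents in row-major order
# (0 = empty cell, 1 = tile 2, 2 = tile 4, ...).

# Hypothetical tuple layout: the four rows and the four columns.
TUPLES = [tuple(range(r * 4, r * 4 + 4)) for r in range(4)] + \
         [tuple(range(c, 16, 4)) for c in range(4)]

# Placeholder stage boundary (an assumption, not taken from the abstract):
# switch to the second weight set once a tile with this exponent is on the board.
STAGE_SPLIT_EXPONENT = 14


def stage_of(board):
    """Return the index of the weight set used to evaluate this board."""
    return 1 if max(board) >= STAGE_SPLIT_EXPONENT else 0


class NTupleNetwork:
    def __init__(self, num_stages=2):
        # weights[stage][tuple_index] maps a tuple of exponents to a weight.
        self.weights = [[defaultdict(float) for _ in TUPLES]
                        for _ in range(num_stages)]

    def value(self, board):
        """Sum the looked-up weight of every tuple for the board's stage."""
        tables = self.weights[stage_of(board)]
        return sum(tables[i][tuple(board[j] for j in idx)]
                   for i, idx in enumerate(TUPLES))

    def td_update(self, board, reward, next_board, alpha=0.01, terminal=False):
        """Apply one TD(0) step: move V(board) toward reward + V(next_board)."""
        target = reward + (0.0 if terminal else self.value(next_board))
        error = target - self.value(board)
        tables = self.weights[stage_of(board)]
        for i, idx in enumerate(TUPLES):
            tables[i][tuple(board[j] for j in idx)] += alpha * error
        return error


net = NTupleNetwork()
board = (1, 1, 0, 0) + (0,) * 12
next_board = (2, 0, 0, 0) + (0,) * 12
td_error = net.td_update(board, reward=4.0, next_board=next_board,
                         alpha=0.1, terminal=True)
```

Because each of the eight tuples receives the same increment, a single update shifts the board's value by eight times `alpha * error`; a board that already contains a tile at or above the placeholder threshold is evaluated and updated with the second weight set only.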

References

  1. Ballard, B.W.: The *-Minimax Search Procedure for Trees Containing Chance Nodes. Artificial Intelligence 21, 327–350 (1983)

  2. Baxter, J., Tridgell, A., Weaver, L.: Learning to Play Chess Using Temporal Differences. Machine Learning 40(3), 243–263 (2000)

  3. Beal, D.F., Smith, M.C.: First Results from Using Temporal Difference Learning in Shogi. In: van den Herik, H.J., Iida, H. (eds.) CG 1998. LNCS, vol. 1558, pp. 113–125. Springer, Heidelberg (1999)

  4. Buro, M.: Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello. Games in AI Research, 77–96 (1997)

  5. Game 1024, http://1024game.org/

  6. Game Threes!, http://asherv.com/threes/

  7. Game 2048, http://gabrielecirulli.github.io/2048/

  8. Knuth, D.E., Moore, R.W.: An analysis of alpha-beta pruning. Artificial Intelligence 6, 293–326 (1975)

  9. Melko, E., Nagy, B.: Optimal Strategy in games with chance nodes. Acta Cybernetica 18(2), 171–192 (2007)

  10. Nneonneo and xificurk (nicknames), Improved algorithm reaching 32k tile, https://github.com/nneonneo/2048-ai/pull/27

  11. Overlan, M.: 2048 AI, http://ov3y.github.io/2048-AI/

  12. Pearl, J.: The solution for the branching factor of the alpha-beta pruning algorithm and its optimality. Communications of the ACM 25(8), 559–564 (1982)

  13. Schaeffer, J., Hlynka, M., Jussila, V.: Temporal Difference Learning Applied to a High-Performance Game-Playing Program. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 529–534 (August 2001)

  14. Silver, D.: Reinforcement Learning and Simulation-Based Search in Computer Go, Ph.D. Dissertation, Dept. Comput. Sci., Univ. Alberta, Edmonton, AB, Canada (2009)

  15. StackOverflow: What is the optimal algorithm for the game 2048?, http://stackoverflow.com/questions/22342854/what-is-the-optimal-algorithm-for-the-game-2048/22674149#22674149

  16. Sutton, R.S., Barto, A.G.: Temporal-Difference Learning, An Introduction to Reinforcement Learning. MIT Press, Cambridge (1998)

  17. Szubert, M., Jaskowski, W.: Temporal Difference Learning of N-tuple Networks for the Game 2048. In: IEEE CIG 2014 Conference (August 2014)

  18. Taiwan 2048-bot, http://2048-botcontest.twbbs.org/

  19. Tesauro, G.: TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation 6, 215–219 (1994)

  20. Trinh, T., Bashi, A., Deshpande, N.: Temporal Difference Learning in Chinese Chess. In: Tasks and Methods in Applied Artificial Intelligence, pp. 612–618 (1998)

  21. Wu, K.C.: 2048-c, https://github.com/kcwu/2048-c/

  22. Wu, I.-C., Tsai, H.-T., Lin, H.-H., Lin, Y.-S., Chang, C.-M., Lin, P.-H.: Temporal Difference Learning for Connect6. In: van den Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 121–133. Springer, Heidelberg (2012)

  23. Zobrist, A.L.: A New Hashing Method With Application For Game Playing. Technical Report #88 (April 1970)

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wu, IC., Yeh, KH., Liang, CC., Chang, CC., Chiang, H. (2014). Multi-Stage Temporal Difference Learning for 2048. In: Cheng, SM., Day, MY. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2014. Lecture Notes in Computer Science, vol 8916. Springer, Cham. https://doi.org/10.1007/978-3-319-13987-6_34

  • DOI: https://doi.org/10.1007/978-3-319-13987-6_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13986-9

  • Online ISBN: 978-3-319-13987-6

  • eBook Packages: Computer Science, Computer Science (R0)
