Research article
DOI: 10.1145/2739480.2754783

High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: A Case Study in SZ-Tetris

Published: 11 July 2015

Abstract

SZ-Tetris, a restricted version of Tetris, is a difficult reinforcement learning task. Previous research showed that, as in the original Tetris, value-function-based methods such as temporal difference learning do not work well for SZ-Tetris. The best performance in this game had been achieved by direct policy search techniques, in particular the cross-entropy method combined with handcrafted features. Nonetheless, a simple hand-coded heuristic player scores even higher. Here we show that it is possible to equal its performance with CMA-ES (Covariance Matrix Adaptation Evolution Strategy). We demonstrate that further improvement is possible by employing a systematic n-tuple network, a knowledge-free function approximator, together with VD-CMA-ES, a linear-complexity variant of CMA-ES for high-dimensional optimization. Last but not least, we show that a large systematic n-tuple network (involving more than 4 million parameters) allows the classical temporal difference learning algorithm to obtain average performance similar to VD-CMA-ES, but at 20 times lower computational expense, leading to the best policy for SZ-Tetris known to date. These results enrich the current understanding of the difficulty of SZ-Tetris and shed new light on the capabilities of particular search paradigms applied to representations of various characteristics and dimensionality.
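To make the two core ingredients named in the abstract concrete, the sketch below (in Python, not the authors' code) shows a systematic n-tuple network used as a state-value function over a binary board, trained with a classical TD(0) update of its lookup-table weights. The board size, tuple shape, learning rate, and reward are illustrative assumptions only; the paper's largest network uses longer tuples and more than 4 million weights.

# Minimal sketch of a systematic n-tuple network value function with TD(0).
# Board size, tuple shape, and learning rate are illustrative assumptions.
import numpy as np

ROWS, COLS = 20, 10               # assumed well size (height x width)
T_ROWS, T_COLS = 2, 2             # each n-tuple covers a 2x2 window, so n = 4

# "Systematic" n-tuple network: one tuple for every placement of the window
# on the board, rather than a handcrafted selection of cell groups.
TUPLES = [
    [(r + dr, c + dc) for dr in range(T_ROWS) for dc in range(T_COLS)]
    for r in range(ROWS - T_ROWS + 1)
    for c in range(COLS - T_COLS + 1)
]
LUT_SIZE = 2 ** (T_ROWS * T_COLS)            # one weight per binary cell pattern
weights = np.zeros((len(TUPLES), LUT_SIZE))  # the parameter vector being learned

def tuple_index(board, cells):
    """Encode the cells covered by one tuple as an index into its lookup table."""
    index = 0
    for r, c in cells:
        index = (index << 1) | int(board[r, c])
    return index

def value(board):
    """State value: the sum of one looked-up weight per n-tuple."""
    return sum(weights[t, tuple_index(board, cells)]
               for t, cells in enumerate(TUPLES))

def td0_update(board, reward, next_board, terminal=False, alpha=0.001):
    """Classical TD(0): move V(s) toward r + V(s'); only the active weights change."""
    target = reward + (0.0 if terminal else value(next_board))
    delta = target - value(board)
    for t, cells in enumerate(TUPLES):
        weights[t, tuple_index(board, cells)] += alpha * delta

# Toy usage on random binary boards standing in for real SZ-Tetris states.
s = np.random.randint(0, 2, size=(ROWS, COLS))
s_next = np.random.randint(0, 2, size=(ROWS, COLS))
td0_update(s, reward=1.0, next_board=s_next)
print(value(s))

The same parameter vector (the flattened lookup tables) could instead be optimized directly by an evolution strategy such as VD-CMA-ES, which is the alternative the paper compares TD learning against.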

Published In

GECCO '15: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation
July 2015
1496 pages
ISBN:9781450334723
DOI:10.1145/2739480

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cma-es
  2. covariance matrix adaptation
  3. function approximation
  4. knowledge-free representations
  5. n-tuple system
  6. reinforcement learning
  7. vd-cma
  8. video games

Conference

GECCO '15

Acceptance Rates

GECCO '15 paper acceptance rate: 182 of 505 submissions (36%)
Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Cited By

  • (2024) How fast can we play Tetris greedily with rectangular pieces? Theoretical Computer Science, 992(C). https://doi.org/10.1016/j.tcs.2024.114405
  • (2023) AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time. IEEE Transactions on Games, 15(4), 637-647. https://doi.org/10.1109/TG.2022.3206733
  • (2020) Reinforcement Learning for N-player Games: The Importance of Final Adaptation. In Bioinspired Optimization Methods and Their Applications, 84-96. https://doi.org/10.1007/978-3-030-63710-1_7
  • (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107, 3-11. https://doi.org/10.1016/j.neunet.2017.12.012
  • (2017) Accelerating coevolution with adaptive matrix factorization. In Proceedings of the Genetic and Evolutionary Computation Conference, 457-464. https://doi.org/10.1145/3071178.3071320
  • (2017) Residual Sarsa algorithm with function approximation. Cluster Computing. https://doi.org/10.1007/s10586-017-1303-8
  • (2016) Discovering Rubik's Cube Subgroups using Coevolutionary GP. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, 789-796. https://doi.org/10.1145/2908812.2908887
  • (2016) Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation. IEEE Transactions on Computational Intelligence and AI in Games, 8(4), 389-401. https://doi.org/10.1109/TCIAIG.2015.2464711
