Research article
DOI: 10.1145/2739480.2754783

High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: A Case Study in SZ-Tetris

Published: 11 July 2015

Abstract

SZ-Tetris, a restricted version of Tetris, is a difficult reinforcement learning task. Previous research showed that, as in the original Tetris, value-function-based methods such as temporal difference learning do not work well for SZ-Tetris. The best performance in this game had been achieved by direct policy search techniques, in particular the cross-entropy method combined with handcrafted features. Nonetheless, a simple hand-coded heuristic player scores even higher. Here we show that it is possible to equal its performance with CMA-ES (Covariance Matrix Adaptation Evolution Strategy). We demonstrate that further improvement is possible by employing a systematic n-tuple network, a knowledge-free function approximator, together with VD-CMA-ES, a linear-complexity variant of CMA-ES for high-dimensional optimization. Last but not least, we show that a large systematic n-tuple network (involving more than 4 million parameters) allows the classical temporal difference learning algorithm to obtain average performance similar to VD-CMA-ES, but at 20 times lower computational expense, leading to the best policy for SZ-Tetris known to date. These results enrich the current understanding of the difficulty of SZ-Tetris and shed new light on the capabilities of particular search paradigms applied to representations of various characteristics and dimensionality.
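To make the two core ingredients named in the abstract concrete, the sketch below (in Python, not the authors' code) shows a systematic n-tuple network used as a state-value function over a binary board, trained with a classical TD(0) update of its lookup-table weights. The board size, tuple shape, learning rate, and reward are illustrative assumptions only; the paper's largest network uses longer tuples and more than 4 million weights.

# Minimal sketch of a systematic n-tuple network value function with TD(0).
# Board size, tuple shape, and learning rate are illustrative assumptions.
import numpy as np

ROWS, COLS = 20, 10               # assumed well size (height x width)
T_ROWS, T_COLS = 2, 2             # each n-tuple covers a 2x2 window, so n = 4

# "Systematic" n-tuple network: one tuple for every placement of the window
# on the board, rather than a handcrafted selection of cell groups.
TUPLES = [
    [(r + dr, c + dc) for dr in range(T_ROWS) for dc in range(T_COLS)]
    for r in range(ROWS - T_ROWS + 1)
    for c in range(COLS - T_COLS + 1)
]
LUT_SIZE = 2 ** (T_ROWS * T_COLS)            # one weight per binary cell pattern
weights = np.zeros((len(TUPLES), LUT_SIZE))  # the parameter vector being learned

def tuple_index(board, cells):
    """Encode the cells covered by one tuple as an index into its lookup table."""
    index = 0
    for r, c in cells:
        index = (index << 1) | int(board[r, c])
    return index

def value(board):
    """State value: the sum of one looked-up weight per n-tuple."""
    return sum(weights[t, tuple_index(board, cells)]
               for t, cells in enumerate(TUPLES))

def td0_update(board, reward, next_board, terminal=False, alpha=0.001):
    """Classical TD(0): move V(s) toward r + V(s'); only the active weights change."""
    target = reward + (0.0 if terminal else value(next_board))
    delta = target - value(board)
    for t, cells in enumerate(TUPLES):
        weights[t, tuple_index(board, cells)] += alpha * delta

# Toy usage on random binary boards standing in for real SZ-Tetris states.
s = np.random.randint(0, 2, size=(ROWS, COLS))
s_next = np.random.randint(0, 2, size=(ROWS, COLS))
td0_update(s, reward=1.0, next_board=s_next)
print(value(s))

The same parameter vector (the flattened lookup tables) could instead be optimized directly by an evolution strategy such as VD-CMA-ES, which is the alternative the paper compares TD learning against.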

Published In

GECCO '15: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation
July 2015
1496 pages
ISBN:9781450334723
DOI:10.1145/2739480

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cma-es
  2. covariance matrix adaptation
  3. function approximation
  4. knowledge-free representations
  5. n-tuple system
  6. reinforcement learning
  7. vd-cma
  8. video games

Conference

GECCO '15

Acceptance Rates

GECCO '15 paper acceptance rate: 182 of 505 submissions (36%)
Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Cited By

  • (2024) How fast can we play Tetris greedily with rectangular pieces? Theoretical Computer Science, 992(C). https://doi.org/10.1016/j.tcs.2024.114405
  • (2023) AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time. IEEE Transactions on Games, 15(4), 637-647. https://doi.org/10.1109/TG.2022.3206733
  • (2020) Reinforcement Learning for N-player Games: The Importance of Final Adaptation. In Bioinspired Optimization Methods and Their Applications, 84-96. https://doi.org/10.1007/978-3-030-63710-1_7
  • (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107, 3-11. https://doi.org/10.1016/j.neunet.2017.12.012
  • (2017) Accelerating coevolution with adaptive matrix factorization. In Proceedings of the Genetic and Evolutionary Computation Conference, 457-464. https://doi.org/10.1145/3071178.3071320
  • (2017) Residual Sarsa algorithm with function approximation. Cluster Computing. https://doi.org/10.1007/s10586-017-1303-8
  • (2016) Discovering Rubik's Cube Subgroups using Coevolutionary GP. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, 789-796. https://doi.org/10.1145/2908812.2908887
  • (2016) Coevolutionary CMA-ES for Knowledge-Free Learning of Game Position Evaluation. IEEE Transactions on Computational Intelligence and AI in Games, 8(4), 389-401. https://doi.org/10.1109/TCIAIG.2015.2464711
