Learning Time Allocation Using Neural Networks

Conference paper, Computers and Games (CG 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2063)

Abstract

The strength of a game-playing program rests mainly on the adequacy of its evaluation function and the efficacy of its search algorithm. This paper investigates how temporal-difference (TD) learning and genetic algorithms can be used to improve various decisions made during game-tree search. Existing TD algorithms are not directly suitable for learning search decisions; we therefore propose a modified update rule that uses the TD error of the evaluation function to shorten the lag between successive rewards. Genetic algorithms, by contrast, can be applied to search decisions directly. For our experiments we selected the problem of time allocation from the set of search decisions: on each move the player chooses a search depth, constrained by the amount of time left on the clock. As a testing ground we used the game of Lines of Action, which has roughly the same complexity as Othello. From the results we conclude that both the TD and the genetic approach yield good results compared with existing time-allocation techniques. Finally, we briefly discuss the issues that can emerge when the algorithms are applied to more complex search decisions.
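
To make the mechanism described above concrete, here is a minimal sketch of TD-style time allocation: a scorer maps game-state features to candidate search depths, depths too expensive for the remaining clock time are masked out, and the chosen depth's weights are nudged by the TD error of the evaluation function. Everything here is an illustrative assumption, not the paper's implementation — a linear scorer stands in for the authors' neural network, and the feature count, candidate depths, learning rate, and cost model (`cost_of_depth`) are invented for the example.

```python
import numpy as np

ALPHA = 0.01            # learning rate (assumed)
DEPTHS = [4, 5, 6, 7]   # candidate search depths (assumed)
N_FEATURES = 8          # game-state feature count (assumed)

rng = np.random.default_rng(0)
# One linear scoring unit per candidate depth (stand-in for a small network).
W = rng.normal(scale=0.1, size=(len(DEPTHS), N_FEATURES))


def choose_depth(features, time_left, cost_of_depth):
    """Score each candidate depth and pick the best affordable one."""
    scores = W @ features
    # Mask out depths whose estimated search cost exceeds the time budget.
    affordable = [i for i, d in enumerate(DEPTHS) if cost_of_depth(d) <= time_left]
    if not affordable:  # out of time: fall back to the shallowest search
        affordable = [0]
    best = max(affordable, key=lambda i: scores[i])
    return best, DEPTHS[best]


def td_update(features, chosen, value_before, value_after):
    """Reinforce the chosen depth with the TD error of the evaluation
    function, delta = V(s') - V(s), with terminal rewards folded into V."""
    delta = value_after - value_before
    W[chosen] += ALPHA * delta * features
```

In a self-play loop one would call choose_depth before each move and td_update once the evaluation of the resulting position is known; the genetic approach mentioned in the abstract would instead evolve the weights directly from game outcomes rather than from per-move TD errors.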

Acknowledgements

The authors thank the referees for their constructive comments and suggestions for improvements.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kocsis, L., Uiterwijk, J., van den Herik, J. (2001). Learning Time Allocation Using Neural Networks. In: Marsland, T., Frank, I. (eds) Computers and Games. CG 2000. Lecture Notes in Computer Science, vol 2063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45579-5_11

  • DOI: https://doi.org/10.1007/3-540-45579-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43080-3

  • Online ISBN: 978-3-540-45579-0
