First Results from Using Temporal Difference Learning in Shogi

Beal, Donald F.; Smith, Martin C.

doi:10.1007/3-540-48957-6_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1558)

Included in the following conference series:

International Conference on Computers and Games

Abstract

This paper describes first results from the application of Temporal Difference learning [1] to shogi. We report on experiments to determine whether sensible values for shogi pieces can be obtained in the same manner as for western chess pieces [2]. The learning is obtained entirely from randomised self-play, without access to any form of expert knowledge. The piece values are used in a simple search program that chooses shogi moves from a shallow lookahead, using piece values to evaluate the leaves, with a random tie-break at the top level. Temporal difference learning is used to adjust the piece values over the course of a series of games. The method is successful in learning values that perform well in matches against hand-crafted values.
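
The abstract does not state the update rule or its parameters, so the following is only a minimal sketch of how a TD(lambda) update in the style of Sutton [1] could adjust piece values after one game. It assumes each position is summarised by a vector of piece-count differences (own count minus opponent count per piece type), that the weighted sum is squashed by a logistic function into a win-probability estimate, and that the final prediction is replaced by the game result. The function names, the feature encoding, and the values of `alpha` and `lam` are illustrative assumptions, not details taken from the paper.

```python
import math


def predict(weights, features):
    """Squash the weighted material balance into an estimate in (0, 1).

    Assumption: a logistic squashing function; the abstract does not name one.
    """
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def td_lambda_update(weights, positions, outcome, alpha=0.05, lam=0.95):
    """Return updated piece values after one game (hypothetical helper).

    positions: feature vectors for successive positions, each holding
               piece-count differences per piece type, from one side's view
    outcome:   1.0 for a win, 0.0 for a loss, 0.5 for a draw (same side)
    alpha:     learning rate (illustrative value)
    lam:       trace-decay parameter lambda (illustrative value)
    """
    new_weights = list(weights)
    trace = [0.0] * len(weights)
    # Predictions use the weights from the start of the game; the game
    # result stands in for the prediction after the last position.
    preds = [predict(weights, x) for x in positions] + [outcome]
    for t, x in enumerate(positions):
        p = preds[t]
        # Gradient of the logistic prediction w.r.t. the weights is p*(1-p)*x;
        # the trace accumulates decayed gradients of earlier predictions.
        trace = [lam * e + p * (1.0 - p) * xi for e, xi in zip(trace, x)]
        delta = preds[t + 1] - p  # temporal difference
        new_weights = [w + alpha * delta * e for w, e in zip(new_weights, trace)]
    return new_weights
```

With all values starting at zero, a game in which one side stays a single piece of some type ahead and goes on to win raises the value of that piece type and leaves the others unchanged; the same game ending in a loss lowers it by the same amount. Repeating such updates over many self-play games is the general mechanism by which relative piece values could emerge under these assumptions.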

References

1. Sutton, R.S.: Learning to Predict by the Methods of Temporal Differences. Machine Learning 3 (1988) 9–44
2. Beal, D.F. and Smith, M.C.: Learning Piece Values Using Temporal Differences. International Computer Chess Association Journal, Vol. 20, No. 3 (1997) 147–151
3. Levinson, R. and Snyder, R.: Adaptive Pattern Oriented Chess. Proceedings of AAAI-91, Morgan Kaufmann (1991) 601–605
4. Christensen, J. and Korf, R.: A Unified Theory of Heuristic Evaluation Functions and its Application to Learning. AAAI-86, Morgan Kaufmann (1986) 148–152
5. Baxter, J., Tridgell, A. and Weaver, L.: KnightCap: A chess program that learns by combining TD(lambda) with game-tree search. In: Machine Learning, Proceedings of the Fifteenth International Conference (ICML ’98), Madison (1998) 28–36
6. Fairbairn, J.: Shogi for Beginners. Ishi Press International (1989)
7. Leggett, T.: Shogi: Japan’s Game of Strategy. Charles E. Tuttle Company (first published 1966, reprinted 1993)
8. Matsubara, H., Iida, H. and Grimbergen, R.: Natural Developments in Game Research: From Chess to Shogi to Go. International Computer Chess Association Journal, Vol. 19, No. 2 (1996) 103–112
9. Tesauro, G.: Practical Issues in Temporal Difference Learning. Machine Learning 8 (1992) 257–277
10. Tesauro, G.: TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, Vol. 6, No. 2 (1994) 215–219
11. Marsland, T.A.: Computer Chess and Search. In: Shapiro, S. (ed.) Encyclopaedia of Artificial Intelligence. 2nd edn. J. Wiley & Sons (1992)
12. Beal, D.F.: Experiments with the Null Move. In: Beal, D.F. (ed.) Advances in Computer Chess 5. Elsevier Science Publishers (1989) 65–79
13. Donninger, C.: Null Move and Deep Search: Selective Search Heuristics for Obtuse Chess Programs. International Computer Chess Association Journal, Vol. 16, No. 3 (1993) 137–143
14. Mutz, M.: Gnu Shogi v1.2p03. Available from many sources, including ftp://ftp.unipassau.de/pub/local/shogi (1994)
15. Yamashita, H.: YSS: About the Data Structures and the Algorithm. Published on the WWW at http://plaza15.mbn.or.jp/~yss (1997)

Author information

Authors and Affiliations

Department of Computer Science, Queen Mary and Westfield College, University of London, Mile End Road, London, E1 4NS, England
Donald F. Beal & Martin C. Smith

Editor information

Editors and Affiliations

Department of Computer Science, University of Maastricht, Maastricht, The Netherlands
H. Jaap van den Herik
Department of Computer Science, Shizuoka University, Hamamatsu, Japan
Hiroyuki Iida

About this paper

Cite this paper

Beal, D.F., Smith, M.C. (1999). First Results from Using Temporal Difference Learning in Shogi. In: van den Herik, H.J., Iida, H. (eds) Computers and Games. CG 1998. Lecture Notes in Computer Science, vol 1558. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48957-6_7

DOI: https://doi.org/10.1007/3-540-48957-6_7
Published: 12 March 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65766-8
Online ISBN: 978-3-540-48957-3
eBook Packages: Springer Book Archive
