Abstract
This paper analyzes the gradients of search values with respect to a parameter vector θ in an evaluation function. Recent learning methods for evaluation functions in computer shogi are based on the minimization of an objective function defined over search results. The gradient of the evaluation function at the leaf position of a principal variation (PV) is used as a convenient substitute for the gradient of the search result itself. By analyzing how the min-max value varies, we show (1) when the min-max value is partially differentiable and (2) how the substitution may introduce errors. Experiments on a shogi program with about a million parameters show how frequently such errors occur, as well as how effective the substitution is for parameter tuning in practice.
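The substitution described above can be illustrated with a minimal sketch (not the authors' implementation): a toy two-ply min-max search over a small tree whose leaves carry hypothetical feature vectors, with a linear evaluation e(leaf, θ) = θ·φ(leaf). Wherever the PV is unique, the min-max value is locally a linear function of θ, so its partial derivatives equal the feature vector φ at the PV leaf; ties between siblings are exactly the points where the min-max value is not partially differentiable and the substitution can err.

```python
# Toy illustration (all names and values are hypothetical, not from the paper):
# the gradient of the min-max value w.r.t. theta, where it exists, equals the
# feature vector phi of the PV leaf under a linear evaluation theta . phi.

def evaluate(theta, phi):
    """Linear evaluation: dot product of parameters and leaf features."""
    return sum(t * f for t, f in zip(theta, phi))

def minimax(theta, node, maximizing=True):
    """Return (min-max value, feature vector of the PV leaf).

    A leaf is a tuple of features; an internal node is a list of children.
    """
    if isinstance(node, tuple):                       # leaf
        return evaluate(theta, node), node
    results = [minimax(theta, c, not maximizing) for c in node]
    choose = max if maximizing else min
    return choose(results, key=lambda r: r[0])

theta = [1.0, -2.0]
# Two-ply tree: the root maximizes, its two children minimize over leaves.
tree = [[(3.0, 1.0), (1.0, 0.5)],
        [(0.0, 2.0), (2.0, 1.0)]]

value, pv_phi = minimax(theta, tree)
# The substitute gradient is simply the feature vector of the PV leaf:
grad = list(pv_phi)
```

Here the PV is unique, so `grad` coincides with the true partial derivatives of the min-max value; if two sibling leaves were tied in value, the min-max value would have a kink at that θ and the PV-leaf gradient would be only a one-sided derivative.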
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Kaneko, T., Hoki, K. (2012). Analysis of Evaluation-Function Learning by Comparison of Sibling Nodes. In: van den Herik, H.J., Plaat, A. (eds) Advances in Computer Games. ACG 2011. Lecture Notes in Computer Science, vol 7168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31866-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31865-8
Online ISBN: 978-3-642-31866-5