Abstract
This paper analyzes the gradients of search values with respect to a parameter vector θ in an evaluation function. Recent learning methods for evaluation functions in computer shogi are based on the minimization of an objective function defined over search results. The gradient of the evaluation function at the leaf position of a principal variation (PV) is used as a convenient substitute for the gradient of the search result itself. By analyzing how the min-max value varies, we show (1) when the min-max value is partially differentiable and (2) how the substitution may introduce errors. Experiments on a shogi program with about a million parameters show how frequently such errors occur, as well as how effective the substitution is for parameter tuning in practice.
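The substitution described above can be illustrated with a minimal sketch (not the authors' implementation): a toy two-ply min-max search over a small tree whose leaves carry hypothetical feature vectors, with a linear evaluation e(leaf, θ) = θ·φ(leaf). Wherever the PV is unique, the min-max value is locally a linear function of θ, so its partial derivatives equal the feature vector φ at the PV leaf; ties between siblings are exactly the points where the min-max value is not partially differentiable and the substitution can err.

```python
# Toy illustration (all names and values are hypothetical, not from the paper):
# the gradient of the min-max value w.r.t. theta, where it exists, equals the
# feature vector phi of the PV leaf under a linear evaluation theta . phi.

def evaluate(theta, phi):
    """Linear evaluation: dot product of parameters and leaf features."""
    return sum(t * f for t, f in zip(theta, phi))

def minimax(theta, node, maximizing=True):
    """Return (min-max value, feature vector of the PV leaf).

    A leaf is a tuple of features; an internal node is a list of children.
    """
    if isinstance(node, tuple):                       # leaf
        return evaluate(theta, node), node
    results = [minimax(theta, c, not maximizing) for c in node]
    choose = max if maximizing else min
    return choose(results, key=lambda r: r[0])

theta = [1.0, -2.0]
# Two-ply tree: the root maximizes, its two children minimize over leaves.
tree = [[(3.0, 1.0), (1.0, 0.5)],
        [(0.0, 2.0), (2.0, 1.0)]]

value, pv_phi = minimax(theta, tree)
# The substitute gradient is simply the feature vector of the PV leaf:
grad = list(pv_phi)
```

Here the PV is unique, so `grad` coincides with the true partial derivatives of the min-max value; if two sibling leaves were tied in value, the min-max value would have a kink at that θ and the PV-leaf gradient would be only a one-sided derivative.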
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Kaneko, T., Hoki, K. (2012). Analysis of Evaluation-Function Learning by Comparison of Sibling Nodes. In: van den Herik, H.J., Plaat, A. (eds) Advances in Computer Games. ACG 2011. Lecture Notes in Computer Science, vol 7168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31866-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31865-8
Online ISBN: 978-3-642-31866-5