Abstract
Simulation balancing is a new technique to tune parameters of a playout policy for a Monte-Carlo game-playing program. Until now, this algorithm had been tested only in a very artificial setting: it was limited to 5×5 and 6×6 Go, and required a stronger external program that served as a supervisor. In this paper, the effectiveness of simulation balancing is demonstrated in a more realistic setting. A state-of-the-art program, Erica, learned an improved playout policy on the 9×9 board, without requiring any external expert to provide position evaluations. The evaluations were collected by letting the program analyze positions by itself. The previous version of Erica learned pattern weights with the minorization-maximization algorithm. Thanks to simulation balancing, Erica's winning rate against Fuego 0.4 improved from 69% to 78%.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, SC., Coulom, R., Lin, SS. (2011). Monte-Carlo Simulation Balancing in Practice. In: van den Herik, H.J., Iida, H., Plaat, A. (eds) Computers and Games. CG 2010. Lecture Notes in Computer Science, vol 6515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17928-0_8
DOI: https://doi.org/10.1007/978-3-642-17928-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17927-3
Online ISBN: 978-3-642-17928-0
eBook Packages: Computer Science (R0)