Abstract
In the standard version of the UCT algorithm, when the set of decisions is continuous, new decisions are explored through blind search. This can make exploration very inefficient, particularly in high-dimensional problems, which are common in energy management, for instance. To use the information gathered in past simulations to explore new decisions more effectively, we propose a method named Blind Value (BV). It requires only access to a function that randomly draws feasible decisions. We implement it and compare it with the original version of continuous UCT. Our results show that BV significantly increases convergence speed on problems of dimension 12 and 80.
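The contrast between blind search and exploration informed by past simulations can be sketched as follows. This is a minimal illustration and not the paper's definition of Blind Value: the abstract does not give the formula, so the scoring rule below (an optimistic estimate built from distances to already-visited decisions and their scores) is an assumption, as are the names `blind_draw`, `informed_draw`, `rho`, and `n_candidates`. The only ingredient taken from the abstract is that both strategies need nothing more than a function that randomly draws feasible decisions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Decision = Tuple[float, ...]
Sampler = Callable[[], Decision]


def blind_draw(sampler: Sampler) -> Decision:
    """Blind search: accept whatever feasible decision the sampler returns."""
    return sampler()


def informed_draw(
    sampler: Sampler,
    visited: Sequence[Decision],
    scores: Sequence[float],
    n_candidates: int = 20,
    rho: float = 1.0,
) -> Decision:
    """Draw several candidates and keep the most promising one.

    ASSUMPTION (illustrative scoring rule, not taken from the source):
    a candidate y is valued at min_i (rho * dist(x_i, y) + scores[i]),
    where x_i are visited decisions. Candidates far from everything
    visited, or close to high-scoring decisions, get a high value.
    """
    if not visited:
        # No past information yet: fall back to blind search.
        return sampler()

    candidates: List[Decision] = [sampler() for _ in range(n_candidates)]

    def value(y: Decision) -> float:
        return min(rho * math.dist(x, y) + s for x, s in zip(visited, scores))

    return max(candidates, key=value)


# Usage: a hypothetical 12-dimensional box [0, 1]^12 as the feasible set.
rng = random.Random(0)


def draw_in_box() -> Decision:
    return tuple(rng.random() for _ in range(12))


past = [draw_in_box() for _ in range(5)]
past_scores = [rng.random() for _ in past]  # stand-in for simulation results
new_decision = informed_draw(draw_in_box, past, past_scores)
```

In a tree search, the scores would come from the statistics stored at the node's existing children; here they are random stand-ins so that the sketch stays self-contained.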
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Couëtoux, A., Doghmen, H., Teytaud, O. (2012). Improving the Exploration in Upper Confidence Trees. In: Hamadi, Y., Schoenauer, M. (eds) Learning and Intelligent Optimization. LION 2012. Lecture Notes in Computer Science, vol 7219. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34413-8_29
DOI: https://doi.org/10.1007/978-3-642-34413-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34412-1
Online ISBN: 978-3-642-34413-8
eBook Packages: Computer Science, Computer Science (R0)