Abstract
The efficiency of Monte-Carlo based algorithms heavily relies on a random search heuristic, which is often hand-crafted using domain knowledge. To improve the generality of these approaches, new algorithms such as Nested Rollout Policy Adaptation (NRPA) have replaced the hand-crafted heuristic with one that is trained online, using data collected during the search. Despite the limited expressiveness of its policy model, NRPA outperforms traditional Monte-Carlo algorithms (i.e. those without learning) on various games, including Morpion Solitaire. In this paper, we combine Monte-Carlo search with a more expressive, non-linear policy model based on a neural network trained beforehand. We then demonstrate how to use this network to obtain state-of-the-art results on the game of Morpion Solitaire with this new technique, NeuralNRPA. We also use NeuralNRPA as an expert to train a model with Expert Iteration.
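The online policy adaptation at the heart of NRPA (Rosin, 2011) can be sketched as follows. This is a minimal toy sketch, not the paper's implementation: the domain (pick a move from 0–2 at each of 5 steps, score = sum of moves), the move coding, the horizon, and the learning rate `ALPHA` are all illustrative assumptions.

```python
import math
import random

# Toy domain: choose one move per step; the optimum is all 2s (score 10).
HORIZON, MOVES, ALPHA = 5, (0, 1, 2), 1.0

def rollout(policy):
    """Sample a sequence move-by-move from softmax(policy); return (score, seq)."""
    seq = []
    for step in range(HORIZON):
        weights = [math.exp(policy.get((step, m), 0.0)) for m in MOVES]
        seq.append(random.choices(MOVES, weights=weights)[0])
    return sum(seq), seq

def adapt(policy, seq):
    """Shift policy weights toward the best sequence found so far
    (a gradient step on the log-likelihood of replaying that sequence)."""
    new = dict(policy)
    for step, best_move in enumerate(seq):
        z = sum(math.exp(policy.get((step, m), 0.0)) for m in MOVES)
        for m in MOVES:
            prob = math.exp(policy.get((step, m), 0.0)) / z
            new[(step, m)] = new.get((step, m), 0.0) - ALPHA * prob
        new[(step, best_move)] = new.get((step, best_move), 0.0) + ALPHA
    return new

def nrpa(level, policy, iterations=10):
    """Nested search: level 0 is a plain rollout; higher levels repeatedly
    call the level below and adapt the policy toward the best sequence."""
    if level == 0:
        return rollout(policy)
    best_score, best_seq = -1, []
    for _ in range(iterations):
        score, seq = nrpa(level - 1, policy, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

random.seed(0)
score, seq = nrpa(2, {})
print(score, seq)
```

The paper's NeuralNRPA replaces the small table of learned weights above with a neural network trained beforehand; the nested search structure is unchanged.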
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Doux, B., Negrevergne, B., Cazenave, T. (2022). Deep Reinforcement Learning for Morpion Solitaire. In: Browne, C., Kishimoto, A., Schaeffer, J. (eds) Advances in Computer Games. ACG 2021. Lecture Notes in Computer Science, vol 13262. Springer, Cham. https://doi.org/10.1007/978-3-031-11488-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11487-8
Online ISBN: 978-3-031-11488-5
eBook Packages: Computer Science (R0)