Abstract
The efficiency of Monte-Carlo based algorithms heavily relies on a random search heuristic, which is often hand-crafted using domain knowledge. To improve the generality of these approaches, new algorithms such as Nested Rollout Policy Adaptation (NRPA) have replaced the hand-crafted heuristic with one that is trained online, using data collected during the search. Despite the limited expressiveness of its policy model, NRPA outperforms traditional Monte-Carlo algorithms (i.e. those without learning) on various games, including Morpion Solitaire. In this paper, we combine Monte-Carlo search with a more expressive, non-linear policy model based on a neural network trained beforehand. We then demonstrate how to use this network to obtain state-of-the-art results on the game of Morpion Solitaire with this new technique, NeuralNRPA. We also use NeuralNRPA as an expert to train a model with Expert Iteration.
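The online policy adaptation at the heart of NRPA (Rosin, 2011) can be sketched as follows. This is a minimal toy sketch, not the paper's implementation: the domain (pick a move from 0–2 at each of 5 steps, score = sum of moves), the move coding, the horizon, and the learning rate `ALPHA` are all illustrative assumptions.

```python
import math
import random

# Toy domain: choose one move per step; the optimum is all 2s (score 10).
HORIZON, MOVES, ALPHA = 5, (0, 1, 2), 1.0

def rollout(policy):
    """Sample a sequence move-by-move from softmax(policy); return (score, seq)."""
    seq = []
    for step in range(HORIZON):
        weights = [math.exp(policy.get((step, m), 0.0)) for m in MOVES]
        seq.append(random.choices(MOVES, weights=weights)[0])
    return sum(seq), seq

def adapt(policy, seq):
    """Shift policy weights toward the best sequence found so far
    (a gradient step on the log-likelihood of replaying that sequence)."""
    new = dict(policy)
    for step, best_move in enumerate(seq):
        z = sum(math.exp(policy.get((step, m), 0.0)) for m in MOVES)
        for m in MOVES:
            prob = math.exp(policy.get((step, m), 0.0)) / z
            new[(step, m)] = new.get((step, m), 0.0) - ALPHA * prob
        new[(step, best_move)] = new.get((step, best_move), 0.0) + ALPHA
    return new

def nrpa(level, policy, iterations=10):
    """Nested search: level 0 is a plain rollout; higher levels repeatedly
    call the level below and adapt the policy toward the best sequence."""
    if level == 0:
        return rollout(policy)
    best_score, best_seq = -1, []
    for _ in range(iterations):
        score, seq = nrpa(level - 1, policy, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

random.seed(0)
score, seq = nrpa(2, {})
print(score, seq)
```

The paper's NeuralNRPA replaces the small table of learned weights above with a neural network trained beforehand; the nested search structure is unchanged.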
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Doux, B., Negrevergne, B., Cazenave, T. (2022). Deep Reinforcement Learning for Morpion Solitaire. In: Browne, C., Kishimoto, A., Schaeffer, J. (eds) Advances in Computer Games. ACG 2021. Lecture Notes in Computer Science, vol 13262. Springer, Cham. https://doi.org/10.1007/978-3-031-11488-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11487-8
Online ISBN: 978-3-031-11488-5
eBook Packages: Computer Science (R0)