An analysis of Single-Player Monte Carlo Tree Search performance in Sokoban

https://doi.org/10.1016/j.eswa.2021.116224

Highlights

  • Sokoban requires domain knowledge to be solved.

  • MCTS has been successful in several domains including puzzle games.

  • The best available Sokoban solver is based on IDA*.

  • MCTS with domain knowledge can reach interesting performance in Sokoban.

  • IDA* still provides the best performance.

Abstract

We apply the extension of Monte Carlo Tree Search for single-player games (SP-MCTS) to Sokoban and compare its performance to a solver integrating Iterative Deepening A* (IDA*) with several problem-specific heuristics. We introduce two extensions of MCTS to deal with some of the challenges that Sokoban poses to MCTS methods, namely, the reduced search space that deadlock situations can cause and the large number of cycles. We also evaluate three domain-independent enhancements that have been shown to improve MCTS performance, namely, UCB1-Tuned, Rapid Action Value Estimation (RAVE), and Node Recycling. We perform a series of experiments to determine the best SP-MCTS configuration and then compare its performance to IDA*. We show that SP-MCTS can solve around 85% of the levels with 1,000,000 iterations, which is the same performance reached by IDA* with only 10,000 nodes. Overall, our results suggest that IDA* is still the best solver for Sokoban, partly because it can easily integrate a large amount of domain knowledge. At the same time, our results also highlight some interesting directions for designing better MCTS solvers for this domain.

Introduction

Puzzles have been a very popular pastime since the dawn of mankind. They are known to be an effective way to stimulate brain activity and mental well-being, as well as simply being fun. There are many types of puzzles that challenge different problem-solving skills (e.g., logic, pattern recognition, sequence solving, and word completion) and whose solutions may require building a particular structure or creating a certain order. Sokoban is a puzzle game in which the player controls a warehouse keeper (sōkoban in Japanese) who has to push a set of boxes onto their storage positions. The character cannot pull boxes, and this is the major source of the deadlock situations that represent one of the main challenges in solving Sokoban levels. The game is extremely challenging, both for human and artificial players, and it has been shown to be NP-hard and PSPACE-complete (Culberson, 1997).

In this paper, we apply the extension of Monte Carlo Tree Search for single-player games (SP-MCTS) (Schadd et al., 2012) to Sokoban and compare its performance to a solver based on Iterative Deepening A* (IDA*) (Korf, 1985) and on the work of Junghanns and Eltern (1999) and Junghanns and Schaeffer (1998a, 1998b, 2001a). Sokoban introduces several challenges for MCTS methods. First, the early detection of deadlocks is essential to solve a level, but it can also dramatically reduce the search space by generating many early terminal states in the search tree. Furthermore, the game tree generated for Sokoban can potentially contain many cycles, which must be identified to avoid wasting computation and memory resources. Accordingly, we introduce two extensions of SP-MCTS to deal with these issues, by explicitly eliminating paths leading to deadlock states and by efficiently identifying and avoiding cycles. We also introduce three domain-independent enhancements that have been shown to improve MCTS performance, namely, UCB1-Tuned (Auer et al., 2002), Rapid Action Value Estimation (RAVE) (Gelly and Silver, 2007, Gelly and Silver, 2011), and Node Recycling (Powley et al., 2017). We consider four reward functions: (i) the standard one, returning 1 for a solved level and 0 for an unsolved one; (ii) a reward based on the number of boxes correctly placed on a storage position; and (iii) two reward functions based on the same heuristics used by IDA* to evaluate states. We perform a series of experiments to determine the best SP-MCTS configuration and then compare its performance to IDA* using the first 100 levels of the Microban suite (Skinner, 2021). We show that SP-MCTS can solve around 87% of the levels with 500,000 iterations, the same performance reached by IDA* with only 10,000 nodes (50 times fewer nodes than SP-MCTS). Overall, our results suggest that IDA* remains difficult for MCTS to beat on Sokoban. In our opinion, this mainly depends on the significant amount of domain knowledge needed to solve Sokoban levels, which can be easily incorporated into IDA*, as published work demonstrates (Junghanns and Eltern, 1999, Junghanns and Schaeffer, 1998a, Junghanns and Schaeffer, 1998b, Junghanns and Schaeffer, 2001a), but is difficult to exploit in MCTS. At the same time, our results show that general-purpose solutions targeting the specific issues introduced by Sokoban (and similar puzzles) can be effective.
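To make the four reward-function families concrete, the sketch below shows one plausible way to express them in Python; the SokobanState interface (is_solved, boxes_on_goal, num_boxes), the heuristic h, and the normalizing constant h_max are our own assumptions, not the paper's implementation.

```python
# Illustrative reward functions for SP-MCTS on Sokoban (hypothetical interface).

def reward_binary(state):
    """(i) Standard reward: 1 for a solved level, 0 otherwise."""
    return 1.0 if state.is_solved() else 0.0

def reward_boxes(state):
    """(ii) Fraction of boxes already sitting on a storage position."""
    return state.boxes_on_goal() / state.num_boxes()

def reward_heuristic(state, h, h_max):
    """(iii)-(iv) Heuristic-based rewards: map an IDA*-style cost-to-go
    estimate h(state) into [0, 1], where higher means closer to a solution."""
    return 1.0 - min(h(state), h_max) / h_max
```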

The paper is organized as follows. In Section 2, we briefly overview the published work most relevant to our study. In Section 3, we introduce Sokoban and illustrate the challenges it poses to artificial intelligence methods such as IDA*, which is described in Section 4. In Section 5, we briefly overview Monte Carlo Tree Search (MCTS) and its single-player variant (SP-MCTS), while in Sections 6 and 7 we discuss the extensions we added to deal with some of the issues that Sokoban introduces for MCTS solvers and to improve MCTS performance across a wide variety of domains. In Section 8, we illustrate the optimizations we introduced in the environment and in the IDA* solver for Sokoban, following what was done by Junghanns and Eltern (1999). Finally, we present the results of our experiments (Section 9) and discuss them in Section 10, where we also draw some conclusions and outline possible future research directions.

Section snippets

Related work

Puzzles are a very popular pastime and they are also an effective domain to challenge methods of artificial intelligence and machine learning. A*, Iterative Deepening A* (IDA*), and their enhancements have been widely applied to puzzles and IDA* is probably one of the most successful approaches so far (Junghanns and Eltern, 1999, Korf, 1985, Paumard et al., 2020, Soto et al., 2013). However, these techniques heavily rely on the quality of the evaluation function that guides the search. In

Sokoban

Sokoban is a single-player computer game created by Hiroyuki Imabayashi in 1981 and published in December 1982 by Thinking Rabbit, a software house based in Takarazuka, Japan. In Japanese, the word sōkoban means warehouse keeper. A Sokoban level is a grid in which each position is either a walkable floor or an impenetrable wall (Fig. 1). Floor positions can either be empty, contain a box, or be marked as storage (or goal); the number of storage positions
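As a point of reference for the later sketches, a level of this kind can be stored as a static set of walls and storage positions plus the mutable box and keeper positions. The representation below is a minimal, hypothetical sketch (the names are ours); making states hashable also helps the cycle detection discussed later.

```python
from dataclasses import dataclass

# Minimal, illustrative Sokoban state: walls and goals are static,
# while boxes and the keeper position change after each push or move.
@dataclass(frozen=True)
class SokobanState:
    walls: frozenset   # (row, col) cells occupied by impenetrable walls
    goals: frozenset   # storage (goal) positions
    boxes: frozenset   # current box positions
    player: tuple      # current warehouse-keeper position

    def is_solved(self) -> bool:
        # A level is solved when every box is on a storage position.
        return self.boxes <= self.goals
```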

Iterative Deepening A*

Iterative Deepening A* (IDA*) (Korf, 1985) is one of the most successful methods for puzzle solving found in the literature (Junghanns and Eltern, 1999, Korf and Taylor, 1996). It combines the classical A* approach (Hart et al., 1968) with the low memory complexity of depth-first search, while retaining completeness even in the presence of unbounded trees. At each iteration, IDA* performs a full depth-first search until either a solution is found or the
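The snippet below is a generic, unoptimized sketch of this iterative-deepening scheme (not the authors' solver): each pass is a depth-first search bounded by a threshold on f = g + h, and the threshold for the next pass is the smallest f value that exceeded the current bound.

```python
import math

def ida_star(start, h, successors, is_goal):
    """Generic IDA*: repeated depth-first searches with an increasing
    threshold on f = g + h. `successors` yields (child, step_cost) pairs."""

    def search(node, g, bound, path):
        f = g + h(node)
        if f > bound:
            return f                      # smallest f exceeding the bound
        if is_goal(node):
            return path                   # solution found
        minimum = math.inf
        for child, cost in successors(node):
            result = search(child, g + cost, bound, path + [child])
            if isinstance(result, list):
                return result
            minimum = min(minimum, result)
        return minimum

    bound = h(start)
    while True:
        result = search(start, 0, bound, [start])
        if isinstance(result, list):
            return result                 # sequence of states to the goal
        if result == math.inf:
            return None                   # the level has no solution
        bound = result                    # deepen to the next f threshold
```

In practice, a Sokoban solver would add cycle checks and the domain-specific enhancements discussed in Section 8 on top of this skeleton.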

Monte Carlo Tree Search for puzzles

Monte Carlo Tree Search (MCTS) is a search algorithm that has been widely applied to games  (Browne et al., 2012, Cowling, Powley, and Whitehouse, 2012, Cowling, Ward, and Powley, 2012, Nijssen and Winands, 2011, Walton-Rivers et al., 2017). MCTS builds a partial and asymmetric search tree by performing random simulations. The algorithm consists of four steps (selection, expansion, simulation, and backpropagation) that are repeated in this order until an end condition is met, e.g., a limit of
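As a rough illustration of these four steps, the UCT-style loop below assumes a tree-node interface (children, untried_actions, expand, rollout, visits, value, parent) that is ours, not the paper's; the SP-MCTS variant additionally adds a variance term to the selection formula.

```python
import math
import random

def mcts(root, iterations, c=math.sqrt(2)):
    """Plain UCT loop with the four canonical MCTS steps (illustrative only)."""
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded.
        while not node.untried_actions and node.children:
            node = max(node.children,
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one child for a not-yet-tried action.
        if node.untried_actions:
            node = node.expand(random.choice(node.untried_actions))
        # 3. Simulation: a random playout from the new node yields a reward.
        reward = node.rollout()
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits)
```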

Recursive node elimination and cycle avoidance

We introduced two extensions of MCTS that target problems with many early terminal states and problems containing many cycles.

Recursive Node Elimination. In Sokoban, early deadlock detection can significantly reduce the search space by generating many early terminal states. Recursive Node Elimination is based on the observation that MCTS often repeatedly selects nodes inside the tree that lead to a deadlock; this behavior is caused by the constraints posed by the problem itself. And as
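One plausible reading of these two mechanisms is sketched below, under our own assumptions about the node structure (an eliminated flag, parent, children, untried_actions, and a hashable state); it is not the authors' implementation.

```python
def eliminate_recursively(node):
    """Recursive Node Elimination sketch: a node whose expanded children are
    all eliminated (and which has no untried action left) can never be part
    of a solution, so the pruning propagates towards the root."""
    node.eliminated = True
    parent = node.parent
    while (parent is not None
           and not parent.untried_actions
           and all(child.eliminated for child in parent.children)):
        parent.eliminated = True
        parent = parent.parent

def selectable_children(node, path_states):
    """Cycle avoidance sketch: during selection, skip eliminated children and
    children whose game state already occurs on the current root-to-node path."""
    return [child for child in node.children
            if not child.eliminated and child.state not in path_states]
```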

UCB1-tuned, RAVE, and node recycling

We also implemented three domain-independent enhancements for MCTS: UCB1-Tuned (Auer et al., 2002), Rapid Action Value Estimation (RAVE) (Gelly and Silver, 2007, Gelly and Silver, 2011), and Node Recycling (Powley et al., 2017).

UCB1-Tuned (Auer et al., 2002) is a bandit-based enhancement that tunes the bounds of UCB1 more finely. It uses Eq. (4) as the upper confidence bound for the variance of arm j in a multi-armed bandit problem:

$$V_j(s) = \left(\frac{1}{s}\sum_{\tau=1}^{s} X_{j,\tau}^{2}\right) - \bar{X}_{j,s}^{2} + \sqrt{\frac{2\ln t}{s}} \tag{4}$$

This means that arm j, which has
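A small numerical sketch of this bound follows, with variable names of our own choosing (mean for the empirical mean X̄ of arm j, sq_sum for the sum of squared rewards, n_j for the number of plays s of arm j, and n_total for the total number of plays t).

```python
import math

def ucb1_tuned_index(mean, sq_sum, n_j, n_total):
    """UCB1-Tuned index of one arm (Auer et al., 2002): the exploration
    width uses min(1/4, V_j) in place of UCB1's fixed constant."""
    # Empirical variance bound V_j(s) from Eq. (4).
    v_j = sq_sum / n_j - mean ** 2 + math.sqrt(2.0 * math.log(n_total) / n_j)
    # Arm index: empirical mean plus the tuned exploration term.
    return mean + math.sqrt((math.log(n_total) / n_j) * min(0.25, v_j))
```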

IDA* for Sokoban

Sokoban is a challenging problem and, to reduce the search space, we implemented several optimizations taken from the literature (Junghanns & Eltern, 1999) that are specific to the game (Section 8.1) and to the IDA* algorithm we used for comparison (Section 8.2).

Experimental results

We present the results of a set of experiments we performed to compare the performance of IDA* against SP-MCTS over the first 100 levels of the Microban suite (Skinner, 2021), a collection of Sokoban levels of various difficulty. First, we analyzed the performance of IDA* on the Microban suite. Next, we identified the best MCTS/SP-MCTS configuration for Sokoban, consisting of the reward function, the exploration constant c, the SP-MCTS constant D, the simulation strategy, the role of

Conclusions

We applied the extension of Monte Carlo Tree Search for single-player games (SP-MCTS) (Schadd et al., 2012) to Sokoban and compared its performance to a version of Iterative Deepening A* (IDA*) (Korf, 1985) specifically optimized for this task, following the work of Junghanns and Eltern (1999) and Junghanns and Schaeffer (1998a, 1998b, 2001a). Some of these optimizations concerned the problem itself (such as Push Level Search, Deadlock

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (49)

  • Baier, H., et al. (2018). MCTS-minimax hybrids with state evaluations. Journal of Artificial Intelligence Research.

  • Browne, C., et al. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games.

  • Cazenave, T. Nested Monte-Carlo search.

  • Cazenave, T., et al. Towards deadlock free Sokoban.

  • Cowling, P. I., et al. (2012). Information set Monte Carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games.

  • Cowling, P. I., et al. (2012). Ensemble determinization in Monte Carlo tree search for the imperfect information card game Magic: The Gathering. IEEE Transactions on Computational Intelligence and AI in Games.

  • Culberson, J. C. (1997). Sokoban is PSPACE-complete. Technical report.

  • Culberson, J. C., et al. Searching with pattern databases.

  • Edelkamp, S., et al. (2010). Finding the needle in the haystack with heuristically guided swarm tree search.

  • Feng, D., et al. A novel automated curriculum strategy to solve hard Sokoban planning instances.

  • Gelly, S., et al. Combining online and offline knowledge in UCT.

  • Guez, A., et al. (2018). Learning to search with MCTSnets.

  • Hart, P. E., et al. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics.