1 Introduction

Bin packing problems have been classical and popular optimization problems since the 1970s, extensively studied and widely applied in many areas [2], such as manufacturing, computer memory management [3], cloud resource assignment [4], and logistics transportation. As a family of combinatorial optimization problems [5], bin packing has been derived into numerous variants based on different constraints for specific application scenarios, such as 1D/2D/3D packing, linear packing, and packing with weight or cost constraints. The objective of the single container 3D bin packing problem (3D-BPP) is to find a solution that packs a set of small boxes of possibly different sizes into a large bin, so that the volume utilization of the bin is maximized. A typical 3D bin packing solution is shown in Fig. 1. From a decision making point of view, the 3D-BPP can be modeled as a Markov Decision Process (MDP): at each step, given an observed state \( s_{t} \in {\mathcal{S}} \) in the state space \( {\mathcal{S}} \), an action \( a_{t} \in {\mathcal{A}} \) in the action space \( {\mathcal{A}} \) is taken according to a policy \( \pi :{\mathcal{S}} \to {\mathcal{A}} \), and the goal is to find an optimal policy \( \pi^{*} \) that fulfills the packing task. Since the 3D-BPP is strongly NP-hard, finding an exact solution is intractable in general; consequently, much research has focused on approximation and heuristic algorithms. The first approximation algorithm for the 3D-BPP was proposed in [7] and its performance bound was thoroughly investigated. Building on conventional heuristic algorithms for the 3D-BPP and the Monte-Carlo Tree Search (MCTS) [1, 10], this paper proposes a novel heuristic search algorithm with a dynamically trimmed search space, namely the Quasi-Monte-Carlo Tree Search (QMCTS), to achieve high efficiency and effectiveness.

Fig. 1.
figure 1

A visualized 3D bin packing process at the first step (left), in which one box is packed into the left-bottom-back corner, cutting the original space into three sub-spaces, and a final packing solution (right) generated by our system for the case BR7_0.

The remainder of this paper is organized as follows: the objective of the 3D-BPP is defined and related works are reviewed in Sect. 2. The proposed QMCTS scheme is detailed and discussed in Sect. 3. Its evaluation and performance comparison with state-of-the-art heuristic based approaches are presented in Sect. 4. Conclusions are drawn in Sect. 5, where future work is envisioned as well.

2 Problem Definition and Related Work

2.1 The Objective of 3D Bin Packing Problem

Given a bin \( {\mathbb{B}} \) with its dimensions width (\( W \)), height (\( H \)) and depth (\( D \)), and a set of boxes with their respective dimensions width (\( w_{i} \)), height (\( h_{i} \)) and depth (\( d_{i} \)), the objective of the 3D Bin Packing Problem (3D-BPP) is to find a solution that maximizes the volume utilization when filling these boxes into the given bin \( {\mathbb{B}} \). This objective is represented in the following Eq. (1) with the constraints specified in Eq. (2a) to Eq. (2p). The left-bottom-back corner of the bin is set as the origin (0, 0, 0), and \( \left( {x_{i} ,y_{i} ,z_{i} } \right) \) is the coordinate of \( box_{i} \) in the given bin \( {\mathbb{B}} \). The variables used in the 3D-BPP definition are described in Table 1.

Table 1. The variables used for the 3D-BPP definition

Based on the descriptions of the 3D-BPP and notations in Table 1, given a bin \( {\mathbb{B}} \), the mathematical formulation for obtaining maximum volume utilization \( u\left( {\mathbb{B}} \right) \) on the 3D-BPP is defined as follows.

$$ \mathop {\hbox{max} }\nolimits_{{{\mathbb{I}}_{i} }} u\left( {\mathbb{B}} \right) = \mathop {max}\nolimits_{{{\mathbb{I}}_{i} }} \mathop \sum \nolimits_{i} {\mathbb{I}}_{i} \times \left( {w_{i} \times h_{i} \times d_{i} } \right) $$
(1)
$$ s.t.\left\{ {\begin{array}{*{20}l} { {u({\mathbb{B}})} \le W \times H \times D} \hfill & {({\text{a}})} \hfill \\ {{\mathbb{I}}_{i} \in \left\{ {0,1} \right\}} \hfill & {({\text{b}})} \hfill \\ {s_{ij} ,\,\,u_{ij} ,\,\,b_{ij} \in \left\{ {0,1} \right\}} \hfill & {({\text{c}})} \hfill \\ {\delta_{i1} \,\,\delta_{i2} \,\,\delta_{i3} \,\,\delta_{i4} \,\,\delta_{i5} \,\,\delta_{i6} \in \left\{ {0,1} \right\}} \hfill & {({\text{d}})} \hfill \\ {s_{ij} + u_{ij} + b_{ij} = 1} \hfill & {({\text{e}})} \hfill \\ {\delta_{i1} + \delta_{i2} + \delta_{i3} + \delta_{i4} + \delta_{i5} + \delta_{i6} = 1} \hfill & {({\text{f}})} \hfill \\ {x_{i} - x_{j} + W \times s_{ij} \le W - w_{i}^{*} } \hfill & {({\text{h}})} \hfill \\ {y_{i} - y_{j} + H \times u_{ij} \le H - h_{i}^{*} } \hfill & {({\text{i}})} \hfill \\ {z_{i} - z_{j} + D \times b_{ij} \le D - d_{i}^{*} } \hfill & {({\text{j}})} \hfill \\ {0 \le x_{i} \le W - w_{i}^{*} } \hfill & {({\text{k}})} \hfill \\ {0 \le y_{i} \le H - h_{i}^{*} } \hfill & {({\text{l}})} \hfill \\ {0 \le z_{i} \le D - d_{i}^{*} } \hfill & {({\text{m}})} \hfill \\ {w_{i}^{*} = \delta_{i1} w_{i} + \delta_{i2} w_{i} + \delta_{i3} h_{i} + \delta_{i4} h_{i} + \delta_{i5} d_{i} + \delta_{i6} d_{i} } \hfill & {({\text{n}})} \hfill \\ {h_{i}^{*} = \delta_{i1} h_{i} + \delta_{i2} d_{i} + \delta_{i3} w_{i} + \delta_{i4} d_{i} + \delta_{i5} w_{i} + \delta_{i6} h_{i} } \hfill & {({\text{o}})} \hfill \\ {d_{i}^{*} = \delta_{i1} d_{i} + \delta_{i2} h_{i} + \delta_{i3} d_{i} + \delta_{i4} w_{i} + \delta_{i5} h_{i} + \delta_{i6} w_{i} } \hfill & {({\text{p}})} \hfill \\ \end{array} } \right. $$
(2)

where \( s_{ij} = 1 \) if \( box_{i} \) is to the left of \( box_{j} \), \( u_{ij} = 1 \) if \( box_{i} \) is under \( box_{j} \), \( b_{ij} = 1 \) if \( box_{i} \) is behind \( box_{j} \), \( \delta_{i1} = 1 \) if the orientation of \( box_{i} \) is front-up, \( \delta_{i2} = 1 \) if the orientation of \( box_{i} \) is front-down, \( \delta_{i3} = 1 \) if the orientation of \( box_{i} \) is side-up, \( \delta_{i4} = 1 \) if the orientation of \( box_{i} \) is side-down, \( \delta_{i5} = 1 \) if the orientation of \( box_{i} \) is bottom-up, and \( \delta_{i6} = 1 \) if the orientation of \( box_{i} \) is bottom-down. Constraints 2(n), (o), (p) give the width, height and depth of \( box_{i} \) after it is oriented. Constraints 2(e), (h), (i), (j) ensure that no two packed boxes overlap, constraints 2(k), (l), (m) ensure that no box is placed outside the bin, and constraint 2(a) guarantees that the total volume of all packed boxes does not exceed the volume of the given bin \( {\mathbb{B}} \).
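The orientation constraints 2(n)–(p) simply enumerate the six axis-aligned rotations of a box. The following sketch (an illustration, not part of the paper's implementation; the function names are our own) makes the correspondence explicit and checks the containment constraints 2(k)–(m):

```python
# Sketch of constraints 2(n)-(p): each orientation indicator delta_i1..delta_i6
# selects one of the six permutations of (w, h, d).

def oriented_dims(w, h, d, orientation):
    """Return (w*, h*, d*) of a box under orientation 1..6 per Eq. 2(n)-(p)."""
    table = {
        1: (w, h, d),  # front-up
        2: (w, d, h),  # front-down
        3: (h, w, d),  # side-up
        4: (h, d, w),  # side-down
        5: (d, w, h),  # bottom-up
        6: (d, h, w),  # bottom-down
    }
    return table[orientation]

def fits(W, H, D, w, h, d):
    """Constraints 2(k)-(m): some orientation keeps the box inside the bin."""
    return any(ws <= W and hs <= H and ds <= D
               for ws, hs, ds in (oriented_dims(w, h, d, o) for o in range(1, 7)))
```

Note that the box volume \( w_{i} h_{i} d_{i} \) is invariant under all six orientations, which is why the objective in Eq. (1) does not depend on the \( \delta \) variables.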

2.2 Related Works

Although a real application based on the 3D-BPP can involve many additional constraints, volume utilization is the primary objective in most scenarios. Most research focuses on the Single Container Loading Problem (SCLP), in which there is only one given bin and a set of boxes. The SCLP is strongly NP-hard [18], so an exact solution can only be attained for instances with a small number of boxes, as in the example shown in [17]. Many heuristics, metaheuristics and incomplete tree search based methods have therefore been proposed to generate approximate solutions. These algorithms can be roughly classified into three categories: constructive methods, divide-and-conquer methods and local search methods. The constructive methods [15, 22] yield loading plans by recursively packing boxes into the given bin until no box is left or no space remains to fill. The divide-and-conquer methods [20, 21] divide the space of the given bin into sub-spaces, recursively solve the sub packing problems in those sub-spaces, and finally combine the sub-solutions into a complete solution. The local search methods [6, 22] start with an existing solution and repeatedly apply neighborhood operators to produce new solutions.

The best-performing approaches in recent literature [15, 22] share a similar algorithmic structure built around block building, and such block building based approaches are constructive methods. In contrast to approaches that operate directly on individual boxes, the basic elements of block building based approaches are blocks, where a block is compactly pre-packed with a subset of homogeneous or approximately homogeneous boxes. Each packing step of a block building based approach involves an additional search for candidate blocks that fit the remaining free space in the bin, and this operation is repeated until no block can be found or packed. As block building based approaches are technically superior to their competitors, block building is chosen as the underlying technique of the proposed scheme rather than approaches that deal directly with simple boxes.
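To make the block notion concrete, the following sketch enumerates simple homogeneous blocks (a grid of identical boxes); this is only one common way to generate blocks, and the function name and `max_per_axis` limit are our assumptions, not the exact procedure of [15, 22]:

```python
# Illustrative block generation for block building approaches: a "simple
# block" stacks nx x ny x nz copies of one box type into a cuboid.

from itertools import product

def simple_blocks(box, count, bin_dims, max_per_axis=4):
    """Enumerate feasible homogeneous blocks for one box type.

    box      -- (w, h, d) of the box type
    count    -- number of available boxes of this type
    bin_dims -- (W, H, D) of the bin
    Yields (nx, ny, nz, block_w, block_h, block_d, packed_volume).
    """
    w, h, d = box
    W, H, D = bin_dims
    for nx, ny, nz in product(range(1, max_per_axis + 1), repeat=3):
        if nx * ny * nz > count:
            continue  # not enough boxes of this type
        bw, bh, bd = nx * w, ny * h, nz * d
        if bw <= W and bh <= H and bd <= D:
            yield (nx, ny, nz, bw, bh, bd, nx * ny * nz * w * h * d)
```

The search at each packing step then ranks such candidate blocks (e.g., by volume) against the current free spaces.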

3 Proposed Method

3.1 Monte-Carlo Tree Search for Game Playing

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm for decision making, most notably employed in game playing. A famous example of applying MCTS to game playing is computer Go programs [1]. Since its creation, many improvements and variants have been published, and great success has been achieved in board and card games such as chess, checkers, bridge and poker [27], as well as in real-time video games.

The Workflow of Monte-Carlo Tree Search

Monte Carlo Tree Search (MCTS) was first employed in the game of Go using a best-first search strategy, which aims to find the best move (action) among all potential moves. MCTS evaluates the expected long-term payoff of each candidate move by simulation (stochastic playouts in game playing). Potential moves are sampled unequally via Monte Carlo sampling based on simulation performance, and the most promising move(s) are then analyzed and expanded. The conventional MCTS algorithm consists of the following four steps, as shown in Fig. 2:

  • Selection: each round of MCTS starts from the root-node and selects successive child-nodes down to a leaf-node. The Selection step chooses the child-node with the largest Upper Confidence Bound (UCB) value, balancing exploration and exploitation; the details are discussed in the following section. With this strategy, MCTS expands the tree towards the most promising moves, which is the essence of MCTS.

  • Expansion: unless the leaf-node ends the game with a win/loss for either player, the Expansion step creates one or several child-nodes and then chooses one of them as the working-node.

  • Simulation: complete a playout from the working-node using a random or default policy.

  • Backpropagation: based on the playout result (e.g., the number of plays and wins in game playing), the node statistics are updated on the path from the working-node back up to the root-node.
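The four steps above can be sketched as one generic search round; `select_child`, `expand` and `rollout` stand in for the domain-specific pieces (UCB selection, node creation and the playout policy) and are our placeholder names, not part of a standard API:

```python
# A minimal, generic MCTS round: Selection, Expansion, Simulation,
# Backpropagation, with domain-specific behavior injected as callables.

class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, []
        self.wins, self.plays = 0, 0

def mcts_round(root, select_child, expand, rollout):
    # Selection: descend to a leaf via the child-selection strategy (e.g. UCB).
    node = root
    while node.children:
        node = select_child(node)
    # Expansion: add child-nodes and pick one working-node.
    node = expand(node) or node
    # Simulation: playout from the working-node with a default/random policy.
    won = rollout(node)
    # Backpropagation: update statistics up to the root.
    while node is not None:
        node.plays += 1
        node.wins += won
        node = node.parent
```

Rounds like this are repeated until a time or iteration budget is exhausted, after which the best-valued child of the root is played.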

Fig. 2.
figure 2

The four regular steps of Monte Carlo Tree Search (MCTS) algorithm.

An example for Go game playing is illustrated in Fig. 2, in which each tree node carries its "wins/plays" statistics obtained from previous rounds. In the Selection diagram, the path from the root-node with statistics 11/21, through 7/10 and 5/6, ends at the leaf-node with statistics 3/3, which is chosen as the working-node and expanded. After the Simulation step, all nodes along the Selection path increase their simulation count (the denominator) and, if the playout was won, their win count (the numerator). Rounds of MCTS are repeated until time runs out or some other stopping condition is satisfied. The optimal plan for the game is then the move path through the child-nodes with the best values (highest average win rate).

Exploration and Exploitation via Upper Confidence Bound

The main design consideration in MCTS is how to choose child-nodes. In game playing, MCTS tries to exploit nodes (states) with a high average win rate after moves (actions), while at the same time exploring nodes with few simulations. This balance is the essence of MCTS. The first formula for balancing exploitation and exploration in game playing is the UCT (Upper Confidence Bound (UCB) applied to Trees) strategy, which is the selection strategy of standard MCTS. The UCT strategy also works for move pruning, with performance comparable to classical pruning algorithms such as α-β pruning [28], a powerful technique for pruning suboptimal moves from the search tree.

Given a state (node) \( s \) and the set \( A\left( s \right) \) of all potential actions (moves) in state \( s \), MCTS selects the most promising action \( a^{*} \in A\left( s \right) \) based on the following UCB formula:

$$ a^{*} = \mathop {\text{argmax}}\nolimits_{a \in A\left( s \right)} \left\{ {Q\left( {s,a} \right) + c\sqrt {\frac{\ln N\left( s \right)}{{N\left( {s,a} \right)}} } } \right\} $$
(3)

where \( N\left( s \right) \) is the visit count of node (state) \( s \), \( N\left( {s,a} \right) \) is the number of times action \( a \) has been chosen from node \( s \), and \( Q\left( {s,a} \right) \) is the average score (performance) over all simulations in which move \( a \) was taken from node \( s \). In this formula, the first term corresponds to exploitation, being high for moves with a high average win ratio; the second term corresponds to exploration, being high for moves with few simulations; and the constant parameter \( c \) controls the balance between them. For game playing, the UCB can alternatively be written as:

$$ UCB = \frac{{w_{i} }}{{n_{i} }} + c\sqrt {\frac{{\ln N_{i} }}{{n_{i} }}} $$
(4)
  • \( w_{i} \) stands for the number of wins for the node after the \( i^{th} \) move;

  • \( n_{i} \) stands for the number of simulations for the node after the \( i^{th} \) move;

  • \( N_{i} \) stands for the total number of simulations for the node's parent-node after the \( i^{th} \) move;

  • \( c \) is the balance coefficient representing the trade-off between exploration and exploitation; it equals \( \sqrt 2 \) theoretically and can also be set empirically.
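Equation (4) transcribes directly into code; the only extra convention below (a common one, not stated in the paper) is that unvisited nodes score \( +\infty \) so every move is tried at least once:

```python
import math

def ucb(wins, plays, parent_plays, c=math.sqrt(2)):
    """UCB value of a child-node per Eq. (4): exploitation term w_i/n_i
    plus exploration term c * sqrt(ln N_i / n_i)."""
    if plays == 0:
        return float("inf")  # force at least one visit per move
    return wins / plays + c * math.sqrt(math.log(parent_plays) / plays)
```

With \( c = 0 \) the formula reduces to the plain average win rate, while a larger \( c \) gives more weight to rarely simulated moves.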

3.2 Quasi-Monte-Carlo Tree Search for 3D Bin Packing

The Framework of Quasi-Monte-Carlo Tree Search

Based on the principle of the Monte-Carlo Tree Search (MCTS), a framework for solving the 3D Bin Packing Problem (3D-BPP) is designed, in which conventional heuristics are incorporated into the MCTS based tree search scheme; this MCTS-approximating scheme is therefore named Quasi-Monte-Carlo Tree Search (QMCTS). As illustrated in Fig. 3, the proposed QMCTS algorithm also consists of four steps:

  • Selection: start from the root-node and select successive child-nodes down to a leaf-node. Different from MCTS for game playing, each node here stores an evaluation value of the volume utilization rather than a win rate. During the Selection step, a leaf-node is reached by traversing the tree from the root-node onwards, descending only through child-nodes whose values are no smaller than the Top-K thresholds of their respective layers.

  • Expansion: different from MCTS for game playing, where only one child-node is created for the leaf-node, for the 3D-BPP several child-nodes are created as working-nodes.

  • Simulation: simulating playouts for the working-nodes is domain-specific. For the 3D-BPP, as shown in Fig. 3, an evaluation module performs the playout. First, the working-nodes are expanded by a certain number of layers (e.g., 2) with several child-nodes (e.g., 3) per layer. Then the volume utilization values of all leaf nodes in the last expanded layer are computed via the proposed Generalized Rapid Action Value Estimation (GRAVE) module, which packs one block per step (starting from a chosen leaf node) with the default policy (i.e., packing the max-size block). Finally, the maximum volume utilization value over all leaf nodes is assigned to the simulated working-node.

  • Backpropagation: the volume utilization values of the working-nodes are updated after the Simulation step. In the QMCTS scheme for the 3D-BPP, these values are used to update a Top-K table, which is described in the following section.
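The Simulation step above can be sketched schematically as follows; `candidate_placements` and `greedy_utilization` are assumed domain hooks (our names, not the paper's API) that generate successor packing states and score a state by a greedy rollout, respectively:

```python
# Schematic of the QMCTS Simulation step: expand `layers` levels with up to
# `branch` children each, score the last-layer leaves with a greedy rollout,
# and assign the best score to the working-node.

def simulate(working_state, candidate_placements, greedy_utilization,
             layers=2, branch=3):
    frontier = [working_state]
    for _ in range(layers):
        nxt = []
        for state in frontier:
            nxt.extend(candidate_placements(state)[:branch])
        frontier = nxt or frontier  # stop expanding when nothing fits
    # The maximum utilization over the expanded leaves is the node's value.
    return max(greedy_utilization(s) for s in frontier)
```

The returned value is what Backpropagation writes into the Top-K table for the working-node's layer.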

Fig. 3.
figure 3

The framework of Quasi-Monte-Carlo Tree Search (QMCTS) for 3D Bin Packing.

Exploration and Exploitation via Top-K Table

As discussed in the previous section, conventional MCTS algorithms use the Upper Confidence Bound (UCB) to balance exploitation of the estimated best move against exploration of less visited moves in game playing. For the 3D-BPP, a Top-K table is designed to serve the same function as the UCB. As shown in Fig. 3, the Top-K table has one row per layer (except the top layer, Layer0) of the constructed tree. A tree node is expandable only when its simulated volume utilization value is no less than the Top-K values of its layer (stored in the Top-K table). With this mechanism, compared to conventional tree search based heuristic approaches, QMCTS balances search efficiency and effectiveness by pruning the search space in both breadth and depth.
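One straightforward realization of such a table (an assumption of ours; the paper does not specify the data structure) keeps a small min-heap of the K best values per layer:

```python
# A minimal per-layer Top-K table: a node is expandable only if its value
# is no less than the K-th largest value seen so far in its layer.

import heapq

class TopKTable:
    def __init__(self, k):
        self.k = k
        self.layers = {}  # layer index -> min-heap of the K best values

    def expandable(self, layer, value):
        heap = self.layers.get(layer, [])
        # heap[0] is the K-th largest value recorded for this layer.
        return len(heap) < self.k or value >= heap[0]

    def update(self, layer, value):
        heap = self.layers.setdefault(layer, [])
        if len(heap) < self.k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            heapq.heapreplace(heap, value)
```

Each `expandable`/`update` call is O(log K), so the pruning test adds negligible cost per node.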

Generalized Rapid Action Value Estimation

Rapid Action Value Estimation (RAVE) is a selection strategy proposed to speed up move sampling inside the MCTS tree [23, 24], and can be considered an enhancement of the All Moves As First (AMAF) strategy [25, 26]. RAVE records previously optimal moves as globally optimal moves that can be reused in future states. In the QMCTS scheme, a Generalized Rapid Action Value Estimation (GRAVE) strategy \( Q_{GRAVE} \left( {s,a} \right) \), derived from the RAVE concept in MCTS, is proposed for the 3D-BPP. Using the GRAVE strategy, starting from a chosen leaf node (state) \( s \), an action \( a \) under the default policy (i.e., packing the max-size block) is iterated until no block can be packed or another stopping criterion is satisfied. A packing plan is thereby generated, and its volume utilization is compared with the other rapid estimation values of the current node.
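The GRAVE default policy described above amounts to a greedy rollout; in the sketch below, `feasible_blocks`, `pack` and `utilization` are assumed domain hooks (our names), with a block represented simply by its volume for illustration:

```python
# Sketch of the GRAVE default-policy rollout: from a leaf state, repeatedly
# pack the largest feasible block until none fits, then report utilization.

def grave_rollout(state, feasible_blocks, pack, utilization):
    while True:
        blocks = feasible_blocks(state)
        if not blocks:
            return utilization(state)      # stopping criterion: nothing fits
        state = pack(state, max(blocks))   # default policy: max-size block
```

For example, modeling the state as remaining free volume in a bin of volume 10 with block sizes 4 and 3, the rollout packs 4, 4, then stops (free volume 2), giving a utilization of 0.8.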

Parallelization of Quasi-Monte-Carlo Tree Search

The nature of MCTS, with its repeated and rapid playouts and separable algorithm steps, enables parallel processing. There are generally three ways to parallelize search in an MCTS tree: at the leaves of the search tree, throughout the entire tree, and at its root. The parallelization scheme of QMCTS combines the merits of tree and root parallelization in MCTS. First, from the root-node, multiple independent trees are built by separate threads; but, different from root parallelization in MCTS, all threads can communicate by reading from and writing to the shared Top-K table, as illustrated in Fig. 3. As in tree parallelization in MCTS, multiple threads perform all four phases of the QMCTS scheme (descending through the search tree, adding nodes, conducting playouts and propagating statistics via the Top-K table) at the same time. To prevent data corruption from simultaneous memory access, writes to the shared Top-K table are guarded by mutexes (locks). In this parallel mode, the QMCTS scheme can be sped up, theoretically outperforming depth-first search (DFS) and breadth-first search (BFS) based algorithms.
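The sharing pattern can be sketched as follows: worker threads build independent trees but publish their results into one lock-protected Top-K structure. The tree search itself is elided (`worker` just feeds precomputed values), and all names here are illustrative:

```python
# Sketch of QMCTS parallelization: independent search threads communicate
# only through a shared Top-K table; a mutex guards writes so that
# concurrent updates cannot corrupt it.

import threading

class SharedTopK:
    def __init__(self, k):
        self.k, self.values, self.lock = k, [], threading.Lock()

    def update(self, value):
        with self.lock:  # mutex on writes, as in the QMCTS scheme
            self.values = sorted(self.values + [value], reverse=True)[:self.k]

def worker(table, candidates):
    for v in candidates:  # stand-in for one thread's search rounds
        table.update(v)

table = SharedTopK(k=3)
threads = [threading.Thread(target=worker, args=(table, [i / 10, i / 5]))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because threads only contend briefly on the table update, the playout work itself runs fully in parallel, which is where the speed-up comes from.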

Workflow of Quasi-Monte-Carlo Tree Search Algorithm

The main techniques of the Quasi-Monte-Carlo Tree Search (QMCTS) algorithm have been discussed in the previous sections. The workflow of the QMCTS algorithm is described in Algorithm 1, and the evaluation module for the Simulation step is described in Algorithm 2.

figure a
figure b

4 Experiments

The proposed Quasi-Monte-Carlo Tree Search (QMCTS) algorithm is compared on the practical benchmark datasets BR1–BR7 with a number of state-of-the-art algorithms: the Iterated Construction (IC) [11], the Simulated Annealing (SA) [12], the Variable Neighborhood Search (VNS) [13], the Fit Degree Algorithm (FDA) [14], the Container Loading by Tree Search (CLTRS) [15] and the Multi-Layer Heuristic Search (MLHS) [16]. The experimental results are listed in Table 2 (some statistics are taken from the published papers). As the table shows, the proposed QMCTS method is clearly superior to most of the listed algorithms, achieving an average volume utilization gain of more than 0.1% over even the best competing algorithm.

Table 2. Comparison on BR instances (BR1–BR7)

Table 3 lists the properties of the BR instances (BR1–BR7). The heterogeneity increases from BR1 to BR7 as the number of box types grows, while the number of boxes per type decreases.

Table 3. Properties of BR instances (BR1–BR7)

From Fig. 4, it can be clearly seen that the QMCTS algorithm consistently outperforms the other algorithms on all test datasets. Computation times are difficult to compare across algorithms, as they are written in different languages and some were tested on dated platforms. The QMCTS algorithm is implemented in C++ and tested on an Intel i7 machine (8 cores). It achieves an average of around 60 s per case, which is acceptable for real applications.

Fig. 4.
figure 4

Comparison on BR instances (BR1–BR7) between QMCTS and other methods

5 Conclusion and Future Works

Over the past decades, the classic three-dimensional bin packing problem (3D-BPP) has been tackled by many handcrafted heuristic algorithms, while the classic Monte-Carlo Tree Search (MCTS) algorithm has been widely employed in decision making applications such as computer games. In this paper, a Quasi-Monte-Carlo Tree Search (QMCTS) algorithm is proposed, which integrates conventional heuristics into the MCTS framework. The QMCTS scheme provides an efficient and effective tree search based heuristic technique for solving the 3D-BPP. Experiments show that the QMCTS approach consistently outperforms recent state-of-the-art algorithms in terms of volume utilization. Furthermore, QMCTS can be sped up thanks to its parallel working mode.

In the future, first, intensive evaluation and optimization on datasets with higher heterogeneity, such as BR8–BR15, is planned. Second, a more efficient Generalized Rapid Action Value Estimation (GRAVE) component is needed to further improve the QMCTS scheme. Finally, as techniques for tasks with visual observations in Atari games [8], path planning [9], and Google DeepMind's AlphaGo algorithm [10] have demonstrated the recent achievements of deep reinforcement learning (DRL), whether the 3D-BPP can be solved by combining DRL technology with the QMCTS algorithm is worthy of future research. A recent attempt [19] used DRL to solve the 3D-BPP with a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN), though no performance comparison on the standard benchmark datasets was shown. DRL based artificial intelligence (AI) algorithms remain a possible solution to the classical NP-hard 3D-BPP.