1 Introduction

Nowadays, the automatic identification of objects and individuals, and the collection of data related to them without human intervention, is in high demand in many sectors such as industry, market, medicine, and finance [1, 2]. Several technologies have been designed and implemented to meet these needs. Barcodes, smart cards, optical character recognition (OCR), biometrics, and radio frequency identification (RFID) are some examples, among which RFID is the most dominant and has attracted a great deal of interest over the past decade [3,4,5].

RFID works wirelessly and is capable of identifying target objects by means of a reader device and an electronic tag which is usually attached to the object. Identification is completed when the reader successfully collects the tag ID via radio frequency data transmission. In some cases, it may forward the tag ID to a back-end system for storage and analysis [6]. This process is depicted in Fig. 1. Because of the reliability, flexibility, cost-efficiency, and long service life of RFID-based solutions [7], they are now ubiquitous in many domains such as contactless payment [8,9,10], customer behavior identification [11,12,13], product manufacturing [14,15,16,17,18], healthcare [19,20,21,22,23], access management [24,25,26,27], logistics management [28,29,30,31], and transportation [32,33,34] to name a few.

Fig. 1 The RFID working mechanism

However, RFID networks often suffer from the serious issue of “tag collision”, which degrades their performance [35, 36]. A collision happens when multiple tags within the reader’s interrogation zone transmit to it simultaneously (Fig. 2). Such interference confuses the reader, resulting in incorrect identification of the tags. Tag starvation is a common consequence of this situation, in which the reader fails to identify a tag for a long time because the tag is involved in endless collisions.

Fig. 2 Diagram of tag collision problem in the identification range of a reader

A number of RFID anti-collision algorithms have been proposed to tackle the tag collision problem. These algorithms are designed to achieve an efficient tag reading rate. The time division multiple access (TDMA)-based anti-collision protocols are more frequently used because of their robustness and easy implementation [37]. TDMA-based algorithms are mainly divided into two categories: tree-based (deterministic) methods [38,39,40,41,42] and ALOHA-based (probabilistic) methods [43,44,45,46]. The deterministic algorithms such as collision tree (CT) [47,48,49], query tree (QT) [50,51,52,53,54], binary search (BS) [37, 55,56,57,58] and bit-locking back-off (BLBO) [59] ensure the identification of all tags in range, but often fail to achieve a low communication overhead. On the other hand, ALOHA-based algorithms as in [60, 61] are simple and fast, but suffer from the tag starvation problem.

Despite the versatility of the tree-based algorithms, all of them use conventional search methods to reduce the required computational power. Nevertheless, with the recent introduction of low-cost, high-power embedded computing systems (e.g. ESP8266, Raspberry Pi), it is now possible to consider the heuristic search algorithms, such as the best-first search, the A* algorithm, simulated-annealing, genetic algorithms [62], and the Monte–Carlo Tree Search (MCTS) [63], in order to design more efficient RFID anti-collision systems.

In this paper, we present the incorporation of a novel modified version of the MCTS into the QT algorithm. The new algorithm is compared to the previously reported methods, and its effectiveness for RFID collision mitigation is investigated. The paper is organized as follows: Sect. 2 describes the MCTS and QT as heuristic and non-heuristic conventional search algorithms, respectively, as they are the building blocks of the new anti-collision algorithm. Section 3 introduces the idea of combining the MCTS and QT to benefit from the best of both worlds. The novel anti-collision protocol, as well as a simple trick to reduce its computation costs, is then introduced and described in detail. Section 4 presents the time and space complexity analyses of the proposed method, as well as the results of simulations, which are compared to some of the current tree-based algorithms. Finally, Sect. 5 concludes the paper.

2 Conventional search algorithms

The core of a tree-based anti-collision algorithm is the search procedure that is used to explore the search tree, i.e., the data structure formed by the tags’ IDs. Search algorithms can be categorized into two types: heuristic and non-heuristic. A heuristic search strategy aims to optimize a problem by iteratively improving the solution based on a heuristic function (cost measure) [64]. In contrast to non-heuristic search algorithms, which follow deterministic routines towards an optimal solution, heuristic algorithms are not always guaranteed to find an optimal solution, hence terminating at a suboptimal point. However, the solution is found faster and with less memory requirements. In this section, the MCTS and the QT are briefly introduced, as a random-heuristic search algorithm, and a deterministic and non-heuristic one, respectively. These algorithms construct the very first building blocks of the new anti-collision scheme which will be presented in Sect. 3.

2.1 Monte–Carlo tree search

MCTS [65] is one of the most popular planning algorithms and has been used as a reliable decision-making method in many perfect information games (e.g., Google DeepMind’s AlphaGo [66]). In addition to being extraordinarily successful in the field of game playing, MCTS can be a suitable solution for other types of problems, specifically, the ones that can be characterized from the perspective of “states” and “actions”. The primary purpose of MCTS is to discover the most promising actions in order to reach the most promising states based on the results of random simulations, i.e., playouts. A playout is a tree traversal in which the moves are selected randomly (or pseudo-randomly), starting from a root node and ending at a terminal state. The final results of the playout are used to update the statistics of the game tree nodes along the simulation path. These statistics are then used to take subsequent actions such that they would be most likely to result in a win or a desirable terminal state.

The number of iterations in the MCTS depends on the complexity of the task to be completed as well as the computational capacity, e.g., number of processor cores. There are four phases in each iteration that fulfill the search procedure [67]:

  • Selection phase Start from the root node S0, recursively advance to the selected child nodes one by one until a leaf node (i.e., a node with no children) is reached.

  • Expansion phase If the leaf node does not satisfy the terminal condition of the game, add a new child node into the tree for each available action from the game position, and randomly select one of them, and label it as C.

  • Simulation phase Initiate a playout from C, and walk down the tree until a terminal state is reached, and the game result is concluded.

  • Backpropagation phase Update the statistics associated with all of the nodes visited during the iteration along the path from S0 to C; the nodes passed only during the playout (i.e., below C) are not updated.

Two statistics must be stored for each node: (a) the number of times that it has been visited, and (b) the total playout reward, i.e., the number of times that simulations passing through that node have resulted in a win. These four phases are run repeatedly for a desired period of time; once the MCTS procedure is finished, the best path to move through is the one whose nodes have been visited the most.
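The four phases and the two per-node statistics can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the toy game (choosing `DEPTH` bits one at a time), the hypothetical `TARGET` sequence, and the reward, which is the fraction of matching bits rather than a binary win. The selection rule uses a standard upper-confidence score with the parent's visit count.

```python
import math
import random

DEPTH = 4                  # hypothetical toy game: pick 4 bits, one per move
TARGET = (1, 0, 1, 1)      # hypothetical "best" sequence used to score playouts


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of bits chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0           # statistic (a): number of visits
        self.reward = 0.0         # statistic (b): accumulated playout reward


def ucb(child, parent_visits, c=math.sqrt(2)):
    # Assumption: unvisited children are always tried first.
    if child.visits == 0:
        return math.inf
    return child.reward / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations, rng):
    root = Node(())
    for _ in range(iterations):
        # Selection: descend until a node without children is reached.
        node = root
        while node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda ch: ucb(ch, parent_visits))
        # Expansion: add one child per action and pick one of them at random (C).
        if len(node.state) < DEPTH:
            node.children = [Node(node.state + (bit,), node) for bit in (0, 1)]
            node = rng.choice(node.children)
        # Simulation: random playout from C down to a terminal state.
        state = node.state
        while len(state) < DEPTH:
            state = state + (rng.choice((0, 1)),)
        result = sum(a == b for a, b in zip(state, TARGET)) / DEPTH
        # Backpropagation: update every tree node on the path from C to the root.
        while node is not None:
            node.visits += 1
            node.reward += result
            node = node.parent
    return root


def best_path(root):
    # Follow the most visited child at every level.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.state
```

Calling `best_path(mcts(300, random.Random(1)))` returns the most visited sequence of moves found within the iteration budget; the search can be stopped after any number of iterations and still yield a path.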

The core part of the MCTS is the action selection strategy (tree policy) for the selection phase. It is commonly a function of the node statistics, and the child node that maximizes its value is selected to traverse through in the selection phase. In essence, we are willing to select not only the moves that have a higher chance of winning but also the ones which are not explored enough during the iterations of the algorithm. Therefore, we are facing the so-called exploration–exploitation dilemma, that is the problem of balancing exploration and exploitation between multiple options, each with an unknown payout, and the goal is to determine the best moves to achieve the highest expected gain. This is the main challenge of multi-armed bandit (MAB) problem which is a classic reinforcement learning (RL) problem.

One of the solutions to the MAB problem is the Upper Confidence Bound 1 (UCB1) algorithm, which is called UCT (Upper Confidence bounds applied to Trees) when applied to the MCTS [68]. The UCT is a value-based RL algorithm and its action selection function is given by

$$UCT\left( i \right) = \frac{{v_{i} }}{{n_{i} }} + c\sqrt {\frac{\ln N}{{n_{i} }}}$$
(1)

where \(v_{i}\) is the total simulation reward of the child node i, \(n_{i}\) is the total number of visits to the child node i, and \(N = \sum\nolimits_{j} {n_{j} }\) is the total number of simulations. \(c\) is a constant value which adjusts the balance between exploitation and exploration, and is analytically proven to be equal to \(\sqrt 2\) [68]; however, it can be chosen according to the nature of the problem.
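As an illustration of Eq. (1), the following sketch scores a set of sibling nodes and selects the one with the highest UCT value. The function names and the convention that an unvisited child receives an infinite score are assumptions of this sketch, not part of the original formulation.

```python
import math


def uct(v_i, n_i, n_total, c=math.sqrt(2)):
    """UCT score of Eq. (1) for one child node."""
    if n_i == 0:
        return math.inf  # assumption: unvisited children are explored first
    return v_i / n_i + c * math.sqrt(math.log(n_total) / n_i)


def select_child(stats, c=math.sqrt(2)):
    """Index of the child maximizing UCT; stats is a list of (v_i, n_i) pairs."""
    n_total = sum(n for _, n in stats)  # N = sum of n_j over the siblings
    return max(range(len(stats)), key=lambda i: uct(stats[i][0], stats[i][1], n_total, c))
```

For the hypothetical statistics `[(3, 5), (1, 4), (0, 1)]`, setting `c = 0` (pure exploitation) selects the first child, which has the best average reward, whereas the default `c` selects the rarely visited third child, showing how the second term of Eq. (1) drives exploration.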

The MCTS can be easily implemented and it operates effectively with no prior knowledge. The search tree in MCTS grows asymmetrically as the explorations become concentrated on the more promising subtrees. This is due to the RL-based nature of the UCT, which leads MCTS to achieve better results than the classical algorithms for games with high branching factors. Furthermore, it is possible to interrupt MCTS at any time to yield the most promising move already found. A disadvantage of the MCTS is its high memory requirements due to the rapid growth of the search tree after the first few iterations.

2.2 Query tree

The QT algorithm [50] follows a request-response routine between the reader and the tags, which can be represented in the form of a full \(m\)-ary tree. In each interrogation, the reader transmits a query including a binary string which is called a prefix. After receiving the reader’s query command, every tag within the effective identification range of the reader compares the prefix with its ID to check whether it matches the front part (left side) of the ID. In case of a match, the tag transmits the remainder of its ID back to the reader; otherwise, it waits for the next query.

There are three cases which may occur after reader’s query: (1) only one tag responds to the reader’s query; thus, the query leads to the successful identification of that readable tag. (2) there is no match for the transmitted prefix and no tag responds to the reader, which is an idle state, and (3) there may be two or more responses from the tags to the reader which is a collision. If a collision occurs, the reader appends an \(\ell\)-bit binary string to the current prefix, and queries for the new prefix. This will introduce \(m = 2^{\ell }\) new child nodes into the search tree, and the reader will search for them respectively. \(m\) is the branching factor of the tree, and such a tree structure is called binary tree for \(\ell = 1\), quadtree for \(\ell = 2\), octree for \(\ell = 3\), and \(m\)-ary tree in general. The reader continues this procedure until one tag is successfully identified. It then sends a command to put the identified tag into the halt state. Afterward, the reader starts a new set of queries with other prefixes and looks for other tags, and it stops when all of the tags are identified.

Take, for example, a network with four tags whose unique IDs are 1001, 0010, 1011, and 1000. If we use the binary tree, we will encounter three collision nodes, one idle node, and four readable nodes. The QT for this network is shown in Fig. 3.

Fig. 3 An example of binary QT for a network with four tags: 1001, 0010, 1011, and 1000
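The four-tag example can be reproduced with a short simulation of the QT routine. This is a sketch under stated assumptions: prefixes are processed in breadth-first order from a queue, the root (empty prefix) is not counted as a query, the ID length is a multiple of \(\ell\), and the function name `query_tree` is chosen here for illustration.

```python
from collections import deque


def query_tree(tag_ids, ell=1):
    """Simulate QT; return the identified tags and the count of each slot type."""
    id_len = len(tag_ids[0])
    assert id_len % ell == 0, "sketch assumes the ID length is a multiple of ell"
    suffixes = [format(j, f"0{ell}b") for j in range(2 ** ell)]
    queue = deque(suffixes)       # first queries: every ell-bit prefix
    counts = {"collision": 0, "idle": 0, "readable": 0}
    identified = []               # identified tags are halted and stay silent
    while queue:
        prefix = queue.popleft()
        responders = [t for t in tag_ids
                      if t.startswith(prefix) and t not in identified]
        if not responders:
            counts["idle"] += 1
        elif len(responders) == 1:
            counts["readable"] += 1
            identified.append(responders[0])
        else:
            counts["collision"] += 1
            # Append every ell-bit string to the colliding prefix.
            queue.extend(prefix + s for s in suffixes)
    return identified, counts
```

For the tags 1001, 0010, 1011, and 1000 with `ell=1`, the simulation visits the prefixes 0, 1, 10, 11, 100, 101, 1000, and 1001, giving three collision slots, one idle slot, and four readable slots, in agreement with the text.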

QT algorithm is simple, easy-to-implement, and imposes minimal computational and memory requirements on tags. In fact, each tag is memoryless (i.e., only its ID needs to be stored), just comparing its ID with the reader’s query [50]. Thus, the hardware cost is low. However, the performance of the QT algorithm is greatly affected by the distribution of the tags’ IDs due to the mechanical update routine of the query prefix. As a consequence, QT produces a large number of idle timeslots.

3 Monte–Carlo query tree search

Although heuristic search algorithms are capable of handling complex problems, they have not been used for anti-collision schemes so far. This is probably due to their sub-optimality and the tendency of the RFID community to employ the classical search algorithms. For the case of the RFID anti-collision schemes, even a non-heuristic algorithm should be deterministic in order to guarantee the successful identification of all tags. However, the high computational overhead of non-heuristic algorithms might become an issue. One can consider a combination of heuristic and non-heuristic search strategies as a possible solution to multiple tag identification in RFID systems, to benefit from the acceptable time complexity of the former and the optimality of the latter.

The main idea behind the proposed method is to fit QT to the search tree of the tag identification problem, and use a modified version of MCTS to explore it. In this way, we can have the memoryless nature of the QT as well as its optimality, and improve the tree exploration by leveraging the RL-based nature of the MCTS. We propose a new variant of MCTS, aiming to emphasize the actions that lead to finding uninvestigated or collision nodes (which will be called valuable nodes from now on), and to penalize those that approach idle nodes. This strategy has two advantageous outcomes: (1) by shrewdly passing through valuable nodes, we aim to reach readable nodes in the deeper layers of the tree, and (2) at the same time, we eliminate the idle timeslots of the reader’s interrogation by avoiding the idle nodes.

In this work, we view the tag identification as a single-player deterministic game in order to learn the best actions by which we can win the game, i.e., successfully identify all of the available tags as quickly as possible. We represent the entire process in the form of a tree data structure in which each node corresponds to the action of appending an \(\ell\)-bit code from the set \(\left\{ {0 \cdots 00,0 \cdots 01, \cdots ,1 \cdots 11} \right\}\) to the current prefix in QT. By employing MCTS with a new upper confidence bound strategy, a reward is given to the appending actions that lead to valuable nodes, since they are the ones which should be explored further. Tag identification is performed by exploring the search tree based on the modified MCTS algorithm, and by conducting the QT routine when passing each node. Subsequent decisions are made based on the performance of the previous actions. Eventually, all the tags will be identified after a number of iterations of this Monte–Carlo Query Tree Search (MCQTS) algorithm.

While using MCTS necessitates postponing the actual tag reading until the end, the proposed method (MCQTS) manages to do this during the Monte–Carlo iterations. Therefore, the simulation (playout) phase plays a significant role in the tag identification, and improving its efficiency is of paramount importance. We propose the following modifications to the MCTS for the purpose of devising an effective search:

  1.

    In order for the game actions to quickly converge to the optimal ones, the statistics of the nodes visited during the simulation are also recorded. To this end, two arrays are set to store \(n_{i}\) and \(v_{i}\) values, each with a length of

    $$L = \sum\limits_{i = 0}^{D} {m^{i} } = \sum\limits_{i = 0}^{D} {2^{\ell i} }$$
    (2)

    where \(m\) and \(\ell\) are the same as in QT, and \(D\) is the maximum depth of the tree which is given by:

    $$D = \left\lceil {\frac{K}{\ell }} \right\rceil$$
    (3)

    where \(K\) is the length of the tag ID and \(\left\lceil . \right\rceil\) is the ceiling function.

  2.

    A “reward and punishment” scheme is adopted to keep the score throughout the game. The paradigm for such a scheme is demonstrated in Fig. 4. The tag identification is best done when the idle nodes are avoided the most, and the valuable nodes are explored quicker. As a result, an action is rewarded if it leads to discovering valuable nodes, and punished otherwise. The reward is given in the form of a unit increment in the reward value (\(v_{i}\)) of some associated nodes.

  3.

    The terminal state of the game is determined to be either a readable node or an idle node which is treated with punishment once it is reached. If the path from S0 to the parent node of an encountered terminal node contains at least one collision node, every node along the path is rewarded. This is because we intend to further investigate such a path as it has a high chance of having undiscovered readable nodes. In addition, we set an array \(\omega\) (with the length \(L\)) to store the status of nodes. The ith element of \(\omega\) (\(\omega_{i}\)), takes the values 0, 1, 2, and 3 for uninvestigated, collision, idle, and identified nodes, respectively. Each \(\omega_{i}\) is initially set to zero before the reader’s interrogation, and takes a status value once the reader’s query for the prefix of the ith node is finished.

  4.

    The indices of the nodes are assumed to start from zero (corresponding to the initial state S0), and incremented by one for successive nodes, which are ordered ascendingly by their associated prefix in each layer. Hence, the indices of the children of the ith node in the tree are: \(\left\{ {2^{\ell } i + j} \right\}_{j = 1}^{{2^{\ell } }}\). We define a zero–one status function as

    $$\phi \left( i \right) = \left\{ {\begin{array}{*{20}c} 1 & {\omega_{i} = 0,1} \\ 0 & {otherwise} \\ \end{array} } \right.$$
    (4)

    where \(\phi \left( i \right)\) is equal to one for all valuable nodes, and zero otherwise. Let \(\alpha \left( {j|i} \right)\) be the action of selecting a child node j, given that the algorithm is at the ith node. Thus, the probability of each random action in the playout is given by

    $$\Pr \left( {\alpha \left( {j|i} \right)} \right) = \frac{1}{{\sum\limits_{{k = 2^{\ell } i + 1}}^{{2^{\ell } (i + 1)}} {\phi (k)} }}$$
    (5)

    which is a uniform distribution over the valuable children of the ith node. By using the status function, we penalize the ith node once for each action \(\alpha \left( {j|i} \right)\) if all of the children of the node are either already identified or idle. For this, the reward value of the ith node is decremented by one.

  5.

    A constraint is put on the choice of j in \(\alpha \left( {j|i} \right)\) so that the playouts are selected randomly from the set of valuable child nodes; the nodes that have been identified before, or are idle, are not considered as possible paths to move through. We make a minor modification to the original UCT formula (Eq. 1) and propose a new function, given by

    $$UCT_{MCQTS} (i) = \phi (i)\left( {\frac{{v_{i} }}{{n_{i} }} + 2\sqrt {\frac{{\ln n_{p(i)} }}{{n_{i} }}} } \right)$$
    (6)

    where \(c = 2\) is chosen empirically, and \(p(i) = \left\lfloor {\frac{i - 1}{m}} \right\rfloor\) gives the index of the parent of the ith node. Here, we have used the number of visits to the parent node of the competing child nodes (\(n_{p(i)}\)) instead of using the total number of playouts (\(N\)) to take a more local decision. Furthermore, we have multiplied the UCT formula by the status function to ensure that the idle or identified nodes are not involved in the action selection. Such a tree policy ensures that the need for exploring the useless parts of the search tree during the simulations, is eliminated.

    Fig. 4 Reward and punishment scheme
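The array layout and the modified tree policy described in the list above can be sketched as follows. The function names, the use of plain Python lists for \(\omega\), \(v\), and \(n\), the infinite score for an unvisited valuable node, and the guard against a zero parent count are assumptions of this sketch; the formulas themselves follow Eqs. (2) to (6).

```python
import math
import random

UNINVESTIGATED, COLLISION, IDLE, IDENTIFIED = 0, 1, 2, 3   # values of omega_i


def tree_depth(K, ell):
    return -(-K // ell)                       # Eq. (3): ceil(K / ell)


def array_length(K, ell):
    m = 2 ** ell
    return sum(m ** i for i in range(tree_depth(K, ell) + 1))   # Eq. (2)


def children(i, ell):
    m = 2 ** ell
    return list(range(m * i + 1, m * i + m + 1))   # indices m*i + j, j = 1..m


def parent(i, ell):
    return (i - 1) // (2 ** ell)              # p(i) = floor((i - 1) / m)


def phi(omega, i):
    return 1 if omega[i] in (UNINVESTIGATED, COLLISION) else 0   # Eq. (4)


def playout_choice(omega, i, ell, rng):
    """Eq. (5): pick uniformly among the valuable children of node i.

    Returns None when no child is valuable; the caller would then
    decrement v[i] by one, as described in the text.
    """
    valuable = [k for k in children(i, ell) if phi(omega, k)]
    return rng.choice(valuable) if valuable else None


def uct_mcqts(omega, v, n, i, ell):
    """Eq. (6): UCT masked by the status function, with c = 2."""
    if not phi(omega, i):
        return 0.0
    if n[i] == 0:
        return math.inf                       # assumption: try unvisited nodes first
    n_parent = max(n[parent(i, ell)], 1)      # assumption: avoid log(0)
    return v[i] / n[i] + 2 * math.sqrt(math.log(n_parent) / n[i])
```

With \(K = 4\) and \(\ell = 1\), for instance, `array_length(4, 1)` gives 31 entries per array, and the children of node 1 are nodes 3 and 4, both of which map back to node 1 through `parent`.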

Having applied the aforementioned modifications, we exploit the algorithm to recognize the tags in the network. The complete flowchart of the MCQTS is depicted in Fig. 5. The \(UCT_{MCQTS}\) values must be computed for some nodes in each iteration, which might be computationally demanding and slow down the identification process. A simple yet effective way to reduce the computation costs is to use look-up tables (LUTs) to store the results of some frequent computations. According to Eq. (6), the first term in the parentheses is the reward value of a node divided by its number of visits. The most frequent pairs of \(\left( {v_{i} ,n_{i} } \right)\) occurring in the iterations usually correspond to the nodes that are located at the upper layers of the tree, such as (0, 0), (0, 1), (1, 2), …, and (1, \(2^{\ell }\)). Nevertheless, these pairs can be chosen according to the specified network. Thus, we can construct a small LUT to store the calculated values of \(v_{i}/n_{i}\) for each pair, and use it in \(UCT_{MCQTS}\) computations. This simple trick has a considerable effect on the energy efficiency of the algorithm.
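The LUT idea can be sketched as a small table keyed by \(\left( {v_{i} ,n_{i} } \right)\) pairs. The function names, the dictionary representation, and the convention that a pair with \(n_{i} = 0\) maps to zero are assumptions of this sketch; on an embedded reader the table would more plausibly be a small static array.

```python
def build_ratio_lut(pairs):
    """Precompute v/n for the given (v, n) pairs; n == 0 maps to 0.0 (assumption)."""
    return {(v, n): (v / n if n else 0.0) for v, n in pairs}


def ratio(lut, v, n):
    """Return v/n from the table, falling back to a direct division on a miss."""
    try:
        return lut[(v, n)]
    except KeyError:
        return v / n if n else 0.0


# Frequent pairs named in the text for ell = 2: (0, 0), (0, 1), (1, 2), ..., (1, 2**ell)
ELL = 2
LUT = build_ratio_lut([(0, 0), (0, 1)] + [(1, k) for k in range(2, 2 ** ELL + 1)])
```

A lookup such as `ratio(LUT, 1, 4)` is served from the table, while a pair that was not precomputed, e.g. `ratio(LUT, 3, 6)`, is computed directly.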

Fig. 5 Flowchart of the MCQTS algorithm

4 Performance analysis

4.1 Time complexity

Time complexity of an anti-collision algorithm is defined as the number of required query cycles in a complete round of the recognition of a set of tags. It is commonly expressed as a function of the number of tags to be identified, as well as in the form of the big O notation. Here, we first derive the time complexity of the modified MCTS and use it in tandem with that of the QT to derive the time complexity of the proposed MCQTS.

Deriving the time complexity of the original MCTS is straightforward. MCTS executes four phases for, say \(N_{I}\), iterations. Therefore, we can express the overall time complexity as:

$$O\left( {N_{I} (t_{s} + t_{e} + t_{p} + t_{b} )} \right)$$
(7)

where \(t_{s}\), \(t_{e}\), \(t_{p}\), and \(t_{b}\) are the contributions of the running times of selection, expansion, simulation, and backpropagation phases, respectively. Since \(m\) nodes in every layer are involved in the selection phase, this phase runs in \(O(mD)\); hence, \(t_{s} = mD\). Expansion phase takes constant time (\(O(1)\)) as it just adds a number of child nodes to the tree; hence \(t_{e} = 1\). Simulation and backpropagation take time proportional to the depth of the tree, i.e. \(O(D)\); hence \(t_{p} = t_{b} = D\). The worst-case corresponds to simulation from S0 or the backpropagation from a leaf node in the last layer. Substituting the results in Eq. (7), we obtain the time complexity of the MCTS:

$$O\left( {N_{I} (mD + 1 + 2D)} \right) = O\left( {N_{I} mD} \right)$$
(8)

By adopting the tree policy of Eq. (6), only valuable nodes are involved in the action selection. Thus, the selection phase of the modified MCTS runs in \(O(\widehat{m}D)\) with \(\widehat{m} \le m\). A reasonable choice for \(\widehat{m}\) is the expected number of valuable nodes in the tree (\(E\left[ {\phi (i)} \right]\)); however, we instead use its upper bound (\(m\)) as we are interested in the worst-case scenario. If we assume that not every node is assigned to a tag, it is possible to have some nodes in the last layers with more than one idle child. Such nodes may not be discovered in a single iteration. Thus, it is reasonable to say

$$\begin{array}{*{20}c} {N_{I} = \frac{{N_{t} }}{\gamma },} & {0 < \gamma \le 1} \\ \end{array}$$
(9)

where \(\gamma\) is an efficiency coefficient, which is a measure of how well the algorithm operates, i.e., how quickly it identifies all of the tags. A larger \(\gamma\) corresponds to a smaller number of iterations, lower computation cost, and higher speed. In practice, we typically do not expect \(\gamma\) to be smaller than 0.5; however, this must be verified through evaluations. One can easily deduce that the efficiency coefficient is strongly dependent on the way the tree is traversed during the tag identification. In fact, the purpose of the rules we established for tree traversal in Sect. 3 was to achieve values of \(\gamma\) as close as possible to 1. Substituting Eq. (9) in Eq. (8) and using the upper bound of \(\widehat{m}\), we obtain:

$$O\left( {\frac{{N_{t} }}{\gamma }mD} \right) = O\left( {N_{t} mD} \right)$$
(10)

Since \(D = \left\lceil {\frac{K}{\ell }} \right\rceil \le \frac{K}{\ell } + 1\), we can rewrite Eq. (10) as follows:

$$O\left( {N_{t} m\left( {\frac{K}{\ell } + 1} \right)} \right) = O\left( {\frac{mK}{\ell }N_{t} } \right)$$
(11)

Equation (11) presents the worst-case time complexity of the modified MCTS.

On the other hand, according to [50], the average time complexity of the QT algorithm is given by:

$$T_{QT} (N_{t} ) = 2.887N_{t} - 1$$
(12)

while its worst-case time complexity is as follows:

$$O\left( {N_{t} \left( {K + 2 - \log N_{t} } \right)} \right)$$
(13)

Although the worst-case complexity of Eq. (13) looks complicated, it is proved in [50] that the QT has a high probability of having linear running time (\(O\left( {N_{t} } \right)\)). In fact, the authors showed that the probability that QT takes at least \(cN_{t}\) steps to identify \(N_{t} \ge 2\) tags is at most \(\exp \left( { - 0.4N_{t} (c/2 - 2)} \right)\), which decreases exponentially as \(N_{t}\) increases. Thus, we assume \(O\left( {N_{t} } \right)\) for the running time of the QT, and combine it with the time complexity of the modified MCTS to obtain the overall time complexity of the MCQTS in the worst-case scenario:
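To give a feel for how quickly this tail bound shrinks, the following sketch evaluates it for a few hypothetical values of \(N_{t}\) and \(c\). The function name is chosen here for illustration; note that the expression is only informative (below one) when \(c > 4\).

```python
import math


def qt_tail_bound(n_tags, c):
    """Upper bound on P(QT needs at least c * n_tags steps), as quoted from [50]."""
    return math.exp(-0.4 * n_tags * (c / 2 - 2))


# Hypothetical values: the bound for c = 6 with 10 and with 100 tags.
bound_10 = qt_tail_bound(10, 6)     # exp(-4)
bound_100 = qt_tail_bound(100, 6)   # exp(-40)
```

With \(c = 6\), the bound is \(e^{-4}\) for 10 tags and \(e^{-40}\) for 100 tags, which illustrates why a linear running time is assumed for the QT in what follows.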

$$T\left( {N_{t} } \right) = O\left( {\max \left( {\frac{mK}{\ell }N_{t} ,N_{t} } \right)} \right) = O\left( {\frac{mK}{\ell }N_{t} } \right) = O\left( {N_{t} } \right)$$
(14)

where \(mK/\ell\) is regarded as a constant for a fixed network architecture and is therefore omitted. Hence, MCQTS takes linear time. However, the complexity of MCQTS is greatly influenced by the value of \(K\) due to the nature of QT, and care should be taken with the choice of the ID length when using MCQTS. Large values of \(K\) (e.g., 96 or 198 for EPC Global tags) and small values of \(\ell\) result in an additional computational overhead by increasing the depth of the tree and changing the time complexity (see Eqs. 10, 11). Note that the time complexity is not the only issue in this situation, since a large \(K\) (or small \(\ell\)) also results in higher space complexity due to the dramatic increase of \(L\), which will be discussed in Sect. 4.2.

Table 1 compares the time complexity of MCQTS to five other tree-based algorithms: BS [35], QT [50], CT [47, 48], BLBO [59], and the proposed method in [61] (which we shall call CQT, since it is based on compressed query tree), according to [69]. As can be seen from Table 1, CT, BLBO and CQT take linear time and their complexities are only affected by the number of tags, regardless of the ID length. The time complexity of the BS does not depend on the ID length either; however, the existence of \(N_{t} !\) inside the big O notation makes it less efficient than the others. We argue that the MCQTS belongs to the class of linear time anti-collision algorithms if the ID length is not very long, and inherits this feature from its ancestor (QT), which has linear running time with high probability. Nevertheless, this does not make it quite equivalent to CT, BLBO, and CQT for which the time complexities are given by:

$$T_{CT} \left( {N_{t} } \right) = T_{BLBO} \left( {N_{t} } \right) = T_{CQT} \left( {N_{t} } \right) = 2N_{t} - 1$$
(15)

showing linear time dependency, regardless of the ID length. The time complexities of other tree-based algorithms such as multi-bit identification collision tree (MICT) [49], and improved quad-tree (IQT) [42] lie somewhere in between the mentioned ones in Table 1. MICT is a CT-based scheme and its time complexity is given by:

$$T_{MICT} (N_{t} ) = \frac{{\sum\nolimits_{k = 2}^{{2^{\ell } }} {\frac{{kN_{t} - 1}}{k - 1}} }}{{2^{\ell } - 1}}$$
(16)

where \(\ell\) is the length of the query prefix. It has been proved in [49], that for \(\ell \ge 2\), the time complexity of the MICT is lower than that of CT. IQT, on the other hand, is a modified form of quadtree search and its time complexity is given by:

$$T_{IQT} (N_{t} ) = \sum\limits_{D = 0}^{\infty } {4^{D} \left[ {1 - (1 - 4^{ - D} )^{{N_{t} }} - N_{t} 4^{ - D} (1 - 4^{ - D} )^{{N_{t} - 1}} } \right]} + N_{t}$$
(17)

IQT is superior to the original quadtree search and BS, as it can eliminate the idle timeslots of the tag identification.
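The closed-form expressions of Eqs. (12), (15), and (16) can be compared numerically with a few lines of code. This sketch assumes that the summation index in Eq. (16) is a separate variable \(k\) running from 2 to \(2^{\ell }\), so that MICT averages the terms \((kN_{t} - 1)/(k - 1)\); the function names and the choice of 100 tags are for illustration only.

```python
def t_qt_average(n_tags):
    return 2.887 * n_tags - 1                 # Eq. (12), average case of QT


def t_ct(n_tags):
    return 2 * n_tags - 1                     # Eq. (15), CT / BLBO / CQT


def t_mict(n_tags, ell):
    # Eq. (16), with the sum index read as k = 2, ..., 2**ell (assumption)
    m = 2 ** ell
    return sum((k * n_tags - 1) / (k - 1) for k in range(2, m + 1)) / (m - 1)


N_TAGS = 100                                  # hypothetical network size
cycles = {
    "QT (average)": t_qt_average(N_TAGS),
    "CT": t_ct(N_TAGS),
    "MICT, ell = 2": t_mict(N_TAGS, 2),
}
```

For 100 tags this gives 199 query cycles for CT and 160.5 for MICT with \(\ell = 2\), consistent with the statement that MICT improves on CT for \(\ell \ge 2\); for \(\ell = 1\) the MICT expression reduces to \(2N_{t} - 1\).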

Table 1 Time complexity comparison

4.2 Space complexity

Space (memory) complexity of an anti-collision algorithm is defined as the size of the memory (in bits) required for the reader during the tag identification. Similar to the time complexity, we derive the space complexity of the MCQTS in terms of big O notation to see how much the required storage grows, in the worst case, as \(N_{t}\), \(K\), or \(\ell\) increase. As with the time complexity formula, we derive the space complexity of the MCQTS by firstly deriving the space complexity of the modified MCTS, and combining it with that of the QT.

The proposed variant of MCTS requires three arrays to store the values of \(\left\{ {\omega_{i} } \right\}_{i = 1}^{L}\), \(\left\{ {n_{i} } \right\}_{i = 1}^{L}\), and \(\left\{ {v_{i} } \right\}_{i = 1}^{L}\). \(b_{1}\), \(b_{2}\), and \(b_{3}\) are defined as the required number of bits to store \(\omega_{i}\), \(n_{i}\), and \(v_{i}\) in the memory, respectively. We express the space complexity using the contribution of these arrays:

$$O\left( {(b_{1} + b_{2} + b_{3} )L} \right)$$
(18)

Each \(\omega_{i}\) can take four different values (\(\omega_{i} \in \left\{ {0,1,2,3} \right\}\)), hence it can be represented with a 2-bit code, i.e., \(b_{1} = 2\). Apart from the S0, the following inequality holds for the maximum number of visits to a node:

$$\max \left( {n_{i} } \right) \le \left \lfloor \frac{L}{m} \right \rfloor,\quad i = 2,3, \cdots ,L$$
(19)

where the upper bound corresponds to the maximum number of simulations of a child node. By assuming the upper bound of Eq. (19), and taking \(\left\lfloor L/m \right\rfloor \approx L/m\) for the sake of simplicity, at most \(b_{2} = \left\lceil {\log_{2} (L/m)} \right\rceil\) bits are required to store each \(n_{i}\) value. On the other hand, as the reward and punishment rules of the MCQTS cancel each other out most of the time, the value of \(v_{i}\) is always (much) less than the upper bound of Eq. (19); hence: \(b_{3} < b_{2}\). Thus, \(b_{2} L\) is the most significant term in Eq. (18), and the worst-case space complexity of the modified MCTS becomes:

$$O\left( {b_{2} L} \right) = O\left( {\left\lceil {\log_{2} \left( \frac{L}{m} \right)} \right\rceil L} \right) = O\left( {\left\lceil {\log_{2} L - \ell } \right\rceil L} \right)$$
(20)

Space complexity of the QT in the worst-case scenario is \(O(2^{K} K)\), since the maximum size of each query prefix is \(K\) bits, and the maximum number of queries is \(2^{K}\) [61]. By integrating the two space complexities, we obtain the worst-case reader-side space complexity of the MCQTS as:

$$O\left( {\max \left( {\left\lceil {\log_{2} L - \ell } \right\rceil L,\;2^{K} K} \right)} \right)$$
(21)

Equation (21) reveals that although the MCQTS has fine performance in terms of time complexity, the amount of memory required to run the algorithm is rather high. Increasing the length of the tag ID can further exacerbate the problem. As discussed in Sect. 2.1, the drawback of the MCTS is its high memory demand, and the MCQTS has inherited it to some extent. Overall, it can be said that the MCQTS is more memory-efficient for the tags having smaller ID length.

We analyze the memory requirements of the MCQTS in two different scenarios: \(K = 10,\ell = 1\) as a typical scenario and \(K = 24,\ell = 4\) as a more complicated one. If we neglect \(b_{3}\), the RFID reader will require approximately 4.3 kilobytes for the first scenario and approximately 101.8 megabytes for the second one. We argue that both scenarios can be handled by low-cost, high-power embedded computing systems used as RFID readers. For instance, a low-cost ESP8266 module, e.g., ESP-01, can handle the first scenario, and even obsolete generations of the Raspberry Pi, e.g., Pi 1 Model A and Pi 1 Model B, could handle the second scenario, not to mention the recent releases.
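The arithmetic behind such estimates can be sketched as follows. This is only an illustration under stated assumptions, not the authors' calculation: it assumes the search tree is a complete \(m\)-ary tree of depth \(K/\ell\), so that \(L = (m^{K/\ell + 1} - 1)/(m - 1)\), and it adds the MCTS and QT contributions because both structures must be held by the reader. The function names and the parameter `other_bits_per_node` are hypothetical; the latter is a stand-in for whatever per-node fields Eq. (18) counts besides \(n_{i}\).

```python
import math


def complete_tree_nodes(K, ell):
    """Node count L of a complete m-ary tree of depth K/ell (assumed form of L)."""
    if K % ell != 0:
        raise ValueError("this sketch assumes that ell divides K")
    m = 2 ** ell
    depth = K // ell
    return (m ** (depth + 1) - 1) // (m - 1)


def reader_memory_bits(K, ell, other_bits_per_node=0):
    """Rough reader-side memory in bits: MCTS node statistics plus QT prefixes."""
    m = 2 ** ell
    L = complete_tree_nodes(K, ell)
    b2 = math.ceil(math.log2(L / m))           # bits per n_i, as in Eq. (20)
    mcts_bits = (b2 + other_bits_per_node) * L  # b3 is neglected, as in the text
    qt_bits = (2 ** K) * K                      # worst case of the QT
    return mcts_bits + qt_bits


if __name__ == "__main__":
    for K, ell in [(10, 1), (24, 4)]:
        n_bytes = reader_memory_bits(K, ell, other_bits_per_node=2) / 8
        print(f"K={K}, ell={ell}: {n_bytes:,.1f} bytes")
```

With the hypothetical `other_bits_per_node` set to 2, the totals come out close to the two figures quoted above. This is an observation about the arithmetic, not a statement of what Eq. (18) contains.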

4.3 Simulation

Besides the theoretical analysis, simulations were performed to obtain more realistic performance measures of the proposed method. To this end, we implemented the QT in software, with every node represented by its corresponding \(\ell\)-bit binary string (see Fig. 3). After writing the simulation code of the proposed modified MCTS, we ran it together with the QT code to simulate the MCQTS. No special toolboxes or libraries were used for the simulation, only plain scripts. It is worth noting that the random playouts were implemented using pseudo-random number generators, initialized with simple seeds such as the CPU clock or the current time.
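To make the QT part of such a simulation concrete, the following minimal sketch counts the query cycles of a plain query tree in which each collision extends the prefix by \(\ell\) bits. It is not the authors' simulation code and does not include the MCTS component. The function names are hypothetical, IDs are modeled as binary strings of equal length \(K\) with \(\ell\) dividing \(K\), and a fixed seed is used here only to make the run reproducible.

```python
import random
from collections import deque


def query_tree_cycles(tag_ids, ell=1):
    """Count the queries a plain query tree needs to identify all tags.

    tag_ids: unique binary strings of equal length K, with ell dividing K.
    Returns (number of query cycles, list of identified IDs).
    """
    tag_ids = list(tag_ids)
    K = len(tag_ids[0])
    assert all(len(t) == K for t in tag_ids) and K % ell == 0
    queue = deque([""])          # the reader starts with the empty prefix
    cycles = 0
    identified = []
    while queue:
        prefix = queue.popleft()
        cycles += 1              # one query cycle per broadcast prefix
        responders = [t for t in tag_ids if t.startswith(prefix)]
        if len(responders) == 1:             # readable node
            identified.append(responders[0])
        elif len(responders) > 1:            # collision: extend by ell bits
            for i in range(2 ** ell):
                queue.append(prefix + format(i, f"0{ell}b"))
        # no responder: idle node, nothing to do
    return cycles, identified


def random_tags(n_tags, K, seed):
    """Draw n_tags distinct K-bit IDs with a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return [format(v, f"0{K}b") for v in rng.sample(range(2 ** K), n_tags)]


if __name__ == "__main__":
    tags = random_tags(n_tags=50, K=12, seed=1)
    for ell in (1, 2, 3, 4):
        cycles, found = query_tree_cycles(tags, ell)
        print(f"ell={ell}: {cycles} query cycles, {len(found)} tags identified")
```

Averaging the cycle count over several seeds, as done for the curves reported below, would smooth out the dependence on a particular tag population.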

We have simulated the MCQTS algorithm with different values of \(\ell\) to assess its performance in tag identification. The performance is expressed in terms of the number of query cycles needed to identify up to 500 tags. Figure 6 shows the simulation results of the MCQTS with \(\ell = 1\) (binary-MCQTS), \(\ell = 2\) (quad-MCQTS), \(\ell = 3\) (oct-MCQTS), and \(\ell = 4\) (hex-MCQTS), with the same ID length in every case. The data points in the curves are the average values of 50 simulations. According to Eqs. (2) and (3), if \(K\) is fixed, the depth of the search tree as well as the total number of nodes (\(L\)) decreases as \(\ell\) increases, so the tree shrinks and becomes shallower. In such a case, the MCQTS deals with a simpler search tree and hence the tag identification can be performed faster. This is confirmed by the simulation results depicted in Fig. 6. The hex-MCQTS completes the tag identification in 560 query cycles, while oct-MCQTS, quad-MCQTS, and binary-MCQTS require 684 and 1215 query cycles, respectively. Note that although incrementing the value of \(\ell\) beyond 4 (which increases the branching factor, i.e., \(m = 2^{\ell }\)) leads to simpler search spaces, the selection phase of the Monte–Carlo iterations becomes more challenging because more competing child nodes are generated. Hence, performance improvement is not always guaranteed. In the MAB sense, increasing the branching factor is equivalent to increasing the number of arms, which makes the resource (playout) allocation harder. Moreover, the choice of \(\ell\) also depends on the number and the ID length of the tags expected in the network.

Fig. 6 Comparison of identification delay of the MCQTS with \(\ell = 1\), \(\ell = 2\), \(\ell = 3\), and \(\ell = 4\) based on the query cycles

One way to confirm the result of Eq. (14) is to fit a line to each performance curve resulting from the simulations (number of query cycles vs. number of tags) and to check both the magnitude of the slope and how well the line fits the data points. By applying linear least-squares regression to the data points of each curve in Fig. 6, we obtain four regression lines corresponding to binary-, quad-, oct-, and hex-MCQTS:

$$\begin{aligned} T_{\ell = 1} (N_{t} ) &= 2.17N_{t} + 240.24 \\ T_{\ell = 2} (N_{t} ) &= 1.19N_{t} + 117.18 \\ T_{\ell = 3} (N_{t} ) &= 1.05N_{t} + 64.60 \\ T_{\ell = 4} (N_{t} ) &= 1.03N_{t} + 43.57 \\ \end{aligned}$$
(22)

Equation (22) indicates that the maximum slope of the regression lines is 2.17, corresponding to binary-MCQTS, and the rest of the estimated lines grow more slowly than that. Such mild slopes are quite typical for linear time complexity expressions. Moreover, the average deviation from the true data points is approximately 9%, and the data points corresponding to small numbers of tags contribute more to this error. Hence, the result of Eq. (14) agrees with the empirical experiments. However, the resulting vertical intercepts are not typical. This is because the MCQTS requires a few iterations at the beginning to update the statistics of some nodes in the search tree so that the game actions converge to the optimal (or semi-optimal) ones. These initial iterations impose a number of ineffectual query cycles, which result in large vertical intercepts in the estimated lines. In essence, as the number of tags increases, the reader gains more time to update the node statistics; thus, it can better learn the search tree and find paths to the optimal game states. As a result, it discovers the readable nodes more quickly. This means that identifying a small set of tags is less time-efficient than identifying a large set. This behavior also explains why the data points corresponding to small numbers of tags contribute more to the regression error.
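The fitting procedure described above can be sketched with the closed-form least-squares estimates for a line. The data in this example are made up purely for illustration and are not the simulation results of Fig. 6. The helper names are hypothetical, and the mean relative error is only one possible way to quantify an "average deviation"; the paper does not state which measure was used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_relative_error(xs, ys, slope, intercept):
    """Average of |fitted - observed| / observed over all data points."""
    errors = [abs(slope * x + intercept - y) / y for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up (number of tags, query cycles) pairs, for illustration only.
    tags = [50, 100, 200, 300, 400, 500]
    cycles = [1.5 * n + 30 for n in tags]
    slope, intercept = fit_line(tags, cycles)
    err = mean_relative_error(tags, cycles, slope, intercept)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, error={err:.2%}")
```

A slope close to a small constant together with a small fitting error is what one would expect from a running time that grows linearly in the number of tags.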

Recognition efficiency (throughput) of an anti-collision algorithm is quantified as the ratio of the number of the tags to be identified to the number of the query cycles that the algorithm needs to identify all of those tags. Given \(N_{t}\) tags and the time complexity \(T(N_{t} )\) of an algorithm, recognition efficiency is obtained as follows:

$$\eta (N_{t} ) = \frac{{N_{t} }}{{T(N_{t} )}}$$
(23)
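As a rough illustration of Eq. (23), the sketch below evaluates the throughput implied by the regression lines of Eq. (22). These are values derived from the fitted lines, not the empirical throughputs of the simulations, so they may differ somewhat from the tabulated ones. The names `REGRESSION_LINES`, `fitted_cycles`, and `throughput` are hypothetical.

```python
# (slope, intercept) of the fitted lines in Eq. (22), indexed by ell.
REGRESSION_LINES = {
    1: (2.17, 240.24),
    2: (1.19, 117.18),
    3: (1.05, 64.60),
    4: (1.03, 43.57),
}


def fitted_cycles(ell, n_tags):
    """Query cycles predicted by the regression line for the given ell."""
    slope, intercept = REGRESSION_LINES[ell]
    return slope * n_tags + intercept


def throughput(n_tags, cycles):
    """Recognition efficiency of Eq. (23): tags identified per query cycle."""
    return n_tags / cycles


if __name__ == "__main__":
    for ell in sorted(REGRESSION_LINES):
        row = [throughput(n, fitted_cycles(ell, n)) for n in (100, 300, 500)]
        print(f"ell={ell}: " + ", ".join(f"{eta:.3f}" for eta in row))
```

Because each fitted line has a positive intercept, the implied throughput rises with \(N_{t}\) and approaches the reciprocal of the slope for large tag populations.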

The throughputs of the binary-, quad-, oct-, and hex-MCQTS are summarized in Table 2. It is evident that the MCQTS performs more efficiently for larger sets of tags, as discussed earlier. Furthermore, the results show that hex-MCQTS gives the best throughput, since the tree becomes simpler as the branching factor increases.

Table 2 Empirical values of throughput (\(\eta (N_{t} )\)) for the MCQTS with \(\ell = 1\), \(\ell = 2\), \(\ell = 3\), and \(\ell = 4\)

In Sect. 4.1, we argued that the value of the efficiency coefficient (\(\gamma\)) is mostly higher than 0.5. For validation, we recorded the total number of iterations that the MCQTS needs to complete the tag identification in each simulation of the algorithm. From these figures, \(\gamma_{\ell = 1} = 0.73\), \(\gamma_{\ell = 2} = 0.77\), \(\gamma_{\ell = 3} = 0.97\), and \(\gamma_{\ell = 4} = 0.94\) were obtained for the binary-, quad-, oct-, and hex-MCQTS, respectively. These results confirm the validity of Eq. (9), since the values of \(\gamma\) are at least 0.73. There is a positive correlation between the values of \(\ell\) and the values of \(\gamma\): the latter generally decreases as the former decreases. The reason is that the tree depth (\(D\)) increases as \(\ell\) decreases, which makes the identification task more complex. It is worth mentioning that \(\gamma\) depends strongly on the distribution of the tags in the search tree. The best performance is achieved when the readable nodes lie in the upper layers and are least scattered. To this end, each new tag is assigned an ID whose corresponding readable node fills the first (smallest-index) idle node in the tree.

To compare the performance of the MCQTS with that of other methods, we have simulated several existing tree-based anti-collision algorithms, including the BS, QT, CT, BLBO, CQT, IQT, and MICT (\(\ell = 5\)). Figure 7 illustrates the simulation results of oct-MCQTS compared to the other algorithms, and Fig. 8 illustrates those of hex-MCQTS. As expected from Table 1, BS requires many more query cycles than the others to complete the tag recognition (see Figs. 7, 8). CT, BLBO, and CQT have the same time complexity, and hence the same behavior in terms of the number of query-responses. The performance of the QT lies between that of the BS and that of the CT, BLBO, and CQT. IQT, on the other hand, performs better than the other algorithms, except for the MICT and the MCQTS. It is evident from Fig. 7 that oct-MCQTS has a higher tag reading rate than its competitors for \(N_{t} > 100\), except for the MICT, which it outperforms only for \(N_{t} > 300\). For \(N_{t} < 300\), the MICT is superior to the oct-MCQTS because it generates query prefixes based on the number of collisions in the characteristic ID, and is able to determine the collision information using the first few bits of the tag ID. More importantly, by eliminating the idle timeslots, it forms a tree having only collision and readable nodes. In effect, the MICT pulls ahead before the MCQTS has had enough time to learn the search tree. The same reason lies behind the relatively poor performance of the MCQTS for \(N_{t} < 100\). A similar argument holds for the hex-MCQTS in Fig. 8, except that the hex-MCQTS outperforms the MICT for \(N_{t} > 200\). Thus, the MCQTS performs best when \(\ell = 4\), and surpasses the other methods in most cases.

Fig. 7 Comparison of the number of query cycles of oct-MCQTS with other tree-based methods

Fig. 8 Comparison of the number of query cycles of hex-MCQTS with other tree-based methods

The bar chart in Fig. 9 shows the number of query cycles that BS, QT, CT, BLBO, CQT, IQT, MICT, oct-MCQTS, and hex-MCQTS require to recognize \(N_{t} =\) 100, 200, 300, 400, and 500 tags. It can be seen that the MICT, oct-MCQTS, and hex-MCQTS are the top three competitors, and the IQT is the next best one. Table 3 lists the empirical recognition efficiencies of the methods based on the results of Fig. 9. The BS has by far the lowest throughput, which also decreases slightly as the number of tags increases. The recognition efficiency of the other methods, except for the MCQTS, remains level in all cases, and the MICT is the best among them. There is a clear upward trend in the recognition efficiency of the oct-MCQTS and hex-MCQTS, which is the direct result of better tree learning as the number of tags increases. The hex-MCQTS performs well: it outperforms every other method except the MICT in all cases, and the MICT as well for \(N_{t} > 200\). The average improvement percentages of the hex-MCQTS compared to BS, QT, CT/BLBO/CQT, IQT, and MICT are 62.06%, 43.86%, 31.34%, 23.30%, and 3.89%, respectively. For \(N_{t} = 500\), the oct-MCQTS and hex-MCQTS achieve nearly 86% and 90% efficiency, respectively, making them good candidates for networks with a large number of tags.

Fig. 9 Comparison of the number of query cycles for \(N_{t} =\) 100, 200, 300, 400, and 500

Table 3 Comparison of the empirical values of throughput (\(\eta (N_{t} )\)) for \(N_{t} =\) 100, 200, 300, 400, and 500

5 Conclusion

The Monte–Carlo Query Tree Search (MCQTS) method is presented as a novel method for tag collision arbitration in radio frequency identification (RFID) systems. The original Monte–Carlo Tree Search (MCTS) is modified to make it suitable for combination with the query tree (QT) search algorithm. A new action selection function is proposed for the MCTS, and the search mechanism of the method is adapted to the multi-tag identification problem. The time and space complexities of the new method are analytically derived, and it is shown that, compared to the conventional methods, the MCQTS achieves better time performance, albeit with a higher memory requirement. The proposed method was evaluated using computer simulations, and the effects of the number of tags and the tree parameters on its performance were investigated. It was shown that the additional algorithm complexity, together with the time required for learning the tree, decreases the recognition efficiency of the MCQTS for systems with fewer than 100 tags. On the other hand, the new algorithm outperforms conventional methods such as binary search (BS), QT, collision tree (CT)/bit-locking back-off (BLBO)/memory-efficient compressed query tree (CQT), improved quadtree (IQT), and multi-bit identification collision tree (MICT), in terms of average throughput, by 62.06%, 43.86%, 31.34%, 23.30%, and 3.89%, respectively.