1 Introduction

The strategies used in the RoboCup Soccer Small Size League (SSL) have been extensively developed in recent years so that each team’s robots take action in response to an opponent’s predicted behavior. It has become increasingly important for teams to learn about their opponent’s behavior. Some studies have developed approaches to learning an opponent’s strategies in the SSL [2, 9]; however, because these methods use robot trajectory data and require long computational times, they are mainly applied to set plays.

To overcome these problems, we propose a new method for classifying an opponent’s strategies. We focus on a sequence of basic actions, or simply a sequence of actions, where a basic action is a 4-tuple <action name, start position, end position, duration>. Typical actions include a kick action, a pass action, and a shoot action. Sequences of actions are clustered into several groups such that each group contains the sequences of actions derived from one strategy. The advantage of this method is that it readily predicts a subsequent action, making it possible to take preemptive counter actions.

In the following sections, we describe a method of extracting robot actions and applying a clustering method by defining a dissimilarity measure between sequences of actions. Finally, we provide experimental results and discuss the applicability of the method.

2 Related Work

Erdogan and Veloso [2] proposed a method for classifying an opponent’s behaviors in the SSL, and they applied their classification method to the attacking behaviors in set plays during real SSL games. The opponent’s behaviors were expressed as trajectories of offensive robots, to which they applied a cluster analysis by computing the similarity of the behaviors. Yasui et al. [9] also proposed a method of classifying an opponent’s behaviors using an approach similar to that of Erdogan and Veloso. Yasui et al. applied their method to learn their opponent’s behaviors during set plays as they occurred, online and in real time. They demonstrated experimentally that an opponent’s behaviors could be classified about 2 s before ball actuation. These studies demonstrated the effectiveness of learning an opponent’s behaviors; however, because these methods use robot trajectory data and require significant computational time, they are mainly applied to set plays.

Trevizan and Veloso [6] proposed a method for comparing the strategies of two teams in the SSL. They divided a time series representing a game into non-overlapping intervals that they labeled episodes. They used 23 variables, including the distance between a robot and the ball and the distance between a robot and the defense goal, to characterize an episode. They used the mean and standard deviation of each variable over an episode to reduce the data size; therefore, n episodes with f variables could be represented using a matrix of size \(2f\times n\). They computed the matrix norms of the episode matrices for teams A and B and thereby evaluated the similarity between the two teams’ strategies. Their study’s objective differed from the objective addressed in this paper.

Visser and Weland [7] proposed a method of classifying an opponent’s behaviors based on a decision tree constructed for use in the RoboCup soccer simulation league. Time series data, consisting of the ball-keeper distance, ball speed, number of defenders in the penalty area, and other game parameters, were used to construct a decision tree that predicted the goalkeeper’s (GK’s) movements, including GK stays in goal, GK leaves goal, and GK returns to goal, over several games. Learning based on a decision tree is a type of supervised learning; in contrast, we propose an unsupervised learning algorithm suitable for on-line, real-time learning.

3 Robot Action Detection

In this paper, we use data logged during the RoboCup 2015 competition. The logged data comprise a time series of robot positions and orientations, ball positions, referee commands, and other game parameters, recorded every 1/60 s.

The strategies were classified by defining the following 8 actions: passer robot mark, shooter robot mark, ball-keeping robot mark, pass wait, kick ball, kick shoot, kick pass, and kick clear. A time series of logged data was converted to a sequence of these actions for use as input to our classification process. (A not-available (NA) action was inserted wherever part of the time series could not be converted into any of the 8 actions.)

In this section, we describe how the robots’ actions were detected using the logged data. The basic method is one that we have proposed in [1]. We extend that method in this section.

3.1 Mark Actions

In [1], mark actions consist of three actions: “passer mark”, “shooter mark”, and “ball-keeping robot mark”. We improved the detection algorithms described in [1] and describe some of these improvements in the following subsections.

Passer Mark. The passer mark algorithm of Asano [1] did not check whether a passing robot actually existed: a passer mark was detected even if a robot simply chased the ball. We corrected this fault as follows.

Definition of symbols

  • \(\overrightarrow{T_{i,f}}\): the position of the teammate robot \(T_i\) at time f.

  • \(\overrightarrow{O_{j,f}}\): the position of the opponent robot \(O_j\) at time f.

  • \(\overrightarrow{B_f}\): the position of the ball at time f.

  • \(\overrightarrow{T_{S,f}}\): the position of the teammate robot with the shortest distance to the line connecting the ball \(\overrightarrow{B_f}\) and the robot \(\overrightarrow{T_{i,f}}\). We considered \(T_{S,f}\) to be the receiver robot.

Fig. 1. Passer mark.

We computed the distance \(D_{j,i,f}\) between \(O_{j,f}\) and the line connecting \(T_{i,f}\) and \(T_{S,f}\), as shown in Fig. 1. If either or both of the inner products \(\overrightarrow{V_1} \cdot \overrightarrow{V_2}\) and \(\overrightarrow{V_4} \cdot \overrightarrow{V_5}\) were negative, a constant \(\gamma _p\) was added to \(D_{j,i,f}\) to exclude non-mark cases (see Eq. (1)). Averaging the \(D_{j,i,f}\) values over the interval \([f, f+n-1]\) gave the following equation, with which we judged whether \(O_j\) marked a passer robot: \(O_j\) marked the passer \(T_i\) at time f if the \(MarkPass_{j,i,f}\) variable was equal to 1.

$$\begin{aligned} MarkPass_{j,i,f} = {\left\{ \begin{array}{ll} 1 & \text {if } \frac{1}{n}\displaystyle \sum _{k=f}^{f+n-1}D_{j,i,k} \le TH_p \\ 0 & \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

where \(TH_p\) is a given threshold and n is a given constant.

The detection algorithm is given below.

figure a
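As a rough illustration of the criterion in Eq. (1) (not a reproduction of the published algorithm in figure a), the following sketch averages the distances over a window and thresholds them. The helper names are hypothetical, and the \(\gamma _p\) penalty for non-mark configurations is assumed to have already been added to the distances passed to `mark_pass`.

```python
def point_line_distance(p, a, b):
    # perpendicular distance from point p to the line through a and b
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    return abs(dx * (py - ay) - dy * (px - ax)) / (dx * dx + dy * dy) ** 0.5

def mark_pass(d_window, th_p):
    # Eq. (1): O_j marks the passer when the average of the (possibly
    # penalised) distances D_{j,i,k} over the window stays at or below TH_p
    return 1 if sum(d_window) / len(d_window) <= th_p else 0
```

In a full implementation, `d_window` would hold \(D_{j,i,k}\) for \(k = f, \dots , f+n-1\), each computed with `point_line_distance` plus the \(\gamma _p\) penalty where the inner-product test fails.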

Shooter Mark and Ball-Keeping Robot Mark

A shooter mark is often carried out near the goal area, and a mark robot usually stands some distance from a shooter. As an evaluation metric, we defined the distance between a (mark) robot and a line connecting the shooter and the center of the goal mouth. We then computed the \(MarkShoot_{j,i,f}\) variable using an equation similar to Eq. (1). (For a \(MarkShoot_{j,i,f}\) variable equal to 1, \(O_j\) marked the shooter \(T_i\) at time f.)

A ball-keeping robot mark is a mark other than the passer mark and the shooter mark. We used, as an evaluation metric, the weighted sum of two distances, that is, the distance \(D1_{j,i,f}\) between the ball-keeping robot \(T_i\) and the (mark) robot \(O_j\) and the distance \(D2_{j,i,f}\) between the (mark) robot \(O_j\) and a line connecting the ball-keeping robot \(T_i\) and the ball.

$$\begin{aligned} D_{j,i,f}=\alpha D1_{j,i,f} + \beta D2_{j,i,f} \end{aligned}$$
(2)

We then computed the \(MarkBall_{j,i,f}\) variable using an equation similar to Eq. (1). (For a \(MarkBall_{j,i,f}\) variable equal to 1, \(O_j\) marked the ball-keeping robot \(T_i\) at time f.)

3.2 Pass Waiting Action

The pass waiting action was not discussed in [1]; we define it here for the first time.

Fig. 2. Pass waiting action.

Let \(O_b\) be the opponent robot nearest to the ball. It was reasonable to assume that a candidate robot waiting to receive a pass was the one on the left side of \(O_b\), as shown in Fig. 2. \(O_j\) in Fig. 2 is one such candidate. The shootable angle \(\theta _j\) could then be computed. If an opponent robot with a shootable angle exceeding a given threshold were present, the opponent was defined as being in a pass waiting action. To reduce the influence of noise, the shootable angle was averaged over some interval in time. The variable \(WaitPass_{j,f}\) is given by

$$\begin{aligned} WaitPass_{j,f} = {\left\{ \begin{array}{ll} 1 & \text {if } \frac{1}{n}\displaystyle \sum _{k=f}^{f+n-1}\theta _{j,k} \ge TH_w \\ 0 & \text {otherwise,} \end{array}\right. } \end{aligned}$$
(3)

where \(\theta _{j,k}\) is the shootable angle of robot \(O_j\) at time k. In our experiments, we used a threshold \(TH_w\) of 8\(^\circ \) (\(=0.14\,\text {rad} \)) and an interval length n of 3.
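As an illustration of Eq. (3), the sketch below computes a purely geometric shootable angle and applies the averaged threshold test. The helper names are hypothetical, and the angle computation ignores occluding robots, which the full method would have to account for.

```python
import math

def shootable_angle(robot, goal_left, goal_right):
    # angle subtended at the robot's position by the goal mouth
    # (occlusion by other robots is ignored in this sketch)
    a1 = math.atan2(goal_left[1] - robot[1], goal_left[0] - robot[0])
    a2 = math.atan2(goal_right[1] - robot[1], goal_right[0] - robot[0])
    d = abs(a1 - a2)
    return min(d, 2 * math.pi - d)

def wait_pass(angles_deg, th_w=8.0, n=3):
    # Eq. (3): average the shootable angle over n frames and threshold it
    return 1 if sum(angles_deg[:n]) / n >= th_w else 0
```

The defaults `th_w=8.0` and `n=3` match the experimental values given in the text.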

3.3 Kick Actions

In RoboCup Soccer, the kick actions as well as the mark actions are important. We proposed a kick action detection algorithm using the logged data reported in [1] and its modification in [8]. For the purposes of this paper, we classified kick actions according to the kick purpose: kick for shoot, kick for pass, or kick for clear. A kick action that did not belong to any of these three purposes was also considered. This section describes the kick action detection algorithm.

Definition of symbols

  • Kick actions = {KickShoot, KickPass, KickClear, KickBall}. KickBall is a kick action other than one of the first three actions.

  • \(L_{b}\): a line segment that begins at the kick point \(P_s\) and ends at the last point \(P_e\), along which the ball’s trajectory is straight. Let \(\overrightarrow{P_{b}}\) be its vector form.

  • \(\overrightarrow{P_{oi}}\): a vector beginning at the kick point and ending at the opponent’s robot \(O_i\).

  • \(P_{gl}\), \(P_{gr}\), \(P_{gc}\): edge points and center point of the teammate goal mouth.

  • d, \(D_G\): a distance between \(P_e\) and \(P_{gc}\) and a given threshold.

The following algorithm classifies a kick action based on the location of the end point \(P_e\) of \(L_{b}\).

figure b

In line 3 of the above algorithm, the condition \(\left| \overrightarrow{P_{oi}} \times \overrightarrow{P_{b}} \right| < D_1\) tests whether the opponent robot \(O_i\) lies close to the line segment \(L_b\) (the magnitude of the cross product is proportional to the distance from \(O_i\) to the line).
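Under the assumption that KickShoot is detected by the end point lying near the goal mouth (\(d < D_G\)) and KickPass by the trajectory passing close to an opponent robot (line 3 above), a hedged sketch of the classifier could read as follows; the exact branch order, and the KickClear condition, live in figure b and are not reproduced here.

```python
def classify_kick(p_s, p_e, p_gc, opponents, d_g, d_1):
    """Hypothetical sketch of the kick classifier in figure b.
    p_s/p_e: start and end of the straight ball trajectory L_b,
    p_gc: centre of the goal mouth, d_g/d_1: thresholds."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    # shoot: the straight trajectory ends near the goal mouth (d < D_G)
    if dist(p_e, p_gc) < d_g:
        return "KickShoot"
    # pass: the trajectory passes close to an opponent robot
    # (|P_oi x P_b| < D_1, as in line 3 of the algorithm)
    pb = (p_e[0] - p_s[0], p_e[1] - p_s[1])
    for o in opponents:
        po = (o[0] - p_s[0], o[1] - p_s[1])
        if abs(po[0] * pb[1] - po[1] * pb[0]) < d_1:
            return "KickPass"
    # the KickClear branch of figure b is omitted in this sketch;
    # everything else falls through to the generic kick action
    return "KickBall"
```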

Finally, an action other than any of the above 8 actions was classified as an NA action.

4 Action Decision Algorithm

The previous section described the action detection algorithm. Next, a sequence of actions was computed for each opponent robot. Multiple actions could be detected simultaneously for a robot; in this case, we selected an action according to its priority, where a kick action had the highest priority, followed by a pass waiting action, and finally by a mark action. The time series of logged data was then converted to a time series of actions and finally into a sequence of actions:

$$\begin{aligned} A_{P}[n] = \left[ \begin{matrix} \left( \begin{matrix} action_{n1} \\ \overrightarrow{p_{sn1}} \\ \overrightarrow{p_{en1}} \\ frame_{n1} \end{matrix} \right) , & \cdots , & \left( \begin{matrix} action_{ni} \\ \overrightarrow{p_{sni}} \\ \overrightarrow{p_{eni}} \\ frame_{ni} \end{matrix} \right) , & \cdots , & \left( \begin{matrix} action_{nt} \\ \overrightarrow{p_{snt}} \\ \overrightarrow{p_{ent}} \\ frame_{nt} \end{matrix} \right) \end{matrix} \right], \end{aligned}$$
(4)

where \(A_P[n]\) is the sequence of actions for robot n, \(\overrightarrow{p_{sni}}\) and \(\overrightarrow{p_{eni}}\) are the start and end positions of \(action_{ni}\), respectively, and \(frame_{ni}\) is the duration of \(action_{ni}\). The ith element of a sequence of actions is denoted by

$$\begin{aligned} \begin{aligned} A_{P}[n][i]&= \left( \begin{matrix} action_{ni} \\ \overrightarrow{p_{sni}} \\ \overrightarrow{p_{eni}} \\ frame_{ni} \end{matrix} \right) \end{aligned} \end{aligned}$$
(5)
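The 4-tuple of Eq. (5) maps naturally onto a small record type. The following is an illustrative data structure, not part of the published system:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Action:
    """One element A_P[n][i] (Eq. (5)): the 4-tuple
    <action name, start position, end position, duration>."""
    name: str
    p_start: Tuple[float, float]
    p_end: Tuple[float, float]
    frames: int

# a two-element sequence A_P[n] for one robot (hypothetical values)
seq = [Action("KickPass", (0.0, 0.0), (2.0, 1.0), 30),
       Action("WaitPass", (2.0, 1.0), (2.0, 1.0), 45)]
```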

A Note on the Time Series of Actions. Any time series of actions usually contains false actions. Preprocessing is needed to remove such actions.

  • An action that only continues over a couple of frames should be classified as a false action and, as a result, is replaced by the succeeding action. The kick actions are an exception because short kick actions can occur at the edge of the field.

  • If an action is broken into two actions by a false action, the two actions should be unified into a single action.

  • If a false action cannot be replaced by any of the 8 actions, the time series is padded with an NA action.
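The three preprocessing rules above can be sketched as follows. The run-length threshold `min_frames` and the helper names are illustrative choices; after short false runs are overwritten by their succeeding action, adjacent identical runs unify automatically.

```python
def run_length_encode(labels):
    # collapse a per-frame label sequence into (label, count) runs
    runs, prev, count = [], labels[0], 1
    for x in labels[1:]:
        if x == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = x, 1
    runs.append((prev, count))
    return runs

KICKS = frozenset({"KickShoot", "KickPass", "KickClear", "KickBall"})

def remove_false_actions(labels, min_frames=3):
    out = []
    runs = run_length_encode(labels)
    for idx, (name, count) in enumerate(runs):
        # a short non-kick run is treated as a false action and replaced
        # by the succeeding action (NA if it is the last run); kicks are
        # exempt because short kicks can occur at the edge of the field
        if count < min_frames and name not in KICKS:
            name = runs[idx + 1][0] if idx + 1 < len(runs) else "NA"
        out.extend([name] * count)
    return out
```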

5 Dissimilarities Between the Action Sequences

We defined a dissimilarity metric of two sequences of actions using Eq. (4). To do so, we first defined a dissimilarity measure \(d_0\) of two actions \(A_{P_1}[n_1][t_1]\) and \(A_{P_2}[n_2][t_2]\) as follows,

$$\begin{aligned} d_0(A_{P_1}[n_1][t_1], A_{P_2}[n_2][t_2]) = \alpha \,\mathrm {frame\_diff} + \beta \,\mathrm {p\_dist} + \gamma \,\mathrm {diff\_size\_cost}, \end{aligned}$$
(6)

where \(\alpha \), \(\beta \), \(\gamma \) are the weights, and frame_diff, p_dist, and diff_size_cost are explained in the following paragraph.

The value of frame_diff is given by the following equation,

$$\begin{aligned} \begin{aligned} \mathrm {frame\_diff} = \left| \frac{frame_{n_1t_1}}{frame\_play_{n_1t_1}} - \frac{frame_{n_2t_2}}{frame\_play_{n_2t_2}}\right| , \end{aligned} \end{aligned}$$
(7)

where \(frame\_play\) is the duration of the sequence of play \(A_{P}[n]\). Frame_diff takes a value between 0 and 1.

The value of p_dist is given by the following equation,

$$\begin{aligned} \begin{aligned} \mathrm {p\_dist} = \min \left\{ \frac{|(\overrightarrow{p_{sn_1t_1}} - \overrightarrow{p_{sn_2t_2}})|}{FieldLength},1.0\right\} + \min \left\{ \frac{|(\overrightarrow{p_{en_1t_1}} - \overrightarrow{p_{en_2t_2}})|}{FieldLength}, 1.0\right\} , \end{aligned} \end{aligned}$$
(8)

where FieldLength is the length of the side line of the field. P_dist takes a value between 0 and 2.

The diff_size_cost is given by the following equation,

$$\begin{aligned} \begin{aligned} \mathrm {diff\_size\_cost} = \min \left\{ \frac{1}{3}\left( \frac{\mathrm {long\_size}}{\mathrm {short\_size}} - 1.0\right) ,2.0 \right\} , \end{aligned} \end{aligned}$$
(9)

where long_size = \(\max (frame_{n_1t_1}, frame_{n_2t_2}) \) and short_size = \(\min (frame_{n_1t_1}, \) \(frame_{n_2t_2}) \). Diff_size_cost takes a value between 0 and 2.
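Combining Eqs. (7)–(9) with the weights of Eq. (6), a sketch of \(d_0\) might look like the following. Actions are modeled as `(name, p_start, p_end, frames)` tuples, and the equal-weight defaults match the experimental setting \(\alpha = \beta = \gamma = 1/3\) used later in the paper.

```python
def d0(a1, a2, frame_play1, frame_play2, field_length,
       alpha=1/3, beta=1/3, gamma=1/3):
    _, ps1, pe1, f1 = a1
    _, ps2, pe2, f2 = a2
    dist = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    # Eq. (7): normalised duration difference, in [0, 1]
    frame_diff = abs(f1 / frame_play1 - f2 / frame_play2)
    # Eq. (8): normalised start/end position distances, in [0, 2]
    p_dist = (min(dist(ps1, ps2) / field_length, 1.0)
              + min(dist(pe1, pe2) / field_length, 1.0))
    # Eq. (9): penalty for differing durations, in [0, 2]
    long_f, short_f = max(f1, f2), min(f1, f2)
    diff_size_cost = min((long_f / short_f - 1.0) / 3, 2.0)
    return alpha * frame_diff + beta * p_dist + gamma * diff_size_cost
```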

Next, we defined a dissimilarity \(d_1(A_{p1}[n_1], A_{p2}[n_2])\) between two sequences of actions \(A_{P_1}[n_1]\) and \(A_{P_2}[n_2]\) of robots \(n_1\) and \(n_2\).

Action sequences do not always have the same length; therefore, we defined the dissimilarity as the degree of overlap between the shorter sequence of actions and the longer sequence of actions. The computational algorithm is given below.

  • Step 1. Let short be the shorter sequence, and long be the longer sequence. Let the length of short and long be \(short\_size\) and \(long\_size\), respectively. Let \(kick\_num\) be the number of kick actions in long. Let i and j be counter variables with an initial value of 1. Let \(start\_j\) and \(limit\_j\) be the start of the search pointer and the end of the search pointer, with an initial value of 1. Initialize \(d_1\) to 0.

  • Step 2. For the ith action in short sequence, decide the search range in the long sequence as follows,

    $$\begin{aligned} \begin{aligned}&ls = \mathrm {long\_size}/\mathrm {short\_size}\\&limit\_j_1 = i + ls\\&limit\_j_2 = \min (start\_j + ls ,\mathrm {long\_size})\\&limit\_j = \max (limit\_j_1,limit\_j_2). \end{aligned} \end{aligned}$$
    (10)

    For the ith action in the short sequence, search coincident actions over the range \(start\_j\) and \(limit\_j\) within the long sequence.

  • Step 3. If a coincident action is found, compute

    $$\begin{aligned} d_1 = d_1 + d_0(A_{P_1}[n_1][i],A_{P_2}[n_2][j]), \end{aligned}$$

    and \(start\_j=j+1\). If such an action is not found, compute

    $$\begin{aligned} d_1 = d_1 + d_0(A_{P_1}[n_1][i],A_{P_2}[n_2][i]). \end{aligned}$$

    If \(i < \mathrm {short\_size}\), then \(i = i + 1\), and go to Step 2; otherwise, go to Step 4.

  • Step 4. Out of the \(kick\_num\) kick actions in the long sequence, remove the actions that matched a kick action in the short sequence. Let the number of remaining kick actions be \(kick\_unused\). Add \(kick\_unused\) to \(d_1\) as an additional cost,

    $$\begin{aligned} d_1 = d_1 + kick\_unused. \end{aligned}$$
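Steps 1–4 can be sketched as follows, using 0-based indices instead of the paper's 1-based counters. Here `d0_fn` stands in for the per-action dissimilarity \(d_0\), and "coincident" is interpreted as "same action name", which is our reading of the text.

```python
KICKS = {"KickShoot", "KickPass", "KickClear", "KickBall"}

def d1(short, long_, d0_fn):
    """Sequence dissimilarity d_1 (sketch). `short` is the shorter action
    sequence; each action is a tuple whose first entry is its name."""
    ls = len(long_) // len(short)          # Step 2 stride (integer ratio)
    d, start_j, matched = 0.0, 0, set()
    for i, act in enumerate(short):
        # Eq. (10): bound the search range within the long sequence
        limit_j1 = i + ls
        limit_j2 = min(start_j + ls, len(long_) - 1)
        limit = min(max(limit_j1, limit_j2), len(long_) - 1)
        j = next((k for k in range(start_j, limit + 1)
                  if long_[k][0] == act[0]), None)
        if j is not None:                  # Step 3: coincident action found
            d += d0_fn(act, long_[j])
            matched.add(j)
            start_j = j + 1
        else:                              # no match: compare positionally
            d += d0_fn(act, long_[min(i, len(long_) - 1)])
    # Step 4: each kick in the long sequence left unmatched costs 1
    kick_unused = sum(1 for k, a in enumerate(long_)
                      if a[0] in KICKS and k not in matched)
    return d + kick_unused
```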

Finally, we defined a dissimilarity \(d_2\) between plays. A play includes six robot action sequences, so we considered the correspondence between any two sequences. The dissimilarity \(d_2\) was defined by

$$\begin{aligned} d_2(A_{P_1},A_{P_2})&= \min _{\sigma \in S_6}\{\mathrm {Tr}(F P_\sigma )\} \end{aligned}$$
(11)
$$\begin{aligned} F&= [f_{ij}] \end{aligned}$$
(12)
$$\begin{aligned} f_{ij}&= d_1(A_{P_1}[i],A_{P_2}[j]), \end{aligned}$$
(13)

where \(P_\sigma \) is a permutation matrix and Tr(A) is the trace of matrix A.
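Because \(\mathrm {Tr}(F P_\sigma )\) equals \(\sum _i f_{i\sigma (i)}\), Eq. (11) can be evaluated by brute force over the \(6! = 720\) permutations; a minimal sketch:

```python
from itertools import permutations

def d2(F):
    """Eq. (11): minimise Tr(F P_sigma) over all robot assignments.
    F[i][j] holds d_1 between robot i of play 1 and robot j of play 2."""
    n = len(F)
    return min(sum(F[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

This is exactly the assignment-problem objective, so for larger matrices `scipy.optimize.linear_sum_assignment` would solve it without enumerating permutations; at size 6, brute force is already fast.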

The team’s behavior was classified using the group average method [3] to cluster the sequences of actions under the dissimilarity metric \(d_2\).

6 Deciding on the Number of Clusters

Determining the number of clusters was important. If the range of the number of clusters was given in advance, we could use the Davies–Bouldin index [4]. By contrast, Yasui et al. proposed a method for deciding the number of clusters independently of the range [10]. Their method is given by the following procedure. First, compute

$$\begin{aligned} W(K) = \sum _{i=1}^{K}\sum _{X_k \in C_i}\sum _{X_l \in C_i} d_2(A_{P_k},A_{P_l}). \end{aligned}$$
(14)

This equation computes the sum of the distances between any two elements in a cluster, summed over all clusters, assuming that the number of clusters is K. Then, using W(K), compute

$$\begin{aligned} W'(K) = W(K)/W(1), \end{aligned}$$
(15)

and

$$\begin{aligned} \mathop {\hbox {arg max}}\limits _{1 \le K \le N}(W'(K) \le h) , \end{aligned}$$
(16)

where h is a threshold value determined in advance. The number of clusters is decided by Eq. (16).
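A sketch of this procedure, assuming W(K) has already been computed for each candidate K, follows. Note that we read Eq. (16) as selecting the smallest K with \(W'(K) \le h\), since W(K) is non-increasing in K; this interpretation is ours and is not stated explicitly in the text.

```python
def choose_k(w_values, h=0.06):
    """Pick the number of clusters from W(K) (Eq. (14)).
    w_values[K-1] holds W(K); W'(K) = W(K)/W(1) (Eq. (15))."""
    w1 = w_values[0]
    for k, wk in enumerate(w_values, start=1):
        if wk / w1 <= h:        # Eq. (16), read as the smallest such K
            return k
    return len(w_values)        # fall back to the finest clustering
```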

7 Experiment: Our Team’s Strategy Classification

In RoboCup 2015, we competed in 4 official games and recorded logged data from each game. These data were then used in a classification experiment, assuming that our team was the opponent. In the experiment, we used \(\alpha = \beta = \gamma = 1/3\) in Eq. (6) and \(h=0.06\) in Eq. (16).

In this section, we classified our team’s strategies experimentally. The set play data were used in the experiment. A set play began at ball replacement and ended at ball interception or ball-out-of-field. The clustering results obtained from the 4 games are shown in Figs. 3, 4, 5 and 6.

Fig. 3. Dendrogram for game No. 1.

Fig. 4. Dendrogram for game No. 2.

Fig. 5. Dendrogram for game No. 3.

Fig. 6. Dendrogram for game No. 4.

Table 1. Rand index (RoboDragons).
Table 2. Rand index (opponents).

The Rand index [5] was used to evaluate the classification results. The correct classification for each game was determined by inspection (the human clustering method), and the computed clusters were compared against it. The Rand index for each game is given in Table 1. The Rand index values were high for every game except game No. 2. In that game, the opponent team malfunctioned, so the detection of mark actions did not work well, resulting in a lower Rand index. In the other games, a cluster identified by human clustering was sometimes divided into two clusters by the computer clustering method. This lowered the Rand index slightly; however, from a practical perspective, this was not a serious problem.

8 Experiment: Opponent Team Classification

The strategies of each opponent team in the RoboDragons’ official games were classified. Figures 7, 8, 9 and 10 provide the classification results, and Table 2 lists the Rand indices. Table 2 reveals that the Rand indices took values between 0.840 and 0.901. For comparison, Erdogan and Veloso obtained values between 0.87 and 0.96 using trajectory data. Our experimental results revealed that Rand index values comparable to theirs were obtained from the action sequence data alone. This experiment also exhibited the cluster division problem discussed in the previous section. Future work is needed to address this problem.

Fig. 7. Dendrogram for game No. 1 (opponent).

Fig. 8. Dendrogram for game No. 2 (opponent).

Fig. 9. Dendrogram for game No. 3 (opponent).

Fig. 10. Dendrogram for game No. 4 (opponent).

9 Computational Time

The total clustering analysis computational time was measured for game No. 4, in which 35 set plays were executed. This calculation included the computational time associated with the preprocessing of a time series of actions, the creation of a distance matrix, and the clustering using the group average method. We obtained an average computational time of 0.67 ms and a maximal computational time of 1.82 ms, which shows that real-time clustering is possible.

10 Concluding Remarks

In this paper, we have proposed a classification method based on an opponent’s actions. A sequence of actions was derived from a time series of data logged from an SSL game. A sequence of actions contains far less data than the raw log, permitting faster computation. An evaluation using the Rand index revealed that clustering with the proposed method provided a good classification of an opponent team’s behaviors (strategies, in most cases). The computational time was small enough to permit real-time computation.

Future work will focus on refinements of the proposed method, extensions to any scene during play, the generation of counter actions using the logged data of past games, and implementation in our RoboDragons system.