Keywords

1 Introduction

Differential attacks [3] and linear attacks [15] are the most fundamental cryptanalytic methods. They have been used in the cryptanalysis of numerous symmetric ciphers. Since the first and most important thing for the two methods is to identify differential characteristics and linear approximations, the automatic search algorithms for differential characteristics and linear approximations have been a focus of cryptographer’s concern. At EUROCRYPT’94, Matsui [16] presented the branch-and-bound search algorithm and found the differential characteristics and linear approximations for DES block cipher. The branch-and-bound search algorithm is one of the most powerful and classic search tools and is still widely used now. Another research line for the application of automatic search algorithm is to provide the provable security against differential cryptanalysis and linear cryptanalysis, which is usually achieved by automatic searching for the minimal number of active S-boxes.

Mixed-Integer Linear Programming (MILP) has been explicitly applied in constructing automatic search algorithm in differential and linear cryptanalysis. The problem of MILP is a class of optimization problems derived from Linear Programming which aims to optimize an objective function under certain constrains. Mouha et al. [18] and Wu et al. [28] translated the problem of counting the minimal number of differentially active S-boxes to an MILP problem which can be solved automatically with open source or commercially available MILP solvers. Their method has been applied in searching for the differential and linear characteristics with specific patterns [14, 29] and counting the minimal number of active S-boxes of bit-oriented block ciphers by introducing bit-level representations [21, 27].

Recently, the MILP-based method has been developed to be a general method to automatically search for the real differential characteristics. Sun et al. [22] constructed the MILP-based model to search for (related-key) differential characteristics by generating linear inequalities from the differential distribution table of S-box, where only partial linear inequalities are used in MILP model to make it solvable in practical time. Their search algorithm, however, is heuristic, since the identified differential characteristics may not be consistent. By computing a small number of inequalities which can exactly describe the differential distribution table of an S-box with the greedy algorithm, Sun et al. [24] transformed the heuristic searching method to the exact and practical searching method. Moreover, they constructed the MILP-based model for automatically searching for linear approximations and extended these models to search for differential and linear hull. Sun et al.’s method [22, 24] is applicable to block ciphers involving bitwise XOR, S-box operation and the linear layer with bit permutationFootnote 1. Although the general linear layer can be transformed into bit XOR operations, it makes the MILP problem much more difficult to be solved in practical time since more XOR operations result in more variables and constraints.

Due to the excellent performance of ARX-based ciphers in software, many symmetric-key ciphers are designed based on ARX operationsFootnote 2. It is worth noting that the cryptanalytic techniques for ARX ciphers are very different from those for ciphers with S-boxes such as AES and DES. In particular, the search algorithms for differential characteristics and linear approximations for ARX cipher utilize the different principle compared with those for ciphers with S-boxes. In [10, 12, 17], the methods of automatic search for differential characteristics in ARX designs are provided, but the methods are only compatible with ARX-based Hash functions where the key is known and can be freely chosen. By using the partial differential distribution table and Matsui’s branch-and-bound algorithm, Biryukov and Velichkov [4] presented the first automatic search algorithm for differential characteristics in ARX block ciphers, such as (X)TEA and Speck. In a very recent paper [5] appearing in this volume of FSE’16, Biryukov et al. proposed the first adaptation of Matsui’s algorithm for finding the best differential and linear trails in ARX ciphers.

Although MILP-based search algorithm has got extremely remarkable application for some block ciphers, the current method cannot be applied to ARX block ciphers. A straightforward method to apply MILP model for ARX constructions is to regard the modular addition in \(\mathbb {F}_2^n\) as a \(2n \times n\) S-box and compute a small number of linear inequalities to exactly represent the differential or linear pattern of the modular addition. However, in this way the number of linear inequalities is too large to be solved in practical time for real ARX ciphers where n is typically at least 16. This motivates us to study MILP-based search method for ARX block ciphers.

1.1 Our Contributions

In this paper, we revisit the differential property and linear property for modular addition and provide a new framework of constructing the MILP model. Concretely, we transform the differential property of modular addition shown in [13] into linear inequalities to describe all possible differential patterns and the corresponding differential probabilities. Moreover, we use linear inequalities to capture all possible linear patterns and the corresponding correlations based on the automaton algorithm for correlation of modular addition in [19, 25]. The number of the resulting linear inequalities is significantly less than that of linear inequalities produced by regarding modular addition as one S-box. With the linear inequalities, we can construct the MILP model to automatically search for differential characteristics and linear approximations using the the commercial optimizer Gurobi, where the object function is the probability of differential characteristic or the correlation of linear approximation.

During constructing MILP models, we assume that the two inputs to modular addition and the consecutive rounds are independent. However, as demonstrated in [26], for some ARX constructions, the inputs to modular addition and the consecutive rounds are not independent, which will result that the practical probability (resp. correlation) of our identified differential (resp. linear) tails for some fixed key may vary significantly from that derived from our model. This deviation will have effect on the success rate of the attacks from practitioner’s perspective.

As an illustration, we apply our method to the block cipher Speck, which is a family of lightweight block ciphers publicly released by the National Security Agency (NSA) and has been optimized for performance in software implementations [2]. A variety of block sizes and key sizes for different implementations are provided for it. Since its publication, Speck has received much attention and many cryptanalytic results have been given. Abed et al. presented differential and rectangle attacks for almost all variants of Speck [1]. At FSE’14 [6], Biryukov et al. searched for the differential characteristics, which cover 9, 11 and 14 rounds for Speck32, Speck48 and Speck64, respectively, and are better than the differential characteristics in [1]. In [9], Dinur proposed the sub-cipher attack and improved the key recovery attacks on all variants of Speck using the differential characteristics in [6]. In [5], Biryukov et al. presented the probabilities of the best differential trails for up to 10, 9, 8, 7, and 6 rounds of Speck32, Speck48, Speck64, Speck96 and Speck128 respectively and evaluate the security bounds of Speck against single-trail differential cryptanalysis under the Markov assumption. As regards to linear cryptanalysis, Yao et al. identified 9, 9, 12, 6 and 6 rounds linear approximations for Speck32, Speck48, Speck64, Speck96 and Speck128, respectively [30], and gave the key recovery attacks.

We use our models to search for the differential and linear trails for Speck. In order for the MILP tool to run in reasonable time for larger block sizes (>48 bits), we split the block cipher into two or three parts – upper (middle) and lower. We then search for trails independently in each part, by ensuring that the output difference (mask) for one part is the same as the input difference (mask) for its following part. For Speck48, Speck64, Speck96 and Speck128, we find better differential characteristics and linear approximations than those of previous works under the assumptions of independent inputs to the modular addition and independent rounds. With the new differential characteristics, we improve the differential attacks on the four variants of Speck. Comparing with the previous best attacks for them [9], we can attack one, one, three and five more rounds for Speck48, Speck64, Speck96 and Speck128 with any key size, respectively. We summarize known attacks on Speck in Table 1. We compare our identified differential characteristics and linear approximations with those of previous works in Table 2.

Table 1. Summary of attacks on speck
Table 2. Summary of differential characteristics and linear approximations for speck

Outline. The remainder of this paper is organized as follows. Section 2 gives a brief description of the existing MILP-based search methods for block cipher. Sections 3 and 4 introduce MILP-based algorithm for automatic searching for differential characteristics and linear approximations for ARX ciphers. We apply the new search tools in Speck and give the improved differential attacks on all variants of Speck except Speck32 in Sect. 5. Finally, we conclude the paper in Sect. 6.

2 Sun et al.’s MILP-Based Automatic Search for (Related-Key) Differential and Linear Trails (Hull)

In this section, we briefly recall Sun et al.’s algorithm. For more details of their algorithm, we refer to [22, 24].

Objective Function of Differential Model. Let \(x_i\) denote the difference variable for the bit i. That is, \(x_i=0\) if there is no difference at bit i; Otherwise, \(x_i=1\). Another bit variable \(A_j\) is used to denote the activity of an S-box, i.e., \(A_j=0\) if the S-box is non-active; Otherwise, \(A_j=1\). The objective function is to minimize the sum of all variables \(\sum _j A_j\), which indicates the activities of the S-boxes appearing in the schematic description of the encryption and key schedule algorithm.

Constraints of Differential Model. For every XOR operation with bit-level input differences a, b and bit-level output difference c, the constraints include

(1)

where \(d_{\oplus }\) is a dummy bit variable.

Next, we describe the constraints of the differential properties of an S-box in a more accurate way. For an \(\omega \times \nu \) S-box denoted by \(A_t\), the input and output differences are \((x_{0},\ldots ,x_{{\omega -1}})\) and \((y_{0},\ldots ,y_{{\nu -1}})\), respectively. Then

$$\begin{aligned} {\left\{ \begin{array}{ll} A_{t}-x_{{k}}\ge 0, k\in \{0,\ldots ,\omega -1\} \\ -A_{t}+\sum \limits ^{\omega -1}_{j=0}x_{{j}}\ge 0 \end{array}\right. } \end{aligned}$$
(2)

which ensures that nonzero input difference must activate the S-box.

Let (\(x_{0},\ldots ,x_{\omega -1},y_{0},\ldots ,y_{\nu -1}\)) \(\in \{0,1\}^{\omega +\nu }\subseteq \mathbb {R}^{\omega +\nu }\) denote an (\(\omega +\nu \))-dimensional vector, where \(\mathbb {R}\) is the real number field. By computing the H-Representation of the convex hull of all possible input-output differential patterns of an S-box, many linear inequalities which can be used to remove some impossible differential patterns of the S-box are obtained. The greedy algorithm in [24] is applied to select a subset of the H-Representation of the convex hull with less inequalities. As a result, they generate only a small number of linear inequalities, which can be used to exactly describe the differential pattern of S-box and construct the MILP problem. Using any MILP optimizer such as Gurobi [11], good differential characteristics can be found. If we set the value of the object function as N, finish the solving process and output the current solution till the value of object function is reduced to N. The corresponding solution is the identified differential characteristic with N active S-boxes.

Note that this exact searching method is also applicable to searching for the linear approximations.

Objective Function of Linear Model. Some notations for differential model are also used in linear model, e.g., \(A_j\) denotes the activity of an S-box and the objective function is to minimize \(\sum _j A_j\).

Constraints of Linear Model. For every XOR operation with input masks a, b and output mask c, the constraints should be

$$ a=b=c. $$

For every three-forked branch with input mask a and output masks b and c, the constraints should be

(3)

where \(d_{\rightthreetimes }\) is a dummy bit variable.

For an \(\omega \times \nu \) S-box denoted by \(A_t\), the input and output masks are \((x_{0},\ldots ,x_{{\omega -1}})\) and \((y_{0},\ldots ,y_{{\nu -1}})\), respectively. If the output mask is nonzero, \(A_t=1\); Otherwise, \(A_t=0\). Then, we have

$$\begin{aligned} {\left\{ \begin{array}{ll} A_{t}-y_{{k}}\ge 0, k\in \{0,\ldots ,\nu -1\} \\ -A_{t}+\sum \limits ^{\nu -1}_{j=0}y_{{j}}\ge 0 \end{array}\right. } \end{aligned}$$

which ensures that nonzero output mask must activate the S-box.

For an (\(\omega +\nu \))-dimensional vector (\(x_{0},\ldots ,x_{\omega -1},y_{0},\ldots ,y_{\nu -1}\)) \(\in \{0,1\}^{\omega +\nu }\subseteq \mathbb {R}^{\omega +\nu }\), compute a small number of linear inequalities to exactly represent the linear pattern of S-box. The other processes are similar to those in the model of searching for differential characteristics.

In addition, the technique has been extended to find differential or linear hull [24]. By adding the constraints imposed by the given properties (such as fixed difference or linear mask), they updated the MILP model and obtained all trails which consist of the given differential or linear hull.

3 MILP-Based Algorithm for Automatic Search for Differential Characteristics in ARX Ciphers

In this section, we analyze the differential characteristics of modular addition and identify important properties, which are crucial to the construction of our MILP-based models for ARX ciphers. Using our method, we can give the linear inequalities which can exactly describe all differential for modular addition.

3.1 XOR-Differential Characteristics of Modular Addition

Definition 1

Let \(\alpha , \beta \) and \(\gamma \) be fixed n-bit XOR differences. The XOR-differential probability (DP) of addition modulo \(2^n\) \((xdp^+)\) is the probability with which \(\alpha \) and \(\beta \) propagate to \(\gamma \) through the ADD operation, computed over all pairs of n-bit inputs (x,y):

$$\begin{aligned}&xdp^+(\alpha , \beta \rightarrow \gamma ) = 2^{-2n} \cdot \#\{(x,y) : ((x \oplus \alpha ) + (y \oplus \beta )) \oplus (x + y) = \gamma \}. \end{aligned}$$

In [13], Lipmaa et al. showed Algorithm 2 to compute \(xdp^{+}(\alpha , \beta \rightarrow \gamma )\) which consists of two steps: the first step is to verify if the differential characteristic is possible and the second step is to compute the differential probability \(xdp^{+}\). More precisely, the above two steps are shown in Theorems 1 and 2, respectively.

Theorem 1

(see [13]). The differential \((\alpha , \beta \rightarrow \gamma )\) is possible iff \((\alpha [0] \oplus \beta [0] \oplus \gamma [0])\) = 0 and \(\alpha [i-1]\) = \(\beta [i-1]\) = \(\gamma [i-1]=\alpha [i] \oplus \beta [i] \oplus \gamma [i]\) for \(\alpha [i-1]\) = \(\beta [i-1]\) = \(\gamma [i-1]\), \(i \in [1, n-1]\).

Theorem 2

(see [13]). Assume that \((\alpha , \beta \rightarrow \gamma )\) is a possible differential characteristic, then the differential probability \(xdp^+\) = \(2^{-\sum _{i=0}^{n-2} \lnot eq(\alpha [i],\beta [i],\gamma [i])}\), where

Theorem 1 can be used to decide if the differential characteristic \((\alpha , \beta \rightarrow \gamma )\) for modular addition is possible. For instance, the differential \((\alpha , \beta \rightarrow \gamma )= (11100, 11100 \rightarrow 11110)\) is impossible as \(\alpha [0]\) = \(\beta [0]\) = \(\gamma [0]\) \(\ne \) \(\alpha [1] \oplus \beta [1] \oplus \gamma [1]\). Using Theorem 2, the probability of the differential characteristic can be computed efficiently. For example, for the differential \((\alpha , \beta \rightarrow \gamma )\) = (11100, 00110 \(\rightarrow \) 10110), the probability \(xdp^+(\alpha , \beta \rightarrow \gamma )\)=\(2^{-(\lnot eq(0,0,0)+\lnot eq(0,1,1)+\lnot Eq(1,1,1)+\lnot Eq(1,0,0))}\)= \(2^{-2}\).

From Theorem 2, if the n-bit differential characteristic is possible, the probability is only related with (\(\alpha [i],\beta [i],\gamma [i]\)) for \(i \in [0, n-2]\). Taking advantage of this property, we can construct the MILP model to compute the differential probability \(xdp^+\). More details are shown in the following.

3.2 MILP Model for Differential Characteristics of Modular Addition

In order to append the first condition \(\alpha [0] \oplus \beta [0] \oplus \gamma [0] = 0\) in Theorem 1 to the set of the linear inequalities, we derive five linear inequalities satisfying the first condition. The five linear inequalities are listed as follows,

(4)

where \(d_\oplus \) is a dummy bit variable.

Let the vector \((\alpha [i], \beta [i], \gamma [i]\), \(\alpha [i+1], \beta [i+1], \gamma [i+1])\) denote the relation of the differential values for the i-th and the \((i+1)\)-th bits. We have that there are totally 56 possible patterns for the vector in accordance with Theorem 1. For example, the differential pattern (0, 0, 0, 1, 1, 1) is impossible as \(\alpha _{i}\) = \(\beta _{i}\) = \(\gamma _{i}\) \(\ne \) \(\alpha _{i+1} \oplus \beta _{i+1} \oplus \gamma _{i+1}\). Moreover, in order to compute the differential probability efficiently, we append \(\lnot eq(\alpha [i], \beta [i],\gamma [i])\) to the vector. As described in [23], using the \(\mathbf {inequality \_ generator()}\) function in the \(\mathbf {sage.}\) \(\mathbf {geometry.}\) \(\mathbf {polyhedron}\) class of the SAGE Computer Algebra System [20], we get 65 linear inequalities satisfying all 56 possible patterns. Based on the greedy algorithm in [24], the number of linear inequalities can be reduced from 65 to 13. Furthermore, the 13 linear inequalities can be used to compute the probability of (\(\alpha [i]\) \(\parallel \) \(\beta [i]\) \(\parallel \) \(\gamma [i] \rightarrow \alpha [i+1]\) \(\parallel \) \(\beta [i+1]\) \(\parallel \) \(\gamma [i+1]\)) as the variable \(\lnot eq(\alpha [i], \beta [i],\gamma [i])\) is involved.

Actually, the 13 linear inequalities only represent the second condition \(\alpha [i]\) = \(\beta [i]\) = \(\gamma [i]=\alpha [i+1] \oplus \beta [i+1] \oplus \gamma [i+1]\), \(i \in [0,n-2]\) in Theorem 1. In total, there are (\(13 \times (n-1) + 5\)) linear inequalities to represent the differential property of addition modulo \(2^n\) with two inputs of n-bit length. As described in Theorem 2, the differential probability \(xdp^+\) = \(2^{-\sum _{i=0}^{n-2}{\lnot eq(\alpha [i],\beta [i],\gamma [i])}}\).

3.3 MILP Model for Differential Characteristics of ARX Ciphers

Besides modular addition, the XOR operations, three-forked branch and the circular shift operations are also involved in ARX ciphers. For each XOR operation, we can also use Inequalities (4). For each three-forked branch operation with input differences a, b and output difference c, the constraints should be

$$ a=b=c. $$

For the case of circular shift, we can also list some equations for the related bits. So far, we have finished the construction of all linear inequalities or equations for each operations in ARX ciphers which compose the constraints of MILP model for differential characteristics of ARX ciphers.

As the differential probability \(xdp^+\) = \(2^{-\sum _{i=0}^{n-2}{\lnot eq(\alpha [i],\beta [i],\gamma [i])}}\) can be computed using the method described in Sect. 3.2, we set the objective function for the r-round differential characteristic as the \(\sum _{j=1}^{r}{\sum _{i=0}^{n-2}{\lnot eq(\alpha _j[i],\beta _j[i],\gamma _j[i])}}\) where \(\alpha _j\), \(\beta _j\) and \(\gamma _j\) are the input differences and output difference of modular addition for the j-th round. We aim to find the minimal value of \(\sum _{j=1}^{r}{\sum _{i=0}^{n-2}{\lnot eq(\alpha _j[i],\beta _j[i],\gamma _j[i])}}\) which represents the differential probability of the best identified differential characteristic. We can use the Gurobi optimizer to solve the system of inequalities to search for differential characteristics for ARX ciphers. However, just being able to obtain one differential characteristic may be not enough. We can apply Sun’s method [24] to our new MILP model and find the differential of ARX ciphers.

Note that in the above model, we assume that the two inputs to modular addition and the consecutive rounds are independent. However, for some ARX constructions, they are not independent, which will result that the practical probability of our identified differential characteristics for some fixed key may vary significantly from that derived from our model.

4 MILP Models for Automatic Search for Linear Approximations in ARX Ciphers

In this section, we revisit the property of linear approximations for modular addition operation and develop a new MILP-based tool to search for the linear approximations for ARX ciphers.

4.1 Linear Approximations for Modular Addition

Let n be a non-negative integer. Given two vectors x = \((a_{n-1},\ldots ,a_0)\) and y = \((b_{n-1},\ldots ,b_0)\) \(\in \mathbb {F}_2^n\), let \(x \cdot y\) denote the standard inner product \(x \cdot y = a_{n-1}b_{n-1} \oplus \cdots \oplus a_0b_0\).

Definition 2

Let \(\varLambda _{\alpha }\), \(\varLambda _{\beta }\) and \(\varGamma \) be fixed n-bit linear masks. The correlation of addition modulo \(2^n\) \((cor_{\boxplus })\) with input masks \(\varLambda _{\alpha }, \varLambda _{\beta }\) and output mask \(\varGamma \) can be computed over all pairs of n-bit inputs (x, y):

$$\begin{aligned}&cor_{\boxplus }(\varGamma , \varLambda _{\alpha }, \varLambda _{\beta }) = cor(\varGamma \cdot (x + y) \oplus \varLambda _{\alpha } \cdot x \oplus \varLambda _{\beta } \cdot y) \\&= 2^{-2n}(\# \{ x,y \in \mathbb {F}_2^n : \varGamma \cdot (x + y) \oplus \varLambda _{\alpha } \cdot x \oplus \varLambda _{\beta } \cdot y = 0 \} \\&- \# \{ x,y \in \mathbb {F}_2^n : \varGamma \cdot (x + y) \oplus \varLambda _{\alpha } \cdot x \oplus \varLambda _{\beta } \cdot y = 1 \}).&\end{aligned}$$

Based on a fairly simple classification of the linear approximations of the carry function, Nyberg and Wallén derive an efficient algorithm for computing the correlation of linear approximation of addition modulo \(2^n\) with k inputs in [19, 25]. Since we only consider modular addition with two inputs, we describe this method only for two inputs in [19, 25] as follows.

Theorem 3

(see [19, 25]). For the linear approximation of addition modulo \(2^n\) of two inputs with input masks \(\varLambda _{\alpha }, \varLambda _{\beta }\) and output mask \(\varGamma \), \(\varLambda _{\alpha }, \varLambda _{\beta }, \varGamma \in \mathbb {F}_2^n\) and \(\varLambda _{\alpha }=(\varLambda _{\alpha }[n-1],\ldots ,\varLambda _{\alpha }[0])\), \(\varLambda _{\beta }=(\varLambda _{\beta }[n-1],\ldots ,\varLambda _{\beta }[0])\), \(\varGamma =(\varGamma [n-1],\ldots ,\varGamma [0])\), we define the vector \(u=(u[n-1],\ldots , u[0])\) where \(u[i] = 4\varGamma [i] + 2\varLambda _{\alpha }[i] + \varLambda _{\beta }[i], 0 \le u[i]<8, 0\le i < n\). The correlation can be computed with the following linear representation,

$$\begin{aligned} cor_{\boxplus }(\varGamma , \varLambda _{\alpha },\varLambda _{\beta }) = L A_{u[n-1]}A_{u[n-2]} \cdots A_{u[1]}A_{u[0]}C, \end{aligned}$$
(5)

where \(A_r, r=0, \ldots , 7,\) is \(2\times 2\) matrice,

$$ A_{0} = \frac{1}{2}\begin{bmatrix} 2&0\\ 0&1\\ \end{bmatrix}, A_{1} = A_{2} = -A_{4} = \frac{1}{2}\begin{bmatrix} 0&0\\ 1&0\\ \end{bmatrix}, $$
$$A_{7} = \frac{1}{2}\begin{bmatrix} 0&2\\ 1&0\\ \end{bmatrix}, -A_{3} = A_{5} = A_{6} = \frac{1}{2}\begin{bmatrix} 0&0\\ 0&1\\ \end{bmatrix}, $$

L is a row vector \(L = (1 \ 0)\),and C is a column vector \(C = (1 \ 1)^T\).

For example, for the linear approximation with binary vector masks \((\varGamma =10100, \varLambda _{\alpha }\,=\,11110,\varLambda _{\beta }=11000)\), \(u = 73620_8\) and \(cor_{\boxplus }\)=\(LA_7A_3A_6A_2A_0C\) = \(-2^{-3}\).

In order to provide a fast implementation for Theorem 3, Nyberg and Wallén utilized the automaton to calculate \(L A_{u[n-1]}A_{u[n-2]} \cdots A_{u[1]}A_{u[0]}C\) by multiplying from left to right. Let \(e_0 = L = (1 \ 0)\) and \(e_1 = (0 \ 1)\), then we can show the state transitions in Fig. 1. When reading u from left to right, if the automaton ends up in state 0, then \(L A_{u[n-1]}A_{u[n-2]} \cdots A_{u[1]}A_{u[0]}C=0\). If the automaton ends up in state \(e_0\) or \(e_1\), then \(L A_{u[n-1]}A_{u[n-2]} \cdots A_{u[1]}A_{u[0]}C=\pm 2^{-t}\), where t is the number of transitions marked by a solid arrow, and the sign is determined by the number of occurrences of \(\{3,4\}\):

$$L A_{u[n-1]}A_{u[n-2]} \cdots A_{u[1]}A_{u[0]}C>0$$

if and only if the number of occurrences is even. For example, as \(u=73620_8\), \(LA_7A_3A_6A_2A_0C=-2^{-3}\).

Fig. 1.
figure 1

State transitions for \(u=73620_8\)

Based on the Fig. 1, we will give Proposition 1 to calculate the absolute value of the correlation \(cor_{\boxplus }(\varGamma , \varLambda _{\alpha },\varLambda _{\beta })\).

Proposition 1

For the linear approximation of addition modulo \(2^n\) of two inputs with input masks \(\varLambda _{\alpha }, \varLambda _{\beta }\) and output mask \(\varGamma \), the state transitions of the automaton are shown in Fig. 2, where \(u[i] = 4\varGamma [i] + 2\varLambda _{\alpha }[i] + \varLambda _{\beta }[i], 0 \le u[i]<8, 0\le i < n\) and \(\epsilon _j \in \{e_0, e_1\}, 0\le j <n\). If the correlation for the linear approximation is non-zero, the absolute value of the correlation can be computed as follows,

$$\begin{aligned} \left| cor_{\boxplus }(\varGamma , \varLambda _{\alpha },\varLambda _{\beta }) x \right| =2^{-\#\{0<i <n|\epsilon _i=e_1\}}. \end{aligned}$$
(6)
Fig. 2.
figure 2

State transitions for addition modulo \(2^n\)

Based on Proposition 1, we construct the MILP model to compute the absolute value of correlation of modular addition with two inputs in the following.

4.2 MILP Model for Linear Approximations of Modular Addition

In this part, we will introduce a method to describe linear property of modular addition in Theorem 3 and Proposition 1 as linear inequalities.

For the state transition from \(\epsilon _{i+1}\) to \(\epsilon _{i}\) under u[i], \( 0\le i < n\), the bit variable \(s_i\) is defined as follows, \(s_i=0\) if \(\epsilon _i =e_0\), and \(s_i=1\) if \(\epsilon _i =e_1\). We utilize the vector \((s_{i+1}, \varGamma [i], \varLambda _{\alpha }[i], \varLambda _{\beta }[i], s_i)\) to denote the state transition, so \(e_{s_{i+1}} A_{u[i]}=e_{s_i}\). For the vector \((s_{i+1}, \varGamma [i], \varLambda _{\alpha }[i], \varLambda _{\beta }[i], s_i)\), there are only 10 possible transitions for the vector. As described in Sect. 3.2, we also use the \(\mathbf {inequality \_ generator()}\) function in the \(\mathbf {sage.}\) \(\mathbf {geometry.}\) \(\mathbf {polyhedron}\) class of SAGE and the greedy algorithm in [24] to get eight linear inequalities satisfying all 10 possible transitions. Note that there is an additional constraint \(\epsilon _{n}=e_0\) according to Fig. 2. Hence, for linear approximation of addition modulo \(2^n\) with two inputs, the constraints contain \(8 \times n + 1\) linear inequalities and the absolute value of correlation is \(|cor_{\boxplus }|\) = \(2^{-\sum _{i=1}^{n-1}{s_i}}\).

4.3 MILP Model for Linear Approximations for ARX Ciphers

For each XOR, three-forked branch and circular shift operations, we can also use the method in Sect. 2 to produce the linear inequalities or equations. All linear inequalities or equations for each operations in ARX ciphers compose the constraints of MILP model for linear approximations of ARX ciphers. We can set the objective function for r-round linear approximation as the \(\sum _{j=1}^{r}{\sum _{i=1}^{n-1}{s_i}}\) to find the minimal value of it. It means to find the optimized linear approximation. We use the Gurobi optimizer to solve the system of inequalities to search for the linear approximations.

As describe in the model of searching for differential characteristics, we assume that the two inputs to modular addition and the consecutive rounds are independent. However, for some ARX constructions, they are not independent. So the practical correlation of our identified linear approximations for some fixed key may vary significantly from that derived from our model.

5 Application to Speck

5.1 Description of Speck

Speck is a family of ARX-based block ciphers proposed by the National Security Agency of the USA in [2], which contains 10 variants. The variants are characterized by the block size of 2n bits (where n is the word size) and the key size of mn bits. The Speck block cipher variant with block size 2n and key size mn is denoted as Speck2n / mn. The rotation constants \(\alpha \), \(\beta \) used in round functions and the number of rounds r are listed in Table 3 for all variants of the Speck.

The round function of Speck consists of XOR, modular addition in \(\mathbb {F}_2^n\) and rotation operations. If we denote the subkey in the i-th round as \(k_i\), the input and output of the i-th round as \((x_{i-1}, y_{i-1}\)) and \((x_{i}, y_{i})\), the round function is operated as follows,

$$\begin{aligned} x_{i}&=((x_{i-1}\ggg \alpha )\boxplus y_{i-1})\oplus k_{i},\quad \quad y_{i}&=(y_{i-1}\lll \beta )\oplus x_{i}, \end{aligned}$$

where \(\alpha \) and \(\beta \) are rotation constants listed in Table 3.

Table 3. Parameters for speck family of block ciphers

5.2 Differential Characteristics and Linear Approximations of Speck

In this subsection, we will give the details how to use the models in Sects. 3 and 4 to search for the differential characteristics and linear approximations for Speck.

Firstly, we produce the system of inequalities to construct the model for the differential or linear trails for Speck based on the methods in Sects. 3 and 4. Then we use Gurobi optimizer to solve our MILP model as [21–24]. Indeed, other MILP optimizers, such as CPLEX [8], can also be used. The procedure of our method is outlined as follows.

  • Step 1: Convert the system of inequalities describing r rounds of Speck into a format that is readable by Gurobi.

  • Step 2: Use Gurobi to search for the trails with the input from Step 1.

Without loss of generality, we describe how to construct the model to search for the differential characteristic of r-round Speck32. We denote the input difference and the output difference for the i-th round as \(\varDelta z_{i-1}\) and \(\varDelta z_{i}\), respectively, \(\varDelta z_i=(\varDelta z_i^{31},\ldots , \varDelta z_i^0), 0< i \le r\).

If we denote the two input differences of modular addition in the i-th round as \(\alpha _i\) and \(\beta _i\) and the output difference as \(\gamma _i\), then we have \(\alpha _i = (\varDelta z_{i-1}^{22}, \ldots , \varDelta z_{i-1}^{16}, \varDelta z_{i-1}^{31}, \ldots , \varDelta z_{i-1}^{23}), \beta _i = (\varDelta z_{i-1}^{15}, \ldots , \varDelta z_{i-1}^0), \gamma _i = (\varDelta z_{i}^{31},\ldots , \varDelta z_{i}^{16}).\) According to Sect. 3, we can produce \(13 \times (16 - 1) + 5 = 200\) linear inequalities to represent the differential property of \((\alpha _i, \beta _i \rightarrow \gamma _i)\) for modular addition in the i-th round.

For the XOR operation of two branches in the i-th round, the two input differences are \((\varDelta z_{i-1}^{13}, \ldots , \varDelta z_{i-1}^{0}, \varDelta z_{i-1}^{15},\varDelta z_{i-1}^{14})\) and \((\varDelta z_{i}^{31},\ldots , \varDelta z_{i}^{16})\), and the output difference is \((\varDelta z_{i}^{15},\ldots , \varDelta z_{i}^{0})\). Thus we have

$$\begin{aligned} \varDelta z_{i-1}^{13} \oplus \varDelta z_{i}^{31}= & {} \varDelta z_{i}^{15},\\ \vdots \\ \varDelta z_{i-1}^{14} \oplus \varDelta z_{i}^{16}= & {} \varDelta z_{i}^{0}. \end{aligned}$$

Then we use Inequalities (4) to describe the differential property of XOR operation of two branches in the i-th round, so \(5 \times 16 = 80\) linear inequalities are produced. Therefore, we use the above produced \(200 + 80 = 280\) linear inequalities to describe the differential property of \((\varDelta z_{i-1} \rightarrow \varDelta z_{i})\).

In the similar way, \(280\cdot r\) linear inequalities are derived to describe the differential property of r rounds \((\varDelta z_{0} \rightarrow \varDelta z_{1} \rightarrow \ldots \rightarrow \varDelta z_{r})\). Moreover, one additional condition \(\sum _{j=0}^{31} \varDelta z_0^j >0\) is required to ensure the non-zero plaintext difference.

We set the objective function as the minimal value of \(\sum _{i=1}^{r}{\sum _{j=0}^{32-2}{\lnot eq(\alpha _i[j],\beta _i[j],\gamma _i[j])}}\) and convert the above \(280 \cdot r +1\) linear inequalities as constraints into the LP format that is readable by Gurobi. Finally, we use the Gurobi to find the the differential characteristic of r rounds of Speck32.

Similarly, the process of constructing the model to search for the linear approximations for Speck can be implemented using the model in Sect. 4. Here we will not provide the details about it due to the limited space. The source code is published in https://github.com/fukai6/milp_speck.git.

As we search for the differential and linear trails for Speck with block size greater than 48, we use the splicing heuristic in order to speed up the search process. The splicing heuristic is to search two short trails and concatenate them to produce a long trail. For example, we can search \(r_1\)-round differential characteristic with output difference \(\delta \) and \(r_2\)-round differential characteristic with input difference \(\delta \) to construct an (\(r_1+r_2\))-round differential characteristic. Based on the observation from our identified differential characteristic for small number of rounds and differential characteristics presented in [4], we find that the differential probability probably is better when the left of \(\delta \) is 0x80 and the right of \(\delta \) is 0. In this way, we manually choose different values of \(r_1\) and \(r_2\), and set \(\delta =0x80||0\) as the output difference or input difference to search two differential trails, then concatenate them to produce an (\(r_1+r_2\))-round differential characteristic. For the linear approximation, we set the input mask or output mask as 0x1||0.

Finally, the best differential characteristics and linear approximations we found are listed in Tables 4, 5, 6 and 7, where \(\sum _{i=1}^r \text {log}_{2}p_{i}\) and \(\sum _{i=1}^r \text {log}_{2}c_{i}\) are the probability of differential characteristic and the correlation of linear approximation, respectively.

Table 4. Differential characteristics for Speck32, Speck48 and Speck64
Table 5. Differential characteristic for Speck96 and Speck128
Table 6. Linear approximations of Speck32, Speck48 and Speck64

Note that the differential characteristics in Table 5 have been produced with \(r_1=11, r_2=4\) for Speck64 and \(r_1=11, r_2=5\) for Speck 96, respectively. For Speck128, with the splicing heuristic, we can only get an 18-round differential characteristic with the probability \(2^{-126}\) by setting \(r_1=r_2=9\) with reasonable time. Thus, in order to find a better trail, we firstly search for a 13-round differential trail with the splicing heuristic by setting \(r_1=9\) and \(r_2 = 4\), then extend six rounds before the 13-round differential trial to get the 19-round differential characteristic in Table 5, where the 6-round trail has been also found with Gurobi.

The linear approximations in Tables 6 and 7 have been identified with the parameters: \(r_1=10,r_2=3\) for Speck64, \(r_1=3, r_2=12\) for Speck96 and \(r_1=6, r_2=10\) for Speck128.

Table 7. Linear approximation of Speck96 and Speck128

For the runtime of the searching algorithm, we spent about several hours on personal computer (4 Core, Intel(R) Core(TM) i5 CPU 650 @3.20 GHz) for Speck32 and about one day on IBM server (64 Core, Intel(R)Xeon(R) CPU E7330, 2.40 GHz) for other variants of Speck. Note that we have searched for all the differential characteristics and linear approximations for Speck32, however, for other variants we aim to only find better trails than the previous ones but we cannot guarantee they are the best trails.

A summary of the differential characteristics and the linear approximations for Speck is provided in Table 2, which shows that we got better differential characteristics and linear approximations for Speck48, Speck64, Speck96 and Speck128.

In order to check the effect of the assumptions of independent inputs to the modular addition and independent rounds for Speck, we experimentally calculate the probability for our identified differential characteristics in Tables 4 and 5. As it is not feasible to do this for many rounds, we break down the differential characteristics to small overlapping segments according to the differential probability of the segments. The calculated probability of each one of these segments has been verified experimentally by encrypting sufficiently many random plaintext pairs for some arbitrary keys. The test results are shown in Table 8. In Table 8, the first column is the tested cipher, the second column shows rounds covered by the segment of differential characteristic, the third column is the theoretic differential probability of the corresponding segment of differential characteristic, the fourth column is the total number of random chosen plaintext pairs used in the test, the fifth column is the total number of tested key values, and the last column is the number of keys which can get the calculated differential probability no less than the theoretic differential probability. Note that we only test the last segment from round 12 to round 19 for Speck-128 because the theoretic differential probabilities for the previous segments are too low to be tested. From Table 8, we can see that the number of good keys significantly deviates from the mean for some cases, which is due to the independent assumptions for Speck cipher. Such deviation will have effect on the success rate of the attacks in the practitioner’s perspective.

Table 8. Test for differential characteristics in Tables 4 and 5

5.3 Key Recovery Attack on Speck

In [9] Dinur presented a generic key recovery framework for Speck which can extend the differential attack for more rounds. The idea of the framework uses the guess-and-determine technique instead of counting technique for standard key recovery attack. Furthermore, the cryptanalytic technique of ARX cipher is utilized to speed up the attack on Speck.

For Speck2n / mn, if the differential characteristic with probability p for \(r-1\) rounds has been found, the attacker can add one round at the top of the differential characteristic and m rounds at the bottom of the differential characteristic to cover \(r+m\) rounds in total. It is not necessary to guess the subkey in the first round as it has no effect on the input difference. At last, the attacker can recover the mn-bit secret key of a variant with \(r+m\) rounds of Speck2n / mn using \(2 \cdot p^{-1}\) chosen plaintexts with time complexity of \(2\cdot p^{-1} \cdot 2^{(m-2)n}\) and memory complexity of \(2^{22}\) bytes.

With our identified differential characteristics for Speck, we can give the improved key recovery attack. Since the attack is same as that in [9], details are omitted. For each variant of Speck, we summarize our attacks and the previous differential attacks in Table 1, which shows that our attacks for the variants of Speck with block size 48, 64, 96 and 128 are best attacks in terms of the number of rounds.

6 Conclusion

In this paper, we construct the MILP model to automatically search for differential and linear approximations for ARX ciphers by researching the differential and linear property of modular addition under the assumptions of independent inputs to the modular addition and independent rounds. Then we use the new MILP model to search for the differential characteristics and linear approximations for Speck cipher. Compared with the previous best differential characteristics for them, our identified differential characteristics for Speck64, Speck96 and Speck128 are extended for one, three and five rounds, respectively, and our differential characteristic for Speck48 has higher probability. We use those new differential characteristics to improve the currently best public attacks on the four variants of Speck. In addition, we searched for the linear approximations for Speck cipher and improved the previous linear approximations for Speck variants with block size greater than 32.