1 Introduction

Simon is a block cipher recently proposed by the U.S. National Security Agency (NSA) as a lightweight alternative to the widely used AES [18]. Thanks to extensive optimizations in its round function, Simon performs well on both hardware and software platforms. In addition, because it supports various block sizes, Simon is considered a very promising candidate for resource-constrained embedded applications. However, when it is actually implemented on practical embedded platforms, protection against side-channel attacks must be taken into account.

Side-channel attacks (SCA) can recover sensitive information (e.g. the key) from cryptographic devices by exploiting physical leakages (e.g. execution time [3], power consumption [4]) during the execution of cryptographic algorithms. This kind of attack works because the observed physical leakages depend on the manipulated data. If the manipulated data is sensitive, the information leaked about it enables key-recovery attacks. Typically, the adversary calculates the hypothetical leakage based on a power model and a key hypothesis, then compares it with the actual leakage to determine whether the key hypothesis is correct. An SCA that exploits the leakage related to one intermediate variable is called a first-order SCA, and an attack that exploits the combined leakage resulting from two or more intermediate variables is called a second-order or higher-order SCA.

Indeed, unprotected implementations of Simon have been shown to be vulnerable to side-channel attacks, see [21, 22]. An especially popular and widely used SCA countermeasure is masking, as introduced in [5]. It consists in randomizing all intermediate variables with random numbers. When a sensitive variable x is masked as \(x^{'}=x \oplus r\), the computation on x is carried out by processing the masked variable \(x^{'}\) and the mask r separately. This countermeasure makes the processed data independent of the sensitive data, and thus secure against first-order SCA. Masking schemes for Simon have been proposed in [21, 24]. The masking scheme in [24] is designed to meet the requirements of a threshold implementation and is provably secure against first-order SCA. However, if implemented in software, this scheme can be broken in practice by a third-order SCA, which exploits the combined leakage related to three shares. The masking scheme in [21] handles the non-linear transformation of Simon by partially unmasking the input and using the input mask to re-mask the output after the transformation. This process is performed in a single look-up table operation so that no sensitive data leaks. The security of this scheme depends on its realization, and it can provide resistance against some common first-order SCA. However, this scheme is breakable in practice by a second-order SCA, which combines leakage information coming from two intermediate values with the same mask. In conclusion, those masking schemes only provide resistance against first-order SCA and can be broken in practice by second-order or higher-order SCA. Therefore, to counteract those attacks, higher-order masking must be used.

Higher-order masking is a generalization of the first-order case. It randomly splits every variable x into d+1 shares by letting \(x=x_0\oplus \ldots \oplus x_d\) as in a secret-sharing scheme [1]. Then, the shares \(x_i\) are processed separately in such a way that the combined leakage of any tuple of d shares is independent of the sensitive variable x. In the past several years, a number of higher-order masking schemes have been proposed for block ciphers, and those schemes can be roughly divided into three categories: randomized computation based, randomized table based and mask conversion based. Randomized computation based schemes are the most popular; they include the famous hardware-oriented ISW scheme [6], the software-oriented RP scheme [12] and their successors [13, 17, 19]. Those schemes mainly target block ciphers based on s-boxes (e.g. AES, PRESENT [10]). The two remaining types are less popular and are mainly described in [20, 23]. Randomized table based schemes are suitable for block ciphers that use look-up tables, and mask conversion based schemes are dedicated to block ciphers that combine boolean operations with arithmetic operations (ARX-based block ciphers like HIGHT [9], KTANTAN [11]). Simon has neither an s-box, nor a look-up table, nor any arithmetic operations, and thus lies beyond the protection scope of existing higher-order masking schemes. Therefore, it is very important to design higher-order masking schemes for Simon.

In this paper, we design two higher-order boolean masking schemes for software implementations of Simon. The first scheme addresses the inapplicability of the ISW scheme [6] in software: we propose a partition based method that exploits the bit-oriented structure of Simon. The second scheme is based on a design principle similar to that of Coron-Prouff-Rivain-Roche’s scheme [19]. Compared with Coron-Prouff-Rivain-Roche’s scheme, our scheme requires fewer random bits. Security proofs of the two proposals are given in this paper, and both are proven to be secure against \(d^{th}\)-order SCA when using \(n> d\) shares. In addition, their implementation performance on 8-bit AVR platforms is evaluated, and the evaluation results show that the two schemes have a reasonable implementation cost. For example, the second-order masked implementations of the two schemes are only 12.6 and 7.4 times slower than the unprotected case, respectively, and require a small amount of additional memory.

The rest of this paper is organized as follows. Section 2 briefly introduces some background knowledge. Sections 3 and 4 describe the two proposed masking schemes and their security analyses. Section 5 gives the implementation performance and Sect. 6 concludes the whole paper.

2 Preliminaries

2.1 Simon

Simon is a block cipher based on the Feistel structure. Simon supports blocks of 32, 48, 64, 96 and 128 bits. For each block size, it has a set of allowable key sizes ranging from 64 bits to 256 bits. Following the principles of the Feistel structure, the input is split into two words. The corresponding key is also split into two to four words, which are used in the first round of Simon. The number of rounds in Simon ranges from 32 to 72, depending on the block and key sizes. For example, Simon 64/128 has a block size of 64 bits and a key size of 128 bits. It generates a ciphertext after 44 rounds.

Given a round key k, the round function is defined on two n-bit input words x and y as:

$$\begin{aligned} R_k(x,y)= (\; y\oplus h(x)\oplus (x\lll 2)\oplus k,\; x), \end{aligned}$$
(1)

where \( h(x)= (x\lll 1) \& (x\lll 8)\) is a non-linear operation. For the sake of clarity, \(R_k(x,y)\) is represented by the left part of its output in the rest of the paper. The key schedule algorithm of Simon is an entirely linear operation.
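As a minimal sketch of Eq. 1, the round function can be written as follows. The helper names `rotl`, `h` and `simon_round` are illustrative and do not come from the paper; words are modelled as Python integers of width n, and the key schedule is not covered.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x left by r positions (the <<< operator)."""
    r %= n
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def h(x, n):
    """Non-linear part of the round function: (x <<< 1) & (x <<< 8)."""
    return rotl(x, 1, n) & rotl(x, 8, n)


def simon_round(x, y, k, n):
    """One round R_k(x, y) = (y ^ h(x) ^ (x <<< 2) ^ k, x) on n-bit words."""
    return (y ^ h(x, n) ^ rotl(x, 2, n) ^ k, x)
```

The only non-linear step is the AND inside `h`; the rotations and XORs are linear over \(F_2\), which is what makes them easy to mask share by share.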

2.2 Higher-Order Boolean Masking

Higher-order boolean masking is used to protect cryptographic implementations against higher-order SCA. It randomly splits every sensitive variable x entering the computation into d+1 variables \(x_0,x_1,\ldots ,x_d\), satisfying the following equation:

$$\begin{aligned} x_0\oplus x_1\oplus \ldots \oplus x_d =x. \end{aligned}$$
(2)
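The sharing in Eq. 2 can be illustrated with a short sketch. The function names `share` and `unshare` are hypothetical, and `secrets.randbits` stands in for whatever random source a real implementation would use.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, d, n):
    """Split the n-bit value x into d+1 boolean shares whose XOR equals x."""
    masks = [secrets.randbits(n) for _ in range(d)]  # x_1, ..., x_d are uniform
    x0 = reduce(xor, masks, x)                       # x_0 = x ^ x_1 ^ ... ^ x_d
    return [x0] + masks


def unshare(shares):
    """Recombine the shares; only done here to check correctness."""
    return reduce(xor, shares, 0)
```

A masked implementation never calls anything like `unshare` on sensitive values during the computation; it operates on the d+1 shares throughout.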

When \(d^{th}\)-order masking is used to protect a block cipher implementation, a \(d^{th}\)-order masking scheme should be designed to enable computation on (d+1) boolean shares. As described in [12], while preserving the correctness of the computation, the \(d^{th}\)-order masking scheme must achieve \(d^{th}\)-order SCA security, which is defined as follows.

Definition 1

A (software implementation of a) masking scheme is said to achieve \(d^{th}\)-order SCA security if every tuple of d or fewer intermediate variables is independent of any sensitive variable.

Moreover, in [12], Rivain and Prouff introduce a method to prove the \(d^{th}\)-order security of masking schemes. For an elementary masking scheme, the security proof uses a technique similar to that of zero-knowledge proofs [2]: one shows that the distribution of every d-tuple of intermediate variables can be perfectly simulated without knowing any sensitive variable. For a complex masking scheme (usually consisting of several elementary masking schemes), one first shows that every elementary masking scheme involved achieves \(d^{th}\)-order SCA security, and then demonstrates the security of the complex masking scheme as a whole. In this paper, we follow the above security definition and proof method because they seem well suited to proving the security of our masking schemes.

2.3 Higher-Order Boolean Masking of SIMON

The key schedule algorithm of Simon is a linear operation, and the encryption of Simon consists of r rounds of an identical transformation. Therefore, designing a higher-order boolean masking scheme for Simon comes down to masking its round function. The round function of Simon makes use of three n-bit operations: xor \((\oplus )\), AND \( ( \& )\) and circular shift \((\lll )\). Among those, the xor and circular shift operations are linear (easy to mask), whereas AND is non-linear (hard to mask). Therefore, the AND operation, or more precisely the non-linear transformation \( h(x) = (x\lll 1) \& (x\lll 8)\), is the main difficulty in masking the round function of Simon. If we assume that the non-linear transformation h(x) is protected by a masking algorithm, denoted as SecH\((x_0,\ldots ,x_d)\), then the masking scheme for the round function of Simon can be described as in Algorithm 1. It is clear that the security of Algorithm 1 depends on the security of SecH. In the following two sections, we describe two masking schemes for the non-linear transformation h(x) and prove their security in the probing model.

Algorithm 1. Masking scheme for the round function of Simon (listing not reproduced here).
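One way to picture a masked round built around SecH is sketched below. This is not the paper's Algorithm 1: it assumes that the round key is also provided as d+1 boolean shares and that `sec_h` is any function returning d+1 shares of h(x). The names `masked_round` and `sec_h` are hypothetical.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def masked_round(xs, ys, ks, n, sec_h):
    """Masked version of R_k(x, y) = (y ^ h(x) ^ (x <<< 2) ^ k, x).

    xs, ys, ks: lists of d+1 shares of x, y and the round key (assumption).
    sec_h(xs, n): must return d+1 shares of h(x) without recombining x.
    """
    hs = sec_h(xs, n)  # the only non-linear step
    # Rotation and XOR are linear, so they are applied to each share on its own.
    left = [y ^ hh ^ rotl(x, 2, n) ^ k for x, y, hh, k in zip(xs, ys, hs, ks)]
    return left, list(xs)
```

Since every step other than `sec_h` touches one share index at a time, the sketch makes visible why the security of the whole round hinges on the security of SecH.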

3 The First Scheme: Partition Based Masking Scheme

In this section, we introduce the first higher-order boolean masking scheme for the non-linear transformation h(x). This scheme is based on the ISW scheme [6]. We firstly recall the ISW scheme and then describe our scheme. The proof of security and implementation tricks of this scheme are also given in this section.

3.1 Description

Ishai-Sahai-Wagner’s Scheme. Ishai-Sahai-Wagner’s (ISW) scheme is a higher-order boolean masking scheme tailored to boolean circuits. The core construction of this scheme is a masking algorithm for the binary AND operation. Let a, b be binary values from \(F_2\) and let \((a_i)_i\), \((b_i)_i\) be the (d+1) shares of a and b respectively. The algorithm securely computes a sharing \((c_i)_i\) of \( c =a \& b\) from \((a_i)_i\) and \((b_i)_i\) as follows:

  1. For each \(0\le i < j\le d\), generate a random bit \(r_{i,j}\);

  2. For each \(0\le i < j\le d\), compute \(r_{j,i} = (r_{i,j}\oplus a_i \& b_j)\oplus a_j \& b_i\);

  3. For each \(0\le i\le d\), compute \( c_i = a_i \& b_i \oplus \bigoplus _{j\ne i} r_{i,j}\).

This scheme is sound and can achieve \((\frac{d}{2})^{th}\)-order SCA security (proved in [6]). Furthermore, in [12], it is shown that this scheme is actually \(d^{th}\)-order secure if the input shares \((a_i)_{i\ge 1}\) and \((b_i)_{i\ge 1}\) are mutually independent. It is noteworthy that the ISW scheme is theoretical, and if implemented on practical platforms, it will suffer a prohibitive overhead (see [7, 12]).
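The three steps of the ISW AND gadget can be sketched as follows. The function name `isw_and` is illustrative, bits are modelled as Python integers 0 or 1, and `secrets.randbits` is an assumed random source. The sketch shows functional behaviour only; a Python model says nothing about actual leakage.

```python
import secrets


def isw_and(a, b):
    """a, b: lists of d+1 bit shares. Returns d+1 bit shares of a & b."""
    n_shares = len(a)
    r = [[0] * n_shares for _ in range(n_shares)]
    for i in range(n_shares):
        for j in range(i + 1, n_shares):
            r[i][j] = secrets.randbits(1)                        # step 1
            # Step 2: the brackets fix the evaluation order, as in the text.
            r[j][i] = (r[i][j] ^ (a[i] & b[j])) ^ (a[j] & b[i])
    c = []
    for i in range(n_shares):
        ci = a[i] & b[i]                                         # step 3
        for j in range(n_shares):
            if j != i:
                ci ^= r[i][j]
        c.append(ci)
    return c
```

The gadget draws d(d+1)/2 random bits and evaluates \((d+1)^2\) ANDs per protected AND gate, which gives a sense of why a purely bit-level use in software is costly.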

Partition Based Masking Scheme. Since the non-linear transformation h(x) is bit-oriented, it is straightforward to apply the ISW scheme to protect the computation of h(x). However, this approach is impractical on software platforms because it requires a large number of bit operations.

In this paper, we propose a partition based masking scheme to improve the implementation efficiency. Before describing this method, we define some notation. For an n-bit x, the \(i^{th}\) bit of x is denoted as \(x_i\) and the bitset of x is \(\{x_0,\ldots ,x_{n-1}\}\). The index set \(\{0,\ldots ,n-1\}\) is written as [0,n-1], and for a subset I of the index set, \(x|_I\) is defined as \(\{x_i\}_{i\in I}\). We now describe our method. Firstly, we extend the ISW scheme to the multi-bit case and denote the extended scheme for an n-bit AND as \(SecAnd(n,\ldots ,\ldots )\). Secondly, we divide the bitset of h(x) into a partition that must satisfy an additional condition: for every subset S in the partition, each bit of x is used at most once to calculate S. Thirdly, we use \(SecAnd(n,\ldots ,\ldots )\) to securely compute each subset S in the partition. Taking Simon 32/64 as an example, x has a size of 16 bits and we get a partition of the bitset of y=h(x) as depicted in Table 1. The computation of each subset in the partition is then protected by \(SecAnd(n,\ldots ,\ldots )\). Note that for an n-bit y, a partition of its bitset can be represented as a partition of the index set [0,n-1]. In the following, for clarity, we use the partition of the index set instead of the partition of the bitset.

Table 1. An example of a partition for 16-bit y.

For Simon with a word size of n bits, if a partition of the index set [0,n-1] is obtained, then the partition based masking scheme can be described as in Algorithm 2.

Algorithm 2. Partition based masking scheme for h(x) (listing not reproduced here).

where t is the number of subsets in the partition used and \(I_j\) represents the \(j^{th}\) subset. The difference between our method and the ISW scheme is that the latter is bit-oriented, while the former is bitset-oriented. In the following, we will show that our method is suitable for realization on software platforms.
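The bitset-oriented idea can be sketched as follows, as an illustration rather than the paper's Algorithm 2. The helpers `sec_and`, `gather`, `scatter` and `sec_h_partition` are hypothetical names. The partition is passed in as a list of index lists, with bit 0 taken as the least significant bit (an assumption about numbering).

```python
import secrets


def rotl(x, r, n):
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def sec_and(a, b, w):
    """ISW-style AND applied to w-bit words, one random word per pair i < j."""
    m = len(a)
    c = [a[i] & b[i] for i in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            r_ij = secrets.randbits(w)
            r_ji = (r_ij ^ (a[i] & b[j])) ^ (a[j] & b[i])
            c[i] ^= r_ij
            c[j] ^= r_ji
    return c


def gather(x, idx):
    """Pack the bits of x at positions idx into a compact word."""
    return sum(((x >> i) & 1) << p for p, i in enumerate(idx))


def scatter(v, idx):
    """Inverse of gather: move bit p of v back to position idx[p]."""
    return sum(((v >> p) & 1) << i for p, i in enumerate(idx))


def sec_h_partition(xs, n, partition):
    """Return shares of h(x) = (x <<< 1) & (x <<< 8), one subset at a time."""
    a = [rotl(x, 1, n) for x in xs]          # linear, done share by share
    b = [rotl(x, 8, n) for x in xs]
    ys = [0] * len(xs)
    for idx in partition:
        sa = [gather(v, idx) for v in a]
        sb = [gather(v, idx) for v in b]
        sc = sec_and(sa, sb, len(idx))
        ys = [y | scatter(c, idx) for y, c in zip(ys, sc)]
    return ys
```

For n = 16, one partition that meets the condition stated above is the split into even and odd indices, `[list(range(0, 16, 2)), list(range(1, 16, 2))]`; it is used here for illustration and is not necessarily the partition of Table 1. On a real 8-bit target the `gather` and `scatter` steps would correspond to bit permutations.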

3.2 Security Analysis

Before proving the security of our scheme, we need to introduce two lemmas (proven in [12]) as follows.

Lemma 1

A masking scheme achieves \(d^{th}\)-order SCA security if and only if the distribution of every d-tuple of its intermediate variables can be perfectly simulated from at most d shares of each of its input (d+1)-families.

Lemma 2

If a masking scheme T achieves \(d^{th}\)-order SCA security, then the distribution of any t \((t \le d )\) intermediate variables of T can be perfectly simulated from at most t shares of each input (d+1)-family of T.

The following theorem states the security of our scheme.

Theorem 1

The partition based masking scheme (Algorithm 2) is correct and can achieve \(d^{th}\)-order SCA security.

Proof. The correctness of this scheme follows directly from that of the ISW scheme, so we focus on its security proof. Firstly, due to the property of each subset in the partition, the input shares of \(SecAnd(n_i,\ldots ,\ldots )\) in Algorithm 2 are mutually independent. Therefore, each \(SecAnd(n_i,\ldots ,\ldots )\) achieves \(d^{th}\)-order security (as proven in [12]). Secondly, consider any d-tuple \(v=(v_1,\ldots ,v_d)\) of intermediate variables of Algorithm 2, and let \(c_i\) denote the number of elements of v that belong to the \(i^{th}\) \(SecAnd(n_i,\ldots ,\ldots )\). By Lemma 2, perfectly simulating these \(c_i\) intermediate variables requires at most \(c_i\) shares of each of the involved inputs. Thus, perfectly simulating the distribution of the whole d-tuple v requires at most \(\sum ^t_{i=1}c_i=d\) shares of each of the involved inputs. Therefore, by Lemma 1, Algorithm 2 achieves \(d^{th}\)-order SCA security.

3.3 Implementation Aspect

To implement Algorithm 2 efficiently on software platforms, the partition should be chosen so that the size of each bit subset equals the word size of the target platform; e.g. for 8-bit platforms, each bit subset should contain 8 bits. For 8-bit platforms, we provide a simple method to obtain such a partition. We write \(n=8m\) (m bytes) and define the partition by constructing the \(j^{th}\) subset as \(y|_{I_j}:=\{y_j,y_{j+m},\ldots ,y_{j+7m}\}\) for \(0\le j< m\). The correctness of this method can be easily verified.
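The construction above and the partition condition from Sect. 3.1 can be checked mechanically. The sketch below uses the hypothetical names `byte_partition` and `is_valid_partition` and assumes that bit 0 is the least significant bit, so that bit i of \(x\lll r\) is bit \((i-r) \bmod n\) of x.

```python
def byte_partition(n):
    """Subsets I_j = {j, j+m, ..., j+7m} for n = 8m and 0 <= j < m."""
    if n % 8 != 0:
        raise ValueError("n must be a multiple of 8")
    m = n // 8
    return [[j + k * m for k in range(8)] for j in range(m)]


def is_valid_partition(partition, n):
    """Check that the subsets cover [0, n-1] exactly once and that, within
    each subset, every bit of x feeds the computation at most once."""
    if sorted(i for subset in partition for i in subset) != list(range(n)):
        return False
    for subset in partition:
        # Output bit i of h(x) reads x at positions (i-1) and (i-8) mod n.
        used = [(i - 1) % n for i in subset] + [(i - 8) % n for i in subset]
        if len(set(used)) != len(used):
            return False
    return True
```

For example, `is_valid_partition(byte_partition(32), 32)` can be used to confirm the condition for the 32-bit words of Simon 64/128.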

4 The Second Scheme: Linearity Based Masking Scheme

In this section, we introduce the second higher-order boolean masking scheme for the non-linear transformation h(x). This scheme is based on the design principle similar to Coron-Prouff-Rivain-Roche’s scheme [19]. Therefore, we firstly recall Coron-Prouff-Rivain-Roche’s scheme, then describe our scheme. The security analysis and the comparison with Coron-Prouff-Rivain-Roche’s scheme are also given in this section.

4.1 Description

Coron-Prouff-Rivain-Roche’s Scheme. In [19], Coron et al. propose a masking scheme for field multiplications of the form \(x\odot g(x)\), where \(\odot \) represents multiplication over \(F_{2^n}\) and g(x) is an \(F_2\)-linear function. Algorithm 3 describes this masking scheme. The construction of Algorithm 3 is based on the \(F_2\)-linearity of the function g. If we define \(f(x,y) = (x\odot g(y)) \oplus (g(x)\odot y)\), the \(F_2\)-linearity of g implies the \(F_2\)-bilinearity of f(x,y). Namely, for any \(x, y, r \in F_{2^n}\), we have \(f(x,y) = f(x,y\oplus r)\oplus f(x,r) = f(x\oplus r,y)\oplus f(r,y)\). Based on the \(F_2\)-bilinearity of f(x,y), additional random values \(r'_{i,j}\) can be introduced to securely process \(r_{j,i}= r_{i,j}\oplus f(x_i,x_j)\) (line 8 in Algorithm 3) as follows.

Algorithm 3. Coron-Prouff-Rivain-Roche’s masking scheme for \(x\odot g(x)\) (listing not reproduced here).
$$\begin{aligned} r_{j,i} =(r_{i,j}\oplus f(x_i,r^{'}_{i,j}\oplus x_j))\oplus f(x_i,r^{'}_{i,j}) \end{aligned}$$
(3)

The brackets in Eq. 3 specify the order in which the operations are performed. In addition, as shown in [19], Algorithm 3 is sound and \(d^{th}\)-order secure. In the rest of the paper, for simplicity, we refer to Coron-Prouff-Rivain-Roche’s scheme as the CPRR scheme.

Linearity Based Masking Scheme. The CPRR scheme is designed for field multiplication and thus does not apply directly to the computation of h(x). In this part, we first observe the \(F_2\)-linearity underlying the non-linear transformation h(x) and then design a masking scheme for h(x) based on the CPRR scheme. The masking algorithm is explained as follows.

If we define \(g_1(x)=(x\lll 1)\) and \(g_2(x)=(x\lll 8)\), h(x) can be rewritten as \( h(x)=g_1(x) \& g_2(x)\), where the functions \(g_1(x)\) and \(g_2(x)\) are \(F_2\)-linear. Then we define \( F(x,y)= (g_1(x) \& g_2(y))\oplus (g_2(x) \& g_1(y))\). It can be checked that the \(F_2\)-bilinearity of F(x, y) holds. The \(F_2\)-bilinearity of F(x, y) enables us to apply exactly the same steps as the CPRR scheme to securely compute h(x), except that the operation \(\odot \) is replaced by \( \& \).
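The bilinearity claim can be checked numerically with a short sketch. The function names are illustrative, and the identity \(h(x\oplus y)=h(x)\oplus h(y)\oplus F(x,y)\) tested at the end is a direct consequence of expanding \(g_1(x\oplus y) \& g_2(x\oplus y)\); it is stated here as a worked step rather than quoted from the paper.

```python
import secrets


def rotl(x, r, n):
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def h(x, n):
    return rotl(x, 1, n) & rotl(x, 8, n)


def F(x, y, n):
    """F(x, y) = (g1(x) & g2(y)) ^ (g2(x) & g1(y)), g1 = <<< 1, g2 = <<< 8."""
    return (rotl(x, 1, n) & rotl(y, 8, n)) ^ (rotl(x, 8, n) & rotl(y, 1, n))


def check_bilinearity(n, trials=1000):
    """Randomized check of the F_2-bilinearity of F on n-bit words."""
    for _ in range(trials):
        x, y, r = (secrets.randbits(n) for _ in range(3))
        assert F(x, y, n) == F(x, y ^ r, n) ^ F(x, r, n)
        assert F(x, y, n) == F(x ^ r, y, n) ^ F(r, y, n)
        # Cross terms of h on a two-share input are exactly F(x, y).
        assert h(x ^ y, n) == h(x, n) ^ h(y, n) ^ F(x, y, n)
    return True
```

A randomized check is of course not a proof, but the identities follow from the distributivity of AND over XOR and the linearity of rotation.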

Moreover, we show that the number of random bits required by the CPRR scheme can be halved (from \(d(d+1)\) to \(d(d+1)/2\)). Observing Eq. 3, if \(f(x_i,r^{'}_{i,j}\oplus x_j)\) and \(f(x_i,r^{'}_{i,j})\) are processed separately, the random value \(r_{i,j}\) is redundant. Based on this observation, we design the optimized masking scheme for h(x), which is summarized in Algorithm 4.

Algorithm 4. Linearity based masking scheme for h(x) (listing not reproduced here).

4.2 Security Analysis

Theorem 2

The linearity based masking scheme (Algorithm 4) is correct and can achieve \(d^{th}\)-order SCA security.

Proof. Our proof consists of two parts: a correctness proof and a security proof. Firstly, we can get the following two equations from Algorithm 4.

$$\begin{aligned} t_{i,j}=f(x_i\oplus r_{i,j},x_j)\;\;\;\;t_{j,i}=f(r_{i,j},x_j) \end{aligned}$$
(4)

Then, based on Eq. 4 and the \(F_2\)-bilinearity of F(x, y), we have

$$ \begin{aligned} \bigoplus ^{d}_{i=0}y_{i} = (\bigoplus ^{d}_{i=0}(x_i\lll 1)) \& (\bigoplus ^{d}_{i=0}(x_i\lll 8)) =h(x) \end{aligned}$$
(5)

Therefore, the correctness of Algorithm 4 is proven.
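The correctness argument can be replayed in code. The sketch below is built only from Eq. 4 and Eq. 5, reading f in Eq. 4 as the function F of Sect. 4.1 (an assumption). It is not the paper's Algorithm 4, whose exact order of operations is not reproduced here, so it makes no security claim. The name `sec_h_linear` is hypothetical.

```python
import secrets


def rotl(x, r, n):
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def h(x, n):
    return rotl(x, 1, n) & rotl(x, 8, n)


def F(x, y, n):
    return (rotl(x, 1, n) & rotl(y, 8, n)) ^ (rotl(x, 8, n) & rotl(y, 1, n))


def sec_h_linear(xs, n):
    """Return d+1 shares of h(x) using one random n-bit word per pair i < j."""
    m = len(xs)
    ys = [h(x, n) for x in xs]                # diagonal terms h(x_i)
    for i in range(m):
        for j in range(i + 1, m):
            r = secrets.randbits(n)           # d(d+1)/2 random words in total
            t_ij = F(xs[i] ^ r, xs[j], n)     # Eq. 4, first term
            t_ji = F(r, xs[j], n)             # Eq. 4, second term
            ys[i] ^= t_ij                     # the two terms are kept apart
            ys[j] ^= t_ji
    return ys
```

By bilinearity, \(t_{i,j}\oplus t_{j,i}=F(x_i,x_j)\), so the XOR of all output shares is \(\bigoplus _i h(x_i)\oplus \bigoplus _{i<j}F(x_i,x_j)\), which equals h(x) as stated in Eq. 5.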

Our security proof sketch is similar to that of the CPRR scheme [19] and consists of two stages. The first stage is to construct a strict subset I of indices in [0,d]. The second stage is to design a simulator that perfectly simulates the distribution of the d-tuple v from \(x|_I = (x_i)_{i\in I}\). By Lemma 1, this proves the \(d^{th}\)-order security as long as the cardinality of I is strictly smaller than d+1. The details of the proof are given in Appendix A. Note that the construction of I and the simulator in our proof differ from those for the CPRR scheme.

4.3 Comparison with CPRR Scheme

The design principle of our scheme is similar to that of the CPRR scheme, but the construction is totally different. Specifically, the random values \(r_{i,j}\) (line 3 in Algorithm 3) are removed in our scheme. In addition, \(f(x_i\oplus r_{i,j},x_j)\) and \(f(r_{i,j},x_j)\) are stored in two different variables in our scheme. Those changes result in a significant efficiency improvement, as illustrated in Table 2.

Table 2. Complexity comparison between our optimized scheme and CPRR scheme regarding the total number of operations

From Table 2 we can see that the number of random bits required by our scheme is halved, while the numbers of other operations are either reduced or unchanged. In addition, the implementation tricks (look-up table based optimizations) for the CPRR scheme are also suitable for our scheme when the size of x is small (e.g. smaller than 10).

5 Implementation Result

To evaluate the implementation efficiency of our two proposals, we have implemented the masked Simon 64/128 round function for \(d \in \{1, 2, 3\}\) on an 8-bit AVR microcontroller in assembly language. Table 3 lists the implementation performance of each masking scheme. It can be seen that for the second-order case, our two implementations are only 12.6 and 7.4 times slower than the unprotected implementation, respectively. These figures indicate that our masking schemes can be used in practice in embedded systems. Furthermore, we find that the second scheme is better than the first in all aspects of implementation performance. This is mainly because two additional bit permutations are used in the implementations of the first scheme.

In addition, we compare the implementation efficiency of our optimized scheme (Algorithm 4) with that of the CPRR scheme [19]. We have implemented the masked non-linear transformation h(x) of Simon 64/128 for \(d \in \{1, 2, 3\}\). To make the results more practical, we include the generation of random numbers in the masked implementation, using the pseudo-random function rand() from the standard C library. The execution times (in clock cycles) are reported in Table 4. As expected, our optimized scheme outperforms the CPRR scheme, with a timing gain of at least 47 % and 48 % for the first-order and second-order cases respectively.

Table 3. Summary of implementation data for a masked implementation of the Simon 64/128 round function with masking order \(d\in \{1,2,3\}\).
Table 4. Execution time (in clock cycles) for a masked implementation of the non-linear transformation h(x) of Simon 64/128 w.r.t. the masking order d.

6 Conclusion

In this paper, we present two \(d^{th}\)-order boolean masking schemes for software implementations of Simon and prove the \(d^{th}\)-order SCA security of both schemes. Our implementation results show that the two proposed schemes have a comparable implementation cost. In addition, we compare the optimized scheme (the second scheme) with the CPRR scheme with regard to implementation efficiency, and the results confirm that our scheme performs the computation more efficiently.