BY-NC-ND 3.0 license Open Access Published by De Gruyter September 21, 2017

Rigorous upper bounds on data complexities of block cipher cryptanalysis

  • Subhabrata Samajder and Palash Sarkar

Abstract

Statistical analysis of symmetric key attacks aims to obtain an expression for the data complexity which is the number of plaintext-ciphertext pairs needed to achieve the parameters of the attack. Existing statistical analyses invariably use some kind of approximation, the most common being the approximation of the distribution of a sum of random variables by a normal distribution. Such an approach leads to expressions for data complexities which are inherently approximate. Prior works do not provide any analysis of the error involved in such approximations. In contrast, this paper takes a rigorous approach to analyzing attacks on block ciphers. In particular, no approximations are used. Expressions for upper bounds on the data complexities of several basic and advanced attacks are obtained. The analysis is based on the hypothesis testing framework. Probabilities of type-I and type-II errors are upper bounded by using standard tail inequalities. In the cases of single linear and differential cryptanalysis, we use the Chernoff bound. For the cases of multiple linear and multiple differential cryptanalysis, Hoeffding bounds are used. This allows bounding the error probabilities and obtaining expressions for data complexities. We believe that our method provides important results for the attacks considered here and more generally, the techniques that we develop should have much wider applicability.

1 Introduction

Statistical methods are commonly used for analyzing attacks on block ciphers and more generally symmetric key ciphers. For an attack that aims at recovering a portion of the secret key, there are three basic parameters of interest. (For a distinguishing attack, the situation is a little different and we consider this later.)

  1. The success probability PS, i.e., the probability that the correct key will be recovered by the attack.

  2. The advantage a such that the number of false alarms is a fraction 2^-a of the number of possible values of the sub-key which is the target of the attack.

  3. The data complexity N, which is the number of plaintext-ciphertext pairs required to achieve at least a pre-specified success probability and at least a pre-specified advantage.

The main goal of any statistical analysis of an attack is to be able to express the data complexity N in terms of PS and a. All the known methods for doing this, however, provide only approximate expressions for N without deriving bounds on the approximation errors.

1.1 Our contributions

The major motivation of this work is to derive rigorous upper bounds on the data complexity in terms of PS and a. In particular, we do not use any approximation in the statistical analysis[1]. To show that this can indeed be done, we consider five basic cryptanalytic scenarios. These are single linear cryptanalysis, single differential cryptanalysis, multiple linear cryptanalysis, multiple differential cryptanalysis and the task of distinguishing between two probability distributions. In each case, we show that it is indeed possible to obtain rigorous upper bounds on the data complexity.

The theoretical work is supported by several computations. For the block cipher Serpent, we use the joint distribution of multiple linear approximations [15] to compute the approximate data complexity given by the analysis in [19] and also the upper bound on data complexity obtained in this work. The ratio of these two values turns out to be between 43 and 63. We further make detailed experimental comparisons of the upper bounds that we obtain to the previously best-known approximate values of data complexities by using simulated joint distributions. For the cases of single linear cryptanalysis, single differential cryptanalysis and the distinguisher, the ratio of the upper bound to the approximate expression is around 10 or smaller. For multiple linear cryptanalysis, the ratio is between 4 and 200. These results indicate that the upper bounds that we obtain are not too far from the approximate values obtained earlier. From a practical point of view, we think it is better to use the upper bound to measure the strength of a cipher, since it may turn out that the approximate data complexities are actually underestimates.

For multiple differential cryptanalysis, however, the upper bound turns out to be much larger than the approximate estimate obtained earlier. The reason for this could be one or both of the following: the approximate value is an underestimate or the upper bound is an overestimate. Deciding the exact reason requires more work.

The data complexity expressions that we obtain are valid for all values of the success probability PS and advantage a. So, for example, these expressions can be evaluated to obtain data complexities for PS=0.1. Such an attack has a 10% chance of being successful and from a cryptanalytic point of view would be considered a valid attack. In earlier work on multiple linear cryptanalysis [19], the condition PS>0.5 is required for the data complexity expressions to be valid. This is mentioned in [19] without any explanation. It turns out that the condition PS>0.5 is a consequence of using the normal approximation, and we refer to [32] for more details on this issue.

The hypothesis testing based approach is used to analyze the attacks. This requires obtaining the probabilities of type-I and type-II errors. In the approximate analysis, normal approximations are used to conveniently handle these probabilities. We use a different approach. The type-I and type-II error probabilities are essentially tail probabilities for a sum of some random variables. There are known rigorous methods for handling such tail probabilities, though, to the best of our knowledge, these methods have not been applied to the hypothesis testing setting.

For the cases of single linear and single differential cryptanalysis, it is required to bound the tail probabilities of a sum of independent Bernoulli distributed random variables. The usual method for handling this is to use the Chernoff bound. Using the Chernoff bound to upper bound the type-I and type-II error probabilities quite nicely leads to an expression for the data complexity.

In the cases of multiple linear or multiple differential cryptanalysis, the test statistic is no longer a sum of Bernoulli distributed random variables. As a result, the Chernoff bound does not apply. To tackle these cases, we take recourse to Hoeffding’s inequality. This inequality allows us to bound the required tail probabilities to obtain upper bounds on the type-I and type-II error probabilities. The case of the distinguisher is tackled similarly.

The importance of our work is twofold. On the one hand, we bring an amount of rigor to the statistical treatment of basic block cipher cryptanalysis. More generally, the techniques that we apply have broad applicability and it should be possible to tackle data complexities of other attacks by using these techniques. From a practical point of view, our computations confirm that the upper bounds that we obtain are greater than the approximate data complexities reported earlier. Since it is not known whether the approximate values are under- or overestimates, we think it is better to use the upper bounds.

It is possible to simulate attacks on concrete block ciphers to determine the actual amount of data required to achieve a certain success probability and an advantage. For such an exercise to be meaningful, it has to be comprehensive. In particular, it needs to be conducted on a number of block ciphers representing different design approaches and different values of the block and key sizes. Since the attacks can be simulated only within certain limited computational resources, there is also a need to develop some method for extrapolating the experimental results to sizes of real interest. Convincingly carrying out these tasks is beyond the scope of the present work and is a possible future work.

1.2 Bounds on data complexity

We separately discuss the issue for key recovery attacks and distinguishing attacks.

Case of key recovery attacks. Let Nmin(PS,a) be the minimum amount of data required to achieve success probability at least PS and advantage at least a, where the minimum is over all possible methods of statistical analysis. Any particular method of statistical analysis provides an expression for the data complexity that is required if the method is followed. Considering a statistical analysis as an algorithm 𝒜, let N𝒜(PS,a) denote the data complexity expression obtained using 𝒜 to obtain success probability at least PS and advantage at least a. Clearly, N𝒜(PS,a) is an upper bound on Nmin(PS,a). It is also a lower bound in the sense that at least N𝒜(PS,a) amount of data will be required to achieve the parameters PS and a if the method 𝒜 is followed.

A bound N𝒜(PS,a) obtained using a statistical method 𝒜 is useful to a cryptanalyst. It tells the cryptanalyst that this amount of data is sufficient to attain success probability at least PS and advantage at least a. Put another way, an upper bound tells a cryptanalyst that no more data is required to achieve the attack parameters.

From a cipher designer’s point of view, a data complexity expression of the type N𝒜(PS,a) is also useful. It tells the designer that if method 𝒜 is followed, then at least N𝒜(PS,a) amount of data is required to attain the parameters PS and a. This provides useful information in quantifying the resistance of the cipher against a particular type of attack. This is particularly important if 𝒜 is the best-known method for carrying out the statistical analysis. It would be even more useful to a cipher designer to obtain Nmin(PS,a). Unfortunately, to the best of our knowledge, there is no work in the literature which provides this information.

Case of distinguishing attacks. A distinguishing attack proceeds as a test of hypothesis to distinguish between two different probability distributions. In this case, the data complexity is considered to be a function of the error probability which is defined to be half the sum of the probabilities of type-I and type-II errors. Let Nmin(Pe) be the minimum amount of data required to ensure that the error probability is at most Pe, where the minimum is over all possible methods of statistical analysis. For a particular statistical method 𝒜, let N𝒜(Pe) be the data complexity required to ensure error probability at most Pe. Similar to the case of key recovery attacks, N𝒜(Pe) is an upper bound on Nmin(Pe) and at least N𝒜(Pe) amount of data is required to ensure error probability at most Pe if the method 𝒜 is followed. Also, the usefulness of N𝒜(Pe) to a cryptanalyst and to a cipher designer remains the same as in the case of key recovery attacks.

An asymptotic expression for Nmin(Pe) has been described in [4]. The expression is given in terms of the Chernoff information which involves taking an infimum over all real numbers in (0,1). Consequently, the resulting expression cannot be computed and [4] provides approximations.

To the best of our knowledge, all previously proposed statistical methods either for key recovery attacks or for distinguishing attacks use approximations to obtain expressions for data complexity without detailed analysis of the approximation errors. Consequently, the obtained data complexities cannot be considered to be either lower or upper bounds. The present work provides upper bounds on the data complexities, and we write rigorous upper bound to emphasize that no approximations are used in our analysis.

1.3 How good are the bounds?

The bounds on data complexity that we obtain crucially depend on the bounds for tail probabilities that we use. We have used the Chernoff and the Hoeffding bounds. These are general bounds which apply to sums of independent random variables. This leads to the question of whether better bounds are known and whether they can be applied in the current context.

The theory of large deviations is concerned with the probabilities of rare events, and so tail probabilities can be handled by this theory. It can be shown that a tail probability is upper bounded by a quantity that decays exponentially in N, where the rate of decay is given by a function called the rate function. The rate function is the Legendre transform of the logarithmic moment generating function of the corresponding random variable. In theory, it is indeed possible to express the tail probabilities in terms of the rate function. However, this does not automatically provide meaningful bounds for the data complexity. There are several difficulties involved. For a more detailed discussion of these difficulties, we refer to [33].
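To make this concrete, the following sketch (an illustration added here, not part of the paper’s analysis) numerically computes the rate function of a Bernoulli(p) random variable as the Legendre transform of its cumulant generating function and checks the result against the known closed form, which is the Kullback–Leibler divergence D(x || p).

```python
import math

def cgf(theta, p):
    # Cumulant generating function of Bernoulli(p): ln E[exp(theta * X)].
    return math.log(1 - p + p * math.exp(theta))

def rate_numeric(x, p, lo=-50.0, hi=50.0, iters=200):
    # Legendre transform I(x) = sup_theta (theta * x - cgf(theta)).
    # The objective is concave in theta, so ternary search suffices.
    f = lambda t: t * x - cgf(t, p)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

def rate_closed(x, p):
    # Known closed form for the Bernoulli rate function: D(x || p).
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.5, 0.6
print(rate_numeric(x, p), rate_closed(x, p))
```

Even in this simplest case, evaluating the rate function requires an optimization over θ, which illustrates why rate-function bounds do not immediately yield closed-form data complexity expressions.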

1.4 Previous and related works

Linear cryptanalysis. This was first proposed by Matsui in [26] to cryptanalyze the block cipher DES. Later Matsui [27] extended this idea by using two linear approximations. In an independent work, Kaliski and Robshaw [22] extended Matsui’s attack involving a single linear approximation to ℓ (> 1) linear approximations. Their result, however, was restrictive, as it required all linear approximations to have the same plaintext and ciphertext bits, though the key bits could be different.

Biryukov et al. [8] further refined the idea of multiple linear cryptanalysis. The authors considered linear approximations without any assumption on their structure. This, though, also had a restriction. The analysis was valid only for independent linear approximations. Analysis under the independence assumption was separately done by Junod and Vaudenay [21]. Murphy [30] argued that the independence assumption need not be valid.

In a later work, Baignères et al. [2] used the log-likelihood ratio (LLR) statistic to build an optimal distinguisher between two distributions. This result did not require the independence assumption. The theme of obtaining optimal distinguishers was also investigated in [20, 3].

Selçuk [34] proposed an order statistics based ranking methodology for analyzing single linear and differential cryptanalysis. The paper provided expressions for the data complexity of these attacks. The order statistics based approach uses a well-known theorem from statistics to approximate the distribution of an order statistic by the normal distribution. Consequently, the data complexities obtained in [34] are approximate. The order statistics based approach was built upon by Hermelin et al. [19]. They combined the results obtained in [2, 30, 31, 34] to develop a multiple linear cryptanalytic method without the independence assumption.

Differential cryptanalysis. This cryptanalytic method was first proposed by Biham and Shamir in [7]. It was used to successfully cryptanalyze reduced round variants (with up to 15 rounds) of DES using less than 2^56 operations. Later in [6], the authors further improved their attack by considering several differentials having the same output difference. Over time, several variants of differential cryptanalysis have been proposed. These include higher-order differentials [24], truncated differentials [23], cube attack [16], boomerang attack [36], impossible differential cryptanalysis [5] and improbable differential cryptanalysis [35].

The general approach to multiple differential cryptanalysis was considered in [10]. This work considered differentials having both unequal input and unequal output differences. The case of differentials having the same input differences but different output differences was analyzed in detail in [11]. The order statistics based framework was used to derive an expression for the data complexity. A general study of data complexity and success probability of statistical attacks was carried out in [12].

We note that a recent work [32] performs a concrete analysis of normal approximations used in symmetric key cryptanalysis by using the Berry–Esseen theorem. In particular, the work critiques the order statistics based approach advocated by Selçuk [34] and points out several shortcomings. More generally, the entire approach of using normal approximations (without consideration of the error) is questioned.

A related line of work is based on the key dependent behavior of linear and differential characteristics [1, 9, 13, 25] and uses approximations. The techniques introduced in this paper should also be applicable to this setting and can form the basis for future work.

2 Background

In this section, we provide the background for the work. The section starts with a brief background on block cipher cryptanalysis (to the extent necessary for understanding this paper) with emphasis on linear cryptanalysis. Next we provide some details about the important log-likelihood ratio (LLR) test statistic. Appendix A provides relevant details of tail probability inequalities, specifically the Chernoff bound for Poisson trials and the Hoeffding bounds.

2.1 Background for block cipher cryptanalysis

The description of block cipher cryptanalysis given here is tailored towards linear cryptanalysis. Differential cryptanalysis is separately considered later.

A block cipher is a function E : {0,1}^k × {0,1}^n → {0,1}^n such that for each K ∈ {0,1}^k the function

E_K(⋅) =Δ E(K,⋅)

is a bijection from {0,1}^n to itself. Here K is the secret key. The n-bit input to the block cipher is called the plaintext, and the n-bit output of the block cipher is called the ciphertext.

Practical constructions of block ciphers have an iterated structure consisting of several rounds. Each round consists of applying a round function parameterized by a round key. The round functions are bijections of {0,1}^n. An expansion function, called the key scheduling algorithm, is applied to the secret key to obtain round keys. Let the round keys be k(0), k(1), …, and denote the round functions as R^(0)_{k(0)}, R^(1)_{k(1)}, …. We assume that each of the round keys consists of 𝔨 bits. Further, denote by K(i) the concatenation of the first i round keys, i.e., K(i) = k(0)||⋯||k(i-1), which consists of 𝔨i bits. Let E^(i)_{K(i)} denote the composition of the first i round functions, i.e.,

E^(1)_{K(1)} = R^(0)_{k(0)},
E^(i)_{K(i)} = R^(i-1)_{k(i-1)} ∘ ⋯ ∘ R^(0)_{k(0)} = R^(i-1)_{k(i-1)} ∘ E^(i-1)_{K(i-1)},  i > 1.

A block cipher may have many rounds and a reduced round cryptanalysis may target only a few of these rounds. Suppose that an attack targets r+1 rounds. For a plaintext P, let C be the output after r+1 rounds and let B be the output after r rounds. So, B = E^(r)_{K(r)}(P) and C = R^(r)_{k(r)}(B).
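The iterated structure described above can be sketched in a few lines; the 8-bit block size, round function and key schedule below are arbitrary stand-ins chosen for illustration, not any real cipher.

```python
# Toy sketch (an assumption, not a real cipher) of the iterated structure
# E^(i) = R^(i-1) o ... o R^(0) on 8-bit blocks.
N_BITS = 8
MASK = (1 << N_BITS) - 1

def round_fn(k, x):
    # A round function R_k: XOR the round key, then rotate left by one bit.
    # Both steps are bijections on {0,1}^8, so R_k is a bijection.
    x ^= k
    return ((x << 1) | (x >> (N_BITS - 1))) & MASK

def key_schedule(key, rounds):
    # Stand-in key scheduling algorithm producing round keys k(0), k(1), ...
    return [(key * (i + 1) + i) & MASK for i in range(rounds)]

def E(key, p, rounds):
    # E^(rounds): composition of the first `rounds` round functions.
    x = p
    for k in key_schedule(key, rounds):
        x = round_fn(k, x)
    return x

# A composition of bijections is a bijection: all 256 outputs are distinct.
print(len({E(0xA5, p, rounds=4) for p in range(256)}))  # 256
```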

Relations between plaintext and the input to the last round. The basic step in block cipher cryptanalysis is to perform a detailed analysis of the structure of a block cipher. Such a study reveals one or more possible relations between the following quantities: a plaintext P, the input to the last round B and possibly K(r). Such relations can be in the form of a linear function or in the form of a differential as we explain later. Usually, such a relation holds only with some probability. The probability is taken over the uniform random choice of P. If there is more than one relation, then it is required to consider the joint distribution of the probabilities that these relations hold. Obtaining relations and their (possibly joint) distribution is a non-trivial task which requires a great deal of experience and ingenuity. These relations form the bedrock on which a statistical analysis of an attack can be carried out.

Target sub-key. A single relation between P and B will usually involve only a subset of the bits of B. If several (or multiple) relations between P and B are known, it is required to consider the subset of the bits of B which cover all the relations. Obtaining these bits from C will require a partial decryption of the last round. Such a partial decryption will involve a subset of the bits of the secret key (or of the last round key). Obtaining the correct values of these key bits is the goal of the attack, and these bits will be called the target sub-key. The size of the target sub-key in bits will be denoted by m. So, m key bits are sufficient to partially decrypt C to obtain the bits of B which are involved in any of the relations between P and B. There are 2^m possible choices of the target sub-key bits out of which one is correct and all others are incorrect. The goal is to pick out the correct key.

Setting of an attack. Suppose there are N plaintext-ciphertext pairs (Pj,Cj), j=1,,N, which have been generated using the correct key and are available. For each choice κ of the last round key bits, it is possible to invert Cj to obtain the relevant bits of Bκ,j. The relevant bits are those which are required to evaluate the relations discovered in the prior analysis of the block cipher. Note that Bκ,j depends on κ even though Cj may not. If κ is the correct choice for the target sub-key, then Cj indeed depends on κ, otherwise Cj has no relation to κ.

Given Pj and the relevant bits of Bκ,j, it is possible to evaluate all the known relations. From the results of these evaluations, a test statistic Tκ is defined. Since there are a total of 2^m possible values of κ, there are also 2^m random variables Tκ. These random variables are assumed to be independent, and the distributions of these random variables depend on whether κ is correct or incorrect. It is also assumed that the distributions of Tκ for incorrect κ are identical. This assumption was considered in [17]. For an attack to be possible, it is required to obtain the two possible distributions of Tκ, one when κ is the correct choice and the other when κ is an incorrect choice.

2.2 Linear cryptanalysis

Assume that the analysis of the structure of the block cipher provides ℓ ≥ 1 linear approximations. These are given by masks ΓP(i), ΓB(i) and ΓK(i) for i = 1, …, ℓ. The subscript P denotes the plaintext mask, the subscript B denotes the mask after r rounds and the subscript K denotes the mask for K(r). So, ΓP(i) and ΓB(i) are in {0,1}^n and ΓK(i) is in {0,1}^{𝔨r}. If ℓ > 1, then the attack is called multiple linear cryptanalysis, and if ℓ = 1, we will call the attack single linear cryptanalysis, or simply linear cryptanalysis. Define

Li = ⟨ΓP(i), P⟩ ⊕ ⟨ΓB(i), B⟩  for i = 1, …, ℓ.

Inner key bits. For a fixed but unknown key K(r), the quantity zi = ⟨ΓK(i), K(r)⟩ is a single unknown bit. Denote by z = (z1, …, zℓ) the collection of the bits arising in this manner. The key masks ΓK(1), …, ΓK(ℓ) are known. So, z is determined only by the unknown key K(r). The bits represented by z are called the inner key bits. The key K(r) is unknown but fixed, and so there is no randomness in K(r). Correspondingly, z is also unknown but fixed and there is no randomness in z.

Consider a uniform random choice of P. The round functions are deterministic bijections, and so the uniform distribution on P induces a uniform distribution on B. Each Li is a random variable which can take the values 0 or 1. The randomness of Li arises solely from the randomness of P. Define the random variable X to be the following:

X = (L1, …, Lℓ).

So, X is distributed over {0,1}^ℓ and its distribution is determined by the distribution of the Li’s, which in turn is determined by the distribution of P.

Consider the event

(2.1) Li = ⟨ΓK(i), K(r)⟩ = zi.

Note that we are not assuming any randomness over the key K(r), and the bits zi have no randomness even though they are unknown. So, the distribution of Li ⊕ zi is determined completely by the distribution of Li. The relation in (2.1) holds with some probability, and the event is conventionally called a linear approximation of the underlying block cipher.

Joint distribution parameterized by inner key bits. A linear approximation of the type given by (2.1) holds with some probability over the uniform random choice of P. The random variables L1, …, Lℓ are not necessarily independent. The joint distribution of these variables is given as follows: for z = (z1, …, zℓ) and η = (η1, …, ηℓ) ∈ {0,1}^ℓ, define

pz(η) = Pr[L1 = η1 ⊕ z1, …, Lℓ = ηℓ ⊕ zℓ] = 2^-ℓ + ϵη(z),

where -2^-ℓ ≤ ϵη(z) ≤ 1 - 2^-ℓ.

The vector

p~z =Δ (pz(0), …, pz(2^ℓ-1))

is a probability distribution, where the integers {0, …, 2^ℓ-1} are identified with the set {0,1}^ℓ. For each choice of z, we obtain a different distribution. These distributions are, however, related to each other. Suppose β ∈ {0,1}^ℓ. Then it is easy to verify that ϵη(z ⊕ β) = ϵ_{η⊕β}(z). It follows that

p_{z⊕β}(η) = pz(η ⊕ β).

Let p~ be the probability distribution

p~ =Δ p~0,

and under the usual identification of {0,1}^ℓ with the integers in {0, …, 2^ℓ-1} write

(2.2) p~ = (p0, …, p_{2^ℓ-1}),

so that for η ∈ {0,1}^ℓ,

pη =Δ p(η) = 2^-ℓ + ϵη.
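The relation p_{z⊕β}(η) = pz(η⊕β) says that every p~z is a relabeling of p~0. This is easy to check numerically; in the sketch below, ℓ = 3 and the randomly generated base distribution p~0 are illustrative assumptions.

```python
import random

ell = 3            # number of linear approximations (an assumed toy value)
n = 1 << ell       # the 2^ell outcomes, identified with {0,1}^ell

# Arbitrary base distribution p~ = p~0 over {0, ..., 2^ell - 1}.
random.seed(1)
raw = [random.random() for _ in range(n)]
total = sum(raw)
p0 = [x / total for x in raw]

def p(z, eta):
    # Specializing p_{z xor beta}(eta) = p_z(eta xor beta) to z = 0 gives
    # p_beta(eta) = p_0(eta xor beta), so every p~z is a relabeling of p~0.
    return p0[eta ^ z]

# Verify the general relation for all z, beta, eta.
ok = all(p(z ^ b, e) == p(z, e ^ b)
         for z in range(n) for b in range(n) for e in range(n))
print(ok)  # True
```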

Notation. There are N plaintext-ciphertext pairs (Pj,Cj) for j = 1, …, N. For a choice κ of the target sub-key, the Cj’s are partially decrypted to obtain the relevant bits of Bκ,j. For κ ∈ {0, …, 2^m-1}, j = 1, …, N and i = 1, …, ℓ, define

Lκ,j,i = ⟨ΓP(i), Pj⟩ ⊕ ⟨ΓB(i), Bκ,j⟩,  Xκ,j = (Lκ,j,1, …, Lκ,j,ℓ).

2.3 LLR statistics

Let p~ = (p0, …, p_{ν-1}) and q~ = (q0, …, q_{ν-1}) be two probability distributions over a finite alphabet of size ν > 0. The Kullback–Leibler divergence between p~ and q~ is defined as follows:

D(p~ || q~) = ∑_{η=0}^{ν-1} pη ln(pη/qη).

The problem of distinguishing between the two distributions is the following: Let X1, …, XN be a sequence of independent and identically distributed random variables taking values in the set {0, …, ν-1}. It is known that all the Xj’s follow one of the distributions p~ or q~, but which one is not known.

The goal is to formulate a test of hypothesis to distinguish between these two distributions. This test takes the form where the null hypothesis

H0: “the distribution is p~”

is tested against the alternate hypothesis

H1:“the distribution is q~”.

Note that p~ is a probability distribution on {0, …, ν-1} and the probability at a point η ∈ {0, …, ν-1} is written as pη. For 1 ≤ j ≤ N, the random variable Xj takes values in the set {0, …, ν-1}. So, the derived random variable p_{Xj} is well defined. One may set Wj = p_{Xj}. The possible values of Wj are p0, p1, …, p_{ν-1}. If Xj follows p~, then for η ∈ {0, …, ν-1},

Pr[Wj = pη] = Pr[Xj = η] = pη,

and if Xj instead follows q~, then Pr[Wj = pη] = Pr[Xj = η] = qη.

For j=1,,N, define

Yj = ln(p_{Xj}/q_{Xj}).

Let μ0 and σ0² be the mean and variance of Yj under hypothesis H0. Similarly, let μ1 and σ1² be the mean and variance of Yj under hypothesis H1. Then the expressions for μ0, μ1, σ0² and σ1² can be computed to be the following:

(2.3) μ0 = D(p~ || q~),  μ1 = -D(q~ || p~),  σ0² = ∑_{η=0}^{ν-1} pη (ln(pη/qη))² - μ0²,  σ1² = ∑_{η=0}^{ν-1} qη (ln(qη/pη))² - μ1².

The LLR random variable is defined to be the following:

LLR = ∑_{j=1}^{N} Yj = ∑_{j=1}^{N} ln(p_{Xj}/q_{Xj}) = ∑_{η=0}^{ν-1} Qη ln(pη/qη),

where Qη = #{j : Xj = η}. By following the method described in [2], it is possible to define a test of hypothesis to distinguish between the two distributions p~ and q~ by using approximately

(2.4) N = ((σ0 + σ1)Φ^{-1}(1 - Pe) / (D(p~ || q~) + D(q~ || p~)))²

plaintext-ciphertext pairs, where Pe is half the sum of the probabilities of type-I and type-II errors and Φ is the standard normal distribution function. More details are given in Appendix B.
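Evaluating (2.4) is mechanical once the two distributions are fixed. The sketch below (with an arbitrary illustrative biased distribution against the uniform one, not taken from the paper) computes μ0, μ1, σ0², σ1² as in (2.3) and then N, using Python’s statistics.NormalDist for Φ^{-1}.

```python
import math
from statistics import NormalDist

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) in nats.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def approx_data_complexity(p, q, Pe):
    # Equation (2.4):
    # N = ((sigma0 + sigma1) * Phi^{-1}(1 - Pe) / (D(p||q) + D(q||p)))^2,
    # with mu0, mu1, sigma0^2, sigma1^2 as in (2.3).
    mu0, mu1 = kl(p, q), -kl(q, p)
    var0 = sum(pi * math.log(pi / qi) ** 2 for pi, qi in zip(p, q)) - mu0 ** 2
    var1 = sum(qi * math.log(qi / pi) ** 2 for pi, qi in zip(p, q)) - mu1 ** 2
    num = (math.sqrt(var0) + math.sqrt(var1)) * NormalDist().inv_cdf(1 - Pe)
    return (num / (kl(p, q) + kl(q, p))) ** 2

# Illustrative distributions: a slightly biased p~ against the uniform q~.
p = [0.26, 0.24, 0.255, 0.245]
q = [0.25] * 4
print(approx_data_complexity(p, q, Pe=0.05))
```

As expected, demanding a smaller error probability Pe increases the required N.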

3 Single linear approximation

In this section, we consider the case of a single linear approximation. Let P1, …, PN be N independent and uniformly distributed plaintexts. For simplicity, in this section, we will write L instead of L1 and Lκ,j instead of Lκ,j,1. Since there is a single linear approximation, the joint distribution p~ reduces to simply a probability value p = Pr[Lκ,j = 0] ≠ 1/2 when κ is the correct choice. For an incorrect choice of κ, it is conventional to assume that Pr[Lκ,j = 0] = 1/2. For the correct choice of κ, the random variable Lκ,j follows Bernoulli(p) for all j, where p = 1/2 + ϵ = 1/2 ± |ϵ|. The appropriate sign is determined by the correct value of the inner key bit z*, and we can write p = 1/2 + (-1)^{z*}|ϵ|. Under the wrong key hypothesis, for an incorrect choice of κ, the random variable Lκ,j follows Bernoulli(1/2) for all j.

Let c = 2(p - 1/2) = 2(-1)^{z*}|ϵ| and define μ0 = p = (1+c)/2 and μ1 = 1/2. The hypothesis testing framework will be used. The test statistic is Tκ = |Xκ - Nμ1|, where Xκ = ∑_{j=1}^{N} Lκ,j. Consider the following test of hypothesis.

Hypothesis Test 1 (Single linear cryptanalysis).

H0: “κ is correct” versus H1: “κ is incorrect.” Decision rule: reject H0 if Tκ ≤ t.

Proposition 3.1.

Let 0<α,β<1. In Hypothesis Test 1, the value of t can be chosen such that for

(3.1) N ≥ 2(√(ln(2/β)) + √(3(1+|c|) ln(1/α)))²/c²

the probabilities of the type-I and type-II errors are upper bounded by α and β, respectively.

Proof.

The requirement is to show the bound on N given the values of α and β. As is usual in the hypothesis testing framework, we will obtain two equations, one relating α, t and N and another relating β, t and N. Eliminating the variable t between these two equations will provide the expression for N in terms of α and β.

Note that under H0 we have E[Xκ]=Nμ0, and under H1 we have E[Xκ]=Nμ1. Define

δ0 = (|μ0 - μ1| - t/N)/μ0.

The decision threshold t will be chosen to satisfy 0 < t/N < |μ0 - μ1|. Now, for t in this range, we have 0 < δ0 < |μ0 - μ1|/μ0 < 1. So, it is possible to apply the Chernoff bound (specifically (A.2) and (A.3) of Theorem A.1) with this δ0.

First suppose μ0 > μ1. Then δ0 = (μ0 - μ1 - t/N)/μ0, and so (1 - δ0)μ0 = μ1 + t/N.

Pr[Type-I Error] = Pr[Tκ ≤ t ∣ H0 holds]
= Pr[-t ≤ Xκ - Nμ1 ≤ t ∣ H0 holds]
≤ Pr[Xκ - Nμ1 ≤ t ∣ H0 holds]
= Pr[Xκ ≤ t + Nμ1 ∣ H0 holds]
= Pr[Xκ ≤ (1 - δ0)Nμ0 ∣ H0 holds]
≤ exp(-Nμ0δ0²/2)
≤ exp(-Nμ0δ0²/3).

Recall that Xκ is the sum Lκ,1 + ⋯ + Lκ,N, and under H0 each Lκ,j follows Bernoulli(p). So, the last step of the above calculation follows from the Chernoff bound (equation (A.3)).

Suppose that μ1 > μ0 (note that since p ≠ 1/2, the case μ0 = μ1 does not occur). Then δ0 = (μ1 - μ0 - t/N)/μ0, and so (1 + δ0)μ0 = μ1 - t/N. In this case,

Pr[Type-I Error] = Pr[Tκ ≤ t ∣ H0 holds]
= Pr[-t ≤ Xκ - Nμ1 ≤ t ∣ H0 holds]
≤ Pr[Xκ ≥ (1 + δ0)Nμ0 ∣ H0 holds]
≤ exp(-Nμ0δ0²/3).

The last step follows from the Chernoff bound (equation (A.2)). The actual bound used in this case is different from that used for the case of μ0>μ1.

A relation involving α and N is obtained by enforcing

α = exp(-Nμ0δ0²/3).

This shows that Pr[Type-I Error] ≤ α irrespective of the values of μ0 and μ1. From the expressions for α and δ0 and using the fact that 0 < t/N < |μ0 - μ1|, we obtain

(3.2) t = N|μ0 - μ1| - √(3Nμ0 ln(1/α)).

The probability of type-II error is given by

Pr[Type-II Error] = Pr[Tκ > t ∣ H1 holds]
= Pr[|Xκ - Nμ1| > t ∣ H1 holds]
= Pr[Xκ > t + Nμ1 ∣ H1 holds] + Pr[Xκ < -t + Nμ1 ∣ H1 holds].

Let

δ1 = t/(Nμ1)

so that t/N + μ1 = (1 + δ1)μ1 and -t/N + μ1 = (1 - δ1)μ1. The analysis of the type-I error shows 0 < t/N < |μ0 - μ1|, from which it follows that 0 < δ1 < 1. Using (A.4) and (A.5) of Theorem A.1, we obtain

Pr[Type-II Error] ≤ 2exp(-Nμ1δ1²).

A relation involving β and N is obtained by enforcing

β = 2exp(-Nμ1δ1²) = 2exp(-t²/(Nμ1)).

This shows that Pr[Type-II Error] ≤ β. Solving for t in terms of β and using 0 < t/N < |μ0 - μ1| yields

(3.3) t = √(Nμ1 ln(2/β)).

Eliminating t from (3.2) and (3.3), we obtain

(3.4) N = 2(√(ln(2/β)) + √(3(1+c) ln(1/α)))²/c².

The two expressions for t given by (3.2) and (3.3) combined with the condition 0 < t/N < |μ0 - μ1| give rise to two lower bounds on N. It is easy to check that the expression for N given by (3.4) satisfies both these lower bounds.

Recall that c = 2(-1)^{z*}|ϵ|. So, depending on the value of z*, (3.4) provides two expressions for N, with the expression for z* = 1 being (slightly) greater than the expression for z* = 0. Taking z* = 1 provides the expression on the right-hand side of (3.1). So, for any N greater than this value, the probabilities of type-I and type-II errors are upper bounded by α and β, respectively. ∎
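The bound of Proposition 3.1 is easy to evaluate numerically. The sketch below computes the right-hand side of (3.1) for illustrative parameter choices (the values of α, β and c here are assumptions for the demo, not taken from the paper).

```python
import math

def single_linear_data_complexity(alpha, beta, c):
    # Rigorous upper bound (3.1) for single linear cryptanalysis:
    # N >= 2 * (sqrt(ln(2/beta)) + sqrt(3*(1+|c|)*ln(1/alpha)))^2 / c^2.
    num = math.sqrt(math.log(2 / beta)) \
        + math.sqrt(3 * (1 + abs(c)) * math.log(1 / alpha))
    return 2 * num ** 2 / c ** 2

# Illustrative parameters: correlation c = 2*epsilon and error probabilities
# alpha (type-I) and beta (type-II).
for c in (2 ** -5, 2 ** -10):
    print(c, single_linear_data_complexity(alpha=0.01, beta=0.1, c=c))
```

The dominant behavior is the familiar N = O(1/c²) scaling in the correlation c.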

4 Multiple linear cryptanalysis

We assume the setting and notation explained in Sections 2.1 and 2.2. There are ℓ ≥ 1 linear approximations, κ denotes the choice of the target sub-key and z denotes the choice of the inner key bits. There are N plaintext-ciphertext pairs (P1,C1), …, (PN,CN). For a choice κ of the target sub-key, a choice z = (z1, …, zℓ) of the inner key bits, j ∈ {1, …, N} and 1 ≤ i ≤ ℓ, define

Lκ,j,i = ⟨ΓP(i), Pj⟩ ⊕ ⟨ΓB(i), Bκ,j⟩,  Xκ,j = (Lκ,j,1, Lκ,j,2, …, Lκ,j,ℓ),
Yκ,z,j = ln(pz(Xκ,j)/2^-ℓ) = ln(2^ℓ pz(Xκ,j)).

Suppose z is the correct choice of the inner key bits. For a particular choice of κ, the random variables Xκ,1, …, Xκ,N are independent, and these variables follow either the distribution p~z or the uniform distribution q~ = (2^-ℓ, …, 2^-ℓ) according to whether κ is the correct choice or an incorrect choice.

The test statistic is defined to be

LLRκ,z = Yκ,z,1 + ⋯ + Yκ,z,N = Σ_{η∈{0,1}^ℓ} Qκ,η ln(2^ℓ p_z(η)),

where Qκ,η = #{j : Xκ,j = η}. Consider the following test of hypothesis.

Hypothesis Test 2 (Multiple linear cryptanalysis).

H0: “κ is correct” versus H1: “κ is incorrect.” Decision rule: Case μ0 > μ1: reject H0 if LLRκ,z ≤ t for all z ∈ {0,1}^ℓ, where t ∈ (Nμ1, Nμ0). Case μ0 < μ1: reject H0 if LLRκ,z ≥ t for all z ∈ {0,1}^ℓ, where t ∈ (Nμ0, Nμ1).

Proposition 4.1.

Let 0<α,β<1. In Hypothesis Test 2, it is possible to choose t such that for

(4.1) N ≥ υ²{√(ln(2^ℓ/β)) + √(ln(1/α))}²/(2(D(p~ ∥ q~) + D(q~ ∥ p~)))²

the probabilities of the type-I and type-II errors are upper bounded by α and β respectively. Here

υ = max_{η∈{0,1}^ℓ} ln(2^ℓ pη) − min_{η∈{0,1}^ℓ} ln(2^ℓ pη) = ln(max_η pη / min_η pη).

Proof.

Under H0 each Yκ,z,j has mean μ0 = D(p~z ∥ q~), while under H1 each Yκ,z,j has mean μ1 = −D(q~ ∥ p~z). It is not difficult to prove that μ0 and μ1 have the same value for all z (see [32] for a proof), and so we simply write μ0 = D(p~ ∥ q~) and μ1 = −D(q~ ∥ p~), where p~ = (p0, …, p_{2^ℓ−1}) as defined in (2.2).

We now proceed to analyze the probabilities of type-I and type-II errors and derive expressions for the data complexity. While doing this, we avoid using normal approximations. We use Hoeffding’s inequalities (see Appendix A.2) to bound the probabilities of the two types of errors.

Recall that for a fixed value of κ and z, the random variables LLRκ,z,j (j = 1, …, N) are independently and identically distributed, with each random variable taking values in the set

{ln(2^ℓ p0), …, ln(2^ℓ p_{2^ℓ−1})}.

This implies that for a fixed value of κ and z,

υmin = min_{η∈{0,1}^ℓ} ln(2^ℓ pη) ≤ LLRκ,z,j ≤ max_{η∈{0,1}^ℓ} ln(2^ℓ pη) = υmax

for all j=1,,N. Let υ=υmax-υmin. Therefore, one can use Hoeffding bounds (see Appendix A.2) on the sum of independent and identically distributed random variables

LLRκ,z = Σ_{j=1}^{N} LLRκ,z,j,

where

DN = Σ_{j=1}^{N} (υmax − υmin)² = Nυ².

We now turn to bounding the error probabilities and obtaining expressions for the data complexity. This is done separately for the two cases depending on the relative values of μ0 and μ1. Let z* be the correct choice of the inner key bits.

Case μ0 > μ1. In this case, for t ∈ (Nμ1, Nμ0) to be determined later, the null hypothesis is rejected if LLRκ,z ≤ t for all z ∈ {0,1}^ℓ. Then

Pr[Type-I Error] = Pr[LLRκ,z ≤ t for all z ∣ H0 holds]
≤ Pr[LLRκ,z* ≤ t ∣ H0 holds]
= Pr[LLRκ,z* ≤ Nμ0 − (Nμ0 − t) ∣ H0 holds]
≤ exp(−2(Nμ0 − t)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.7)). Similarly, the probability of type-II error is computed as follows:

Pr[Type-II Error] = Pr[LLRκ,z > t for some z ∣ H1 holds]
≤ Σ_{z∈{0,1}^ℓ} Pr[LLRκ,z > t ∣ H1 holds]
= 2^ℓ Pr[LLRκ,z > t ∣ H1 holds]
= 2^ℓ Pr[LLRκ,z − Nμ1 > t − Nμ1 ∣ H1 holds]
≤ 2^ℓ exp(−2(t − Nμ1)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.6)). Define

α = exp(−2(Nμ0 − t)²/(Nυ²)),  β = 2^ℓ exp(−2(t − Nμ1)²/(Nυ²)).

Then Pr[Type-I Error] ≤ α and Pr[Type-II Error] ≤ β. The expression for α gives two values for t. By using the upper bound on t, i.e., t < Nμ0, the expression for t has to be

(4.2) 2t = 2Nμ0 − υ√(N ln(1/α)).

The lower bound on t, i.e., Nμ1<t, provides the following lower bound on N:

(4.3) N > υ² ln(1/α)/(2(μ0 − μ1))².

Similarly, the expression for β leads to two values for t, and again using the range for t, we obtain

(4.4) 2t = 2Nμ1 + υ√(N ln(2^ℓ/β))

and

(4.5) N > υ² ln(2^ℓ/β)/(2(μ0 − μ1))².

From (4.2) and (4.4) we obtain the expression on the right-hand side of (4.1). The expression for N given by (4.1) satisfies the bounds in (4.3) and (4.5).

Case μ0 < μ1. In this case, for t ∈ (Nμ0, Nμ1) to be determined later, the null hypothesis is rejected if LLRκ,z ≥ t for all z ∈ {0,1}^ℓ. Then

Pr[Type-I Error] = Pr[LLRκ,z ≥ t for all z ∣ H0 holds]
≤ Pr[LLRκ,z* ≥ t ∣ H0 holds]
= Pr[LLRκ,z* − Nμ0 ≥ t − Nμ0 ∣ H0 holds]
≤ exp(−2(t − Nμ0)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.6)). Similarly, the probability of type-II error is computed as follows:

Pr[Type-II Error] = Pr[LLRκ,z < t for some z ∣ H1 holds]
≤ Σ_{z∈{0,1}^ℓ} Pr[LLRκ,z < t ∣ H1 holds]
= 2^ℓ Pr[LLRκ,z − Nμ1 < −(Nμ1 − t) ∣ H1 holds]
≤ 2^ℓ exp(−2(Nμ1 − t)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.7)). Further analysis of this case in a manner similar to that done for μ0 > μ1 shows that the expression for N in this case is also given by (4.1). ∎

Algorithmically, the test is performed in the following manner: Consider μ0 > μ1, the case μ0 < μ1 being similar. Initialize a set 𝒮 to be the empty set. For each κ and z, if LLRκ,z > t, then 𝒮 ← 𝒮 ∪ {κ}. At the end, 𝒮 contains the list of candidate keys.

We consider the time required for computing LLRκ,z for all values of κ and z. For a fixed κ, the values of Qκ,η for all η ∈ {0,1}^ℓ can be computed in O(N) time. Given these Qκ,η’s, for any z, the value of LLRκ,z can be computed in O(2^ℓ) additional time. So, for a fixed κ, given the values of the Qκ,η’s, the values of LLRκ,z for all z ∈ {0,1}^ℓ can be computed in O(2^{2ℓ}) additional time. Thus, the values of LLRκ,z for all κ ∈ {0,1}^m and all z ∈ {0,1}^ℓ can be computed in O(2^m(N + 2^{2ℓ})) time.
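The count-then-score procedure above can be sketched in Python as follows. The function name, the toy value ℓ = 2 and the two inner-key distributions are ours, purely for illustration.

```python
from collections import Counter
from math import log

def llr_all_z(x_samples, p_z, ell):
    """For one sub-key guess kappa: count Q[eta] once in O(N), then score
    every inner-key choice z in O(2^ell) each, i.e. O(N + 2^(2*ell)) overall.

    x_samples -- observed values X_{kappa,j}, each in {0, ..., 2^ell - 1}
    p_z       -- p_z[z][eta], the distribution for inner-key choice z
    Returns {z: LLR_{kappa,z}}.
    """
    Q = Counter(x_samples)  # Q[eta] = #{j : X_{kappa,j} = eta}
    return {z: sum(q * log(2 ** ell * p_z[z][eta]) for eta, q in Q.items())
            for z in range(len(p_z))}

# Toy example with ell = 2 and two hypothetical inner-key distributions.
ell = 2
p_z = [[0.30, 0.20, 0.30, 0.20],   # z = 0
       [0.25, 0.25, 0.25, 0.25]]   # z = 1: uniform, so its LLR is identically 0
scores = llr_all_z([0, 0, 2, 1, 3, 0], p_z, ell)
```

In a full attack this routine would be run once per sub-key guess κ, giving the stated O(2^m(N + 2^{2ℓ})) total cost.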

5 Single differential cryptanalysis

Let the n-bit strings δ0, δ1, …, δr with δ0 ≠ 0 be the input differences to the rounds of an (r+1)-round block cipher. Let P be a plaintext and set P′ = P ⊕ δ0. Let B^{(0)} = P, B^{(1)}, …, B^{(r)} denote the inputs to round numbers 0, …, r, respectively, i.e.,

B^{(i+1)} = R^{(i)}_{k^{(i)}}(B^{(i)})

corresponding to the plaintext P. Further, let B′^{(0)} = P′, B′^{(1)}, …, B′^{(r)} be the inputs to round numbers 0, …, r, respectively, corresponding to the plaintext P′. Then

A = ∧_{i=0}^{r} (B^{(i)} ⊕ B′^{(i)} = δi)

denotes the event that the differential characteristic δ0 → δ1 → ⋯ → δr occurs. Suppose that Pr[A] = p for the correct key K. Notice that, as in the case of linear cryptanalysis, the randomness comes from the uniform random choice of P.

As in Section 2.2, we assume that guessing m bits of the key allows the partial decryption of C to obtain B(r). These m bits will constitute the target sub-key, and the goal will be to obtain the correct value of the sub-key. Further, as done previously, we will denote a choice of the target sub-key by κ.

Let D denote the event B^{(r)} ⊕ B′^{(r)} = δr. Further, let Pr[D ∣ A̅] = p′ and p0 = p + (1−p)p′. Then, for the correct choice κ of the target sub-key, Pr[D] = p0. Since δ0 is not the zero string, P ≠ P′. This further implies that B^{(i)} ≠ B′^{(i)} for i = 1, …, r, since each round function is a bijection. For incorrect choices of κ, it is assumed that B^{(r)} and B′^{(r)} correspond to uniform sampling without replacement of two n-bit strings from {0,1}^n. Hence, Pr[D] = 1/(2^n − 1) for an incorrect choice of κ. Let pw = 1/(2^n − 1). In general, p0 > pw, and we proceed with this assumption. The analysis for the case p0 < pw is similar.

Consider N plaintext pairs (P1,P1′), …, (PN,PN′) with Pj ⊕ Pj′ = δ0 and their corresponding ciphertext pairs (C1,C1′), …, (CN,CN′). For a choice κ of the target sub-key, the attacker obtains

(B^{(r)}_{κ,1}, B′^{(r)}_{κ,1}), …, (B^{(r)}_{κ,N}, B′^{(r)}_{κ,N})

by partially decrypting (C1,C1′), …, (CN,CN′), respectively. So, for j = 1, …, N, it is possible to determine whether the condition

B^{(r)}_{κ,j} ⊕ B′^{(r)}_{κ,j} = δr

holds.

For a choice κ of the target sub-key, define the binary-valued random variables Wκ,1, …, Wκ,N as follows: Wκ,j = 1 if B^{(r)}_{κ,j} ⊕ B′^{(r)}_{κ,j} = δr, and Wκ,j = 0 otherwise. If κ is the correct choice, then Pr[Wκ,j = 1] = p0, and if κ is an incorrect choice, then Pr[Wκ,j = 1] = pw for all j.

The test statistic is Tκ = |Xκ − Nμ1|. Consider the following test of hypothesis.

Hypothesis Test 3 (Single differential cryptanalysis).

H0: “κ is correct” versus H1: “κ is incorrect.” Decision rule: reject H0 if Tκ ≤ t.

Proposition 5.1.

Let 0<α,β<1. In Hypothesis Test 3 it is possible to choose t such that for

(5.1) N ≥ 3(√(p0 ln(1/α)) + √(pw ln(2/β)))²/(p0 − pw)²

the probabilities of the type-I and type-II errors are upper bounded by α and β, respectively.

Proof.

Let μ0 = p0 and μ1 = pw, and let Xκ = Wκ,1 + ⋯ + Wκ,N. Under H0 we have E[Xκ] = Nμ0, and under H1 we have E[Xκ] = Nμ1.

This setting is almost the same as that for single linear cryptanalysis, the only differences being that μ1 = pw is not in general 1/2 and that the inner key bit z is absent. Since μ1 is not equal to 1/2, for analyzing the type-II error probability we have to apply slightly different forms of the Chernoff bounds.

The expressions for δ0, δ1, α and the expression for t in terms of α are obtained as in the case of single linear cryptanalysis to be the following:

δ0 = (|μ0 − μ1| − t/N)/μ0,
δ1 = t/(Nμ1),
α = exp(−Nμ0δ0²/3),
t = N|μ0 − μ1| − √(3Nμ0 ln(1/α)).

Due to the use of the bounds (A.2) and (A.3), the expression for β changes as does the expression for t in terms of β:

β = 2exp(−Nμ1δ1²/3),  t = √(3Nμ1 ln(2/β)).

Equating the two expressions for t provides the expression on the right-hand side of (5.1).

To apply the Chernoff bound (see Theorem A.1), it is required that 0 < δ0, δ1 < 1. As in Section 3, having 0 < t/N < |μ0 − μ1| ensures that the conditions on δ0 and δ1 hold. The bound on t leads to two lower bounds on N, and the expression for N given by (5.1) satisfies these two lower bounds. ∎
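The bound (5.1) is again directly computable. In the sketch below, the function name and the parameter choices (block size n = 32, an assumed differential probability p0 = pw + 2^{−40}, α = 0.05, β = 2^{−8}) are ours, for illustration only.

```python
from math import log, sqrt

def n_single_differential(alpha: float, beta: float, p0: float, pw: float) -> float:
    """Evaluate the data-complexity upper bound (5.1) for single differential
    cryptanalysis: 3*(sqrt(p0*ln(1/alpha)) + sqrt(pw*ln(2/beta)))^2 / (p0 - pw)^2."""
    num = (sqrt(p0 * log(1 / alpha)) + sqrt(pw * log(2 / beta))) ** 2
    return 3 * num / (p0 - pw) ** 2

pw = 1 / (2 ** 32 - 1)        # wrong-key probability for block size n = 32
p0 = pw + 2 ** -40            # illustrative right-key probability (our choice)
N = n_single_differential(alpha=0.05, beta=2 ** -8, p0=p0, pw=pw)
```

As with the linear case, the (p0 − pw)² denominator dominates: shrinking the gap between the right-key and wrong-key probabilities inflates N quadratically.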

6 Multiple differential cryptanalysis

Here we consider a version of multiple differential cryptanalysis where the attacker uses ν r-round differentials, all having the same input difference. Suppose that the ν r-round differentials for a block cipher are given by n-bit strings δ0 and δr^{(1)}, …, δr^{(ν)}, where δ0 denotes the input difference and δr^{(i)} denotes the i-th output difference. Each of the δr^{(i)}’s must be a non-zero n-bit string, and so ν ≤ 2^n − 1. As in the case of linear cryptanalysis, consider an m-bit target sub-key for some m ≤ n. Guessing the value of this sub-key allows the inversion of the (r+1)-th round. For a uniform random plaintext P and a choice κ of the target sub-key, define a random variable Xκ as follows:

(6.1) Xκ = i if (R^{(r)}_κ)^{−1}(E^{(r)}_K(P)) ⊕ (R^{(r)}_κ)^{−1}(E^{(r)}_K(P ⊕ δ0)) = δr^{(i)}, and Xκ = 0 otherwise.

For 1 ≤ i ≤ ν, let pi and θ be such that

Pr[Xκ = i] = pi if κ is the correct choice, and Pr[Xκ = i] = θ if κ is an incorrect choice.

Under the wrong key assumption, θ = 1/(2^n − 1). Further, define

p0 = 1 − (p1 + ⋯ + pν),  θ0 = 1 − νθ.

Then both p~=(p0,p1,,pν) and θ~=(θ0,θ,,θ) are proper probability distributions. For the correct choice of κ, p0 is the probability that none of the ν differentials hold. Similarly, for an incorrect choice of κ, θ0 is the probability that none of the ν differentials hold. The random variable Xκ follows p~ if κ is the correct choice, and Xκ follows θ~ if κ is an incorrect choice.

Define another random variable

Yκ = ln(p_{Xκ}/θ_{Xκ}).

Let μ0 = E[Yκ] if Xκ follows p~ (i.e., κ is the correct choice), and let μ1 = E[Yκ] if Xκ follows θ~ (i.e., κ is an incorrect choice). Then μ0 = D(p~ ∥ θ~) and μ1 = −D(θ~ ∥ p~).

Consider the N plaintext-ciphertext pairs (P1,C1), …, (PN,CN). For a choice κ of the target sub-key and j = 1, …, N, let Xκ,j be the random variable given by (6.1) corresponding to (Pj,Cj) and let

Yκ,j = ln(p_{Xκ,j}/θ_{Xκ,j}).

The test statistic is defined to be the following:

LLRκ = Σ_{j=1}^{N} Yκ,j = Σ_{η∈{0,…,ν}} Qκ,η ln(pη/θη),

where Qκ,η = #{j : Xκ,j = η}. Consider the following test of hypothesis.

Hypothesis Test 4 (Multiple differential cryptanalysis).

H0: “κ is correct” versus H1: “κ is incorrect.” Decision rule: Case μ0 > μ1: reject H0 if LLRκ ≤ t, where t ∈ (Nμ1, Nμ0). Case μ0 < μ1: reject H0 if LLRκ ≥ t, where t ∈ (Nμ0, Nμ1).

Proposition 6.1.

Let 0<α,β<1 and N be such that

(6.2) N ≥ υ²{√(ln(1/β)) + √(ln(1/α))}²/(2(D(p~ ∥ θ~) + D(θ~ ∥ p~)))².

Then the probabilities of the type-I and type-II errors in Hypothesis Test 4 are upper bounded by α and β, respectively. Here,

υ = max_{η∈{0,…,ν}} ln(pη/θη) − min_{η∈{0,…,ν}} ln(pη/θη).

Proof.

Under H0 we have E[LLRκ] = Nμ0, while under H1 we have E[LLRκ] = Nμ1.

Here Yκ,1, …, Yκ,N are independently and identically distributed random variables taking values in the set {ln(p0/θ0), …, ln(pν/θν)}. Then, for a fixed κ,

υmin = min_{η∈{0,…,ν}} ln(pη/θη) ≤ Yκ,j ≤ max_{η∈{0,…,ν}} ln(pη/θη) = υmax

for all j = 1, …, N. Let υ = υmax − υmin. Therefore, Hoeffding bounds can be applied to the sum of independently and identically distributed random variables LLRκ = Σ_{j=1}^{N} Yκ,j, where DN = Nυ².

The error analysis is carried out separately in the two cases μ0>μ1 and μ0<μ1.

Case μ0>μ1. In this case, Nμ1<t<Nμ0. The probabilities of type-I and type-II errors are computed as follows:

Pr[Type-I Error] = Pr[LLRκ ≤ t ∣ H0 holds]
= Pr[LLRκ − Nμ0 ≤ −(Nμ0 − t) ∣ H0 holds]
≤ exp(−2(Nμ0 − t)²/(Nυ²)),
Pr[Type-II Error] = Pr[LLRκ > t ∣ H1 holds]
= Pr[LLRκ − Nμ1 > t − Nμ1 ∣ H1 holds]
≤ exp(−2(t − Nμ1)²/(Nυ²)).

Here the inequalities given by (A.7) and (A.6) have been used. Define

α = exp(−2(Nμ0 − t)²/(Nυ²)),  β = exp(−2(t − Nμ1)²/(Nυ²)).

The equation for α gives two values of t. The range for t eliminates one of the values. Similarly, the equation for β gives two values of t, where one of the values is eliminated by using the range for t. The two allowed values of t are the following:

(6.3) 2t = 2Nμ0 − υ√(N ln(1/α)),
(6.4) 2t = 2Nμ1 + υ√(N ln(1/β)).

Eliminating t from equations (6.3) and (6.4), we obtain the expression given by the right-hand side of (6.2). The expression for t given by (6.3) has to satisfy Nμ1 < t, and the expression for t given by (6.4) has to satisfy t < Nμ0. These give rise to two lower bounds on N, both of which are satisfied by the expression for N given by (6.2).

Case μ0<μ1. The analysis of this case is similar and leads to an expression for N which is the same as that given by (6.2). ∎
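Given concrete distributions p~ and θ~, the quantities υ, D(p~ ∥ θ~), D(θ~ ∥ p~) and the bound (6.2) can be computed as in the following sketch. The helper names and the toy ν = 3 distributions are ours, chosen only to illustrate the computation.

```python
from math import log, sqrt

def kl(a, b):
    """Kullback-Leibler divergence D(a || b) in nats, for full-support inputs."""
    return sum(x * log(x / y) for x, y in zip(a, b))

def n_multiple_differential(alpha, beta, p, theta):
    """Evaluate the upper bound (6.2) for multiple differential cryptanalysis,
    assuming p~ and theta~ have full support."""
    ratios = [log(x / y) for x, y in zip(p, theta)]
    v = max(ratios) - min(ratios)                       # the quantity upsilon
    num = v ** 2 * (sqrt(log(1 / beta)) + sqrt(log(1 / alpha))) ** 2
    return num / (2 * (kl(p, theta) + kl(theta, p))) ** 2

# Toy nu = 3 example; the entry at index 0 is p0 (resp. theta0).
p     = [0.70, 0.12, 0.10, 0.08]
theta = [0.73, 0.09, 0.09, 0.09]
N = n_multiple_differential(alpha=0.05, beta=2 ** -8, p=p, theta=theta)
```

In a realistic attack the θ entries would all equal 1/(2^n − 1) under the wrong key assumption; the toy values above merely keep the arithmetic visible.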

7 Relating advantage to type-II error probability

The size of the target sub-key is m bits, and there is one correct choice and the rest are incorrect choices. The hypothesis test is carried out independently for each choice κ of the target sub-key. Every time a type-II error occurs, an incorrect choice gets labelled as a candidate key.

In the previous analyses, we have assumed β to be an upper bound on the probability of type-II error. For the present, let us assume that β is indeed the actual probability of type-II error. In the next section, we will consider the situation when β is an upper bound.

Since the probability of type-II error is β, the expected number of incorrect keys which get labelled as candidate keys is β(2^m − 1). An attack is said to have an a-bit advantage if the size of the list of candidate keys produced by the attack is 2^{m−a}. Equating (2^m − 1)β = 2^{m−a}, we have that for an attack with a-bit expected advantage

(7.1) β = (2^m/(2^m − 1))·2^{−a}.

The right-hand side can be approximated by 2^{−a} for moderate values of m. It is possible to use (7.1) to substitute 2^m/(2^m − 1) × 2^{−a} for β in all the expressions for data complexities that have been obtained previously. This allows the data complexities to be expressed in terms of the expected advantage a.

While relating the expected advantage to β is sufficient for most purposes, it is possible to say more. One can upper bound the probability that the size of the list of false alarms exceeds a certain threshold. This is done as follows: For each incorrect choice κ of the target sub-key, define Wκ to be a random variable which takes the value 1 if a type-II error occurs for this choice of κ, and it takes the value 0 otherwise. Then the random variables Wκ are independent Bernoulli distributed random variables having probability of success β. Let

W = Σ_{κ incorrect} Wκ,

and let μ = E[W] = β(2^m − 1). Using the Chernoff bound (A.1), we have that for any δ > 0,

Pr[W > (1+δ)μ] < (e^δ/(1+δ)^{1+δ})^μ.

Define s such that s = (1+δ)μ, which combined with μ = β(2^m − 1) gives

(7.2) β = s/((1+δ)(2^m − 1)).

Using s=(1+δ)μ, we have

Pr[W > s] < (e^{s/μ−1}/(s/μ)^{s/μ})^μ = e^{s−μ}μ^s/s^s = Pβ (say).

It is now possible to say that the probability that the list of false alarms exceeds s is at most Pβ. Since μ is fixed, fixing Pβ fixes s, and then the relation s = (1+δ)μ also fixes δ. By using (7.2), β can be expressed in terms of s and δ. Substituting this expression for β in the data complexities obtained earlier provides expressions for data complexities in terms of s and Pβ (and the type-I error probability).
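The tail probability Pβ = e^{s−μ}μ^s/s^s is easy to evaluate numerically; working in log form avoids overflow for large s. The function name and the example values (m = 10, β = 2^{−10}, s = 10) are ours, for illustration.

```python
from math import exp, log

def false_alarm_tail(beta: float, m: int, s: float) -> float:
    """P_beta = e^(s - mu) * mu^s / s^s with mu = beta * (2^m - 1): an upper
    bound on Pr[more than s false alarms]. Requires s > mu, i.e. delta > 0."""
    mu = beta * (2 ** m - 1)
    assert s > mu, "the Chernoff bound (A.1) needs s = (1 + delta) * mu > mu"
    # evaluate in log form to stay numerically stable for large s
    return exp(s - mu + s * (log(mu) - log(s)))

# Here mu = 2^-10 * (2^10 - 1) is just below 1, so s = 10 false alarms
# is a roughly tenfold excess over the expectation.
P = false_alarm_tail(beta=2 ** -10, m=10, s=10)
```

Even a modest excess over μ drives the bound down very quickly, which is the point of the argument above.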

8 Distinguishers

In this section, we consider the problem of distinguishing between the probability distributions p~ and q~ over the set {0, …, ν−1}. Let, as in Section 2.3, X1, …, XN be independent and identically distributed random variables following either p~ or q~, but which one is not known. As before, let Yj = ln(p_{Xj}/q_{Xj}) for j = 1, …, N and LLR = Y1 + ⋯ + YN.

We use the log-likelihood ratio (LLR) test statistic to design a test of hypothesis to distinguish between p~ and q~.

Hypothesis Test 5 (Distinguisher).

H0: “the distribution is p~” versus H1: “the distribution is q~.” Decision rule: Case μ0 > μ1: reject H0 if LLR ≤ t, where t ∈ (Nμ1, Nμ0). Case μ0 < μ1: reject H0 if LLR ≥ t, where t ∈ (Nμ0, Nμ1).

Proposition 8.1.

Let 0<Pe<1. In Hypothesis Test 5, it is possible to choose t such that for

(8.1) N ≥ 2υ² ln(1/Pe)/(D(p~ ∥ q~) + D(q~ ∥ p~))²

the type-I and type-II error probabilities satisfy

Pr[Type-I error] + Pr[Type-II error] ≤ 2Pe.

Here,

υ = max_{η∈{0,…,ν−1}} ln(pη/qη) − min_{η∈{0,…,ν−1}} ln(pη/qη).

Proof.

Under H0 we know that Yj has mean μ0 and variance σ0², while under H1 it has mean μ1 and variance σ1². The expressions for μ0, μ1, σ0² and σ1² are given by (2.3). In the present case, we will not have any use for the variances. Under H0 we have E[LLR] = Nμ0, while under H1 we have E[LLR] = Nμ1. Also note that for each of the independently and identically distributed random variables Y1, …, YN,

υmin = min_{η∈{0,…,ν−1}} ln(pη/qη) ≤ Yj ≤ max_{η∈{0,…,ν−1}} ln(pη/qη) = υmax.

Let υ = υmax − υmin. Therefore, Hoeffding bounds can be applied to the sum of independently and identically distributed random variables LLR = Σ_{j=1}^{N} Yj, where DN = Nυ².

We now consider the probabilities of type-I and type-II errors. Since the form of the test is determined by the relative values of μ0 and μ1, the analysis is also done separately.

Case μ0>μ1. We have

Pr[Type-I Error] = Pr[LLR ≤ t ∣ H0 holds]
= Pr[LLR − Nμ0 ≤ −(Nμ0 − t) ∣ H0 holds]
≤ exp(−2(Nμ0 − t)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.7)). Similarly, the probability of type-II error is computed as follows:

Pr[Type-II Error] = Pr[LLR > t ∣ H1 holds]
= Pr[LLR − Nμ1 > t − Nμ1 ∣ H1 holds]
≤ exp(−2(t − Nμ1)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.6)).

Case μ0<μ1. We have

Pr[Type-I Error] = Pr[LLR ≥ t ∣ H0 holds]
= Pr[LLR − Nμ0 ≥ t − Nμ0 ∣ H0 holds]
≤ exp(−2(t − Nμ0)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.6)). Similarly, the probability of type-II error is computed as follows:

Pr[Type-II Error] = Pr[LLR < t ∣ H1 holds]
= Pr[LLR − Nμ1 < −(Nμ1 − t) ∣ H1 holds]
≤ exp(−2(Nμ1 − t)²/(Nυ²)).

The last inequality follows from Hoeffding’s inequality (see (A.7)). Let

α = exp(−2(Nμ0 − t)²/(Nυ²)),  β = exp(−2(t − Nμ1)²/(Nυ²)).

These expressions are upper bounds on the probabilities of type-I and type-II errors, respectively, irrespective of whether μ0>μ1 or μ0<μ1.

The quantities α and β are determined by N. To obtain a relation between Pe and N, we set

Pe = (α + β)/2.

Then it follows that Pr[Type-I Error] + Pr[Type-II Error] ≤ 2Pe. Setting t = N(μ0 + μ1)/2 ensures α = β, and then we obtain the following:

(8.2) Pe = exp(−N(μ0 − μ1)²/(2υ²)) = exp(−N(D(p~ ∥ q~) + D(q~ ∥ p~))²/(2υ²)).

From the expression for Pe given by (8.2) the expression for N is given by the right-hand side of (8.1). From this the statement of the result follows. ∎
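The distinguisher of Hypothesis Test 5 with the mid-point threshold used in the proof can be simulated directly. The distributions and sample sizes below are ours, chosen only to make the illustration concrete.

```python
import random
from math import log

def llr_distinguisher(samples, p, q):
    """LLR test from Hypothesis Test 5 with the proof's threshold
    t = N*(mu0 + mu1)/2, which equalizes the two error bounds."""
    mu0 = sum(pi * log(pi / qi) for pi, qi in zip(p, q))    # D(p~ || q~)
    mu1 = -sum(qi * log(qi / pi) for pi, qi in zip(p, q))   # -D(q~ || p~)
    llr = sum(log(p[x] / q[x]) for x in samples)
    t = len(samples) * (mu0 + mu1) / 2
    return "p" if llr > t else "q"

# Toy distributions over {0, 1, 2}; N = 2000 samples from each source.
random.seed(1)
p, q = [0.5, 0.3, 0.2], [0.4, 0.3, 0.3]
from_p = random.choices(range(3), weights=p, k=2000)
from_q = random.choices(range(3), weights=q, k=2000)
```

With these distributions the per-sample divergences are small (about 0.03 nats), but N = 2000 puts the two LLR means many standard deviations apart, so the test is essentially always correct, consistent with (8.2).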

9 Upper bounds

In the previous sections, we have obtained expressions for data complexities. These expressions are in terms of upper bounds on the probabilities of type-I and type-II errors.

Let α* and β* be the actual probabilities of type-I and type-II errors, respectively, and further let α and β be upper bounds on α* and β*, respectively. The success probability is PS*, which by definition is 1 − α*. Letting PS = 1 − α, we have PS* ≥ PS. Setting PS to a pre-specified value ensures that the actual probability of success PS* is at least this value.

Following the discussion in Section 7, the probability of type-II error can be related to the expected advantage of an attack. Let a* be such that 2^{−a*} × 2^m/(2^m − 1) = β*. Also, define a = −lg β, so that β = 2^{−a}. Then

2^{−a} = β ≥ β* = 2^{−a*} × 2^m/(2^m − 1) ≥ 2^{−a*},

which shows that a* ≥ a. So, fixing a to a pre-specified value ensures that the actual advantage is at least this value.

By using PS = 1 − α and β = 2^{−a}, all the expressions for the data complexities obtained earlier can be written in terms of PS and a.

The main question about data complexity that a cryptanalyst is interested in is the following: for pre-specified values of PS and a, what is the minimum number of plaintext-ciphertext pairs which ensures that PS* ≥ PS and a* ≥ a? Following the discussion in Section 1.2, Nmin(PS,a) denotes this minimum required data complexity.

The data complexity expressions that we have obtained for the key recovery attacks earlier provide expressions for N in terms of PS and a, which can be written as N(PS,a). In other words, this means N(PS,a) plaintext-ciphertext pairs are sufficient to obtain PS* ≥ PS and a* ≥ a. Again from the discussion in Section 1.2, we have Nmin(PS,a) ≤ N(PS,a) for all the cases of key recovery attacks. Similarly, for the case of distinguishing attacks, Nmin(Pe) ≤ N(Pe). We record these in the following theorem.

Theorem 9.1.

  1. For key recovery attacks, using a single linear approximation based on Hypothesis Test 1,

    Nmin(PS,a) ≤ 2{√((a+1)ln 2) + √(3(1+|c|)ln(1/(1−PS)))}²/c².
  2. For key recovery attacks, using multiple linear approximations based on Hypothesis Test 2,

    Nmin(PS,a) ≤ υ²{√((a+ℓ)ln 2) + √(ln(1/(1−PS)))}²/(2(D(p~ ∥ q~) + D(q~ ∥ p~)))².
  3. For key recovery attacks, using a single differential based on Hypothesis Test 3,

    Nmin(PS,a) ≤ 3{√(pw(a+1)ln 2) + √(p0 ln(1/(1−PS)))}²/(p0 − pw)².
  4. For key recovery attacks, using multiple differentials based on Hypothesis Test 4,

    Nmin(PS,a) ≤ υ²{√(a ln 2) + √(ln(1/(1−PS)))}²/(2(D(p~ ∥ θ~) + D(θ~ ∥ p~)))².
  5. For distinguishing attacks, based on Hypothesis Test 5,

    Nmin(Pe) ≤ 2υ² ln(1/Pe)/(D(p~ ∥ q~) + D(q~ ∥ p~))².

10 Comparison

Previous works have obtained expressions for data complexities of the various attacks considered in this paper. The analyses have been based on using the central limit theorem to approximate the distribution of the sum of some random variables using the normal distribution. In this work, we have not used any approximation in our analysis. It is of interest to compare the rigorous upper bounds on data complexities that we have obtained with the expressions for data complexities using normal approximations.

We start by making a theoretical comparison of the various expressions. To facilitate the comparison, we introduce some notation to denote the expressions for the variances that arise in the different cases.

Let

p~$ ≜ (2^{−ℓ}, …, 2^{−ℓ})

be the uniform probability distribution over {0,1}^ℓ. The variances in the case of multiple linear cryptanalysis will be denoted by

(σ0^{(L)})² and (σ1^{(L)})²

(see [32] for further details). For multiple differential cryptanalysis we denote the variances by

(σ0^{(D)})² and (σ1^{(D)})²

(see [32] for further details). Lastly, for the LLR distinguisher we denote the variances by

(σ0^{(Dist)})² and (σ1^{(Dist)})²

(see [32] for further details). The expressions are all similar and our use of different notation is only for the sake of convenience in comparison.

Table 1 compares the expressions for the approximate data complexities that exist in the literature to the corresponding upper bounds on the data complexities obtained in this paper. For single linear and single differential cryptanalysis, the approximate expressions for data complexities were originally obtained in [34]. The approximate expression for the data complexity of multiple linear cryptanalysis was obtained in [19], while the approximate expression for the data complexity of multiple differential cryptanalysis was obtained in [11]. These expressions were obtained using the order statistics based approach. In [32], the hypothesis testing framework was used to analyze data complexities. The actual forms of the approximate expressions for the data complexities listed in Table 1 are from [32]. For the case of distinguisher, the original analysis based on normal approximation was done in [2]. This was recapitulated in Section 2.3, and the approximate expression for the data complexity listed in Table 1 is given by (2.4).

The main observation from Table 1 is that in each case the denominator of the approximate expression is the same as that of the upper bound. So, the difference between the approximate expression and the upper bound arises from the difference in the numerator. A comparison of the numerators essentially involves comparing the inverse of the standard normal distribution function with the natural logarithm and seems to be difficult to do analytically. Instead, we perform an experimental comparison.

Table 1

Upper bound on the data complexities along with the existing data complexities. Here LC denotes linear cryptanalysis and DC denotes differential cryptanalysis.

Attack type | Approximate data complexity | Upper bound
Single LC | {Φ^{−1}(1−2^{−a−1}) + √(1−c²)·Φ^{−1}(PS)}²/c² | 2{√((a+1)ln 2) + √(3(1+|c|)ln(1/(1−PS)))}²/c²
Single DC | {√(pw(1−pw))·Φ^{−1}(1−2^{−a}) + √(p0(1−p0))·Φ^{−1}(PS)}²/(p0−pw)² | 3{√(pw(a+1)ln 2) + √(p0 ln(1/(1−PS)))}²/(p0−pw)²
Multiple LC | {σ1^{(L)}Φ^{−1}(1−2^{−ℓ−a}) + σ0^{(L)}Φ^{−1}(PS)}²/(D(p~ ∥ p~$) + D(p~$ ∥ p~))² | υ²{√((a+ℓ)ln 2) + √(ln(1/(1−PS)))}²/(2(D(p~ ∥ q~) + D(q~ ∥ p~)))²
Multiple DC | {σ1^{(D)}Φ^{−1}(1−2^{−a}) + σ0^{(D)}Φ^{−1}(PS)}²/(D(p~ ∥ θ~) + D(θ~ ∥ p~))² | υ²{√(a ln 2) + √(ln(1/(1−PS)))}²/(2(D(p~ ∥ θ~) + D(θ~ ∥ p~)))²
Distinguisher | {(σ0^{(Dist)} + σ1^{(Dist)})Φ^{−1}(1−Pe)}²/(D(p~ ∥ q~) + D(q~ ∥ p~))² | 2υ² ln(1/Pe)/(D(p~ ∥ q~) + D(q~ ∥ p~))²

10.1 Comparison for Serpent

This section compares the approximate data complexity of multiple linear cryptanalysis with the upper bound for the block cipher Serpent. Collard et al. [14] presented reduced round linear cryptanalysis of the block cipher Serpent by using a set of linear approximations [15]. This set was later used in [18, 19]. The experiments conducted by Hermelin et al. [18] made use of one subset of 64 linear approximations among the set given in [15]. It was found that this subset can be generated from 10 linear approximations, which they called the basis linear approximations; [18, Table 2] lists these 10 linear approximations. These linear approximations can be used to recover 10 bits of the first round key. Thus, we have ℓ = 10 and m = 10.

Notice that in order to generate the full joint distribution it is required to get the biases of all the 2^{10} − 1 = 1023 non-zero linear approximations generated from the 10 basis linear approximations. Since only 64 out of these 1023 linear approximations were given in [15], Hermelin et al. [18, 19] used two different techniques to generate the full distribution. We have used the second method.

Following [19], we fixed the value of PS to 0.95. Table 2 summarizes the output of the experiment for a = 1, …, 10. In the table, NLLR denotes the data complexity given by [19, (38)] and NUpp denotes the upper bound for multiple linear cryptanalysis given in Theorem 9.1. From the table it follows that the upper bound on the data complexity is about 43 to 63 times the approximate value.

Table 2

Comparison between NLLR and NUpp for the block cipher Serpent.

a | NLLR | NUpp | NUpp/NLLR
1 | 4.48×10⁶ | 1.95×10⁸ | 43.60
2 | 4.95×10⁶ | 2.22×10⁸ | 44.84
3 | 5.35×10⁶ | 2.50×10⁸ | 46.72
4 | 5.72×10⁶ | 2.80×10⁸ | 48.84
5 | 6.09×10⁶ | 3.11×10⁸ | 51.08
6 | 6.44×10⁶ | 3.44×10⁸ | 53.39
7 | 6.79×10⁶ | 3.79×10⁸ | 55.73
8 | 7.14×10⁶ | 4.15×10⁸ | 58.11
9 | 7.49×10⁶ | 4.53×10⁸ | 60.51
10 | 7.83×10⁶ | 4.93×10⁸ | 62.93

10.2 Comparisons using simulated joint distributions

The approximate expressions contain terms of the type Φ^{−1}(x), and the corresponding term in the upper bound is √(A ln(1/(1−x))) for A = 1, 2, 3, 6. (For x = PS this can be seen directly; the other x’s are 1−2^{−a−1}, 1−2^{−a}, 1−2^{−ℓ−a} and 1−Pe, and the corresponding values of 1/(1−x) are 2^{a+1}, 2^a, 2^{ℓ+a} and 1/Pe, respectively.) These terms do not depend on the probability distributions p~ or q~.

Comparing Φ^{−1}(x) with √(A ln(1/(1−x))). For x varying from 1−2^{−2} to 1−2^{−100}, Figure 1 shows the plots of

Φ^{−1}(x),  √(ln(1/(1−x))),  √(ln(1/(1−x)))/Φ^{−1}(x).

This shows that for the given range of x, the ratio √(ln(1/(1−x)))/Φ^{−1}(x) is between 1 and 2. For A = 2, 3 or 6, the ratio increases by a factor of √A. Figure 2 shows the plots of the ratio √(A ln(1/(1−x)))/Φ^{−1}(x) for A = 1, 2, 3 and 6.

From these plots we can infer that the difference between the approximate data complexities and the upper bounds arising from the difference between Φ^{−1}(x) and √(A ln(1/(1−x))) is only a small constant factor.

Figure 1: Plots of Φ^{−1}(x), √(ln(1/(1−x))) and √(ln(1/(1−x)))/Φ^{−1}(x).

Figure 2: Plots of √(A ln(1/(1−x)))/Φ^{−1}(x) for A = 1, 2, 3 and 6.

Comparisons of components depending on actual distributions. Some of the components in the numerators of the expressions given in Table 1 depend on the actual distributions p~ and q~. Performing these comparisons requires simulating appropriate distributions. Below, we mention the actual simulations that were done and the corresponding results.

Comparing √(1−c²) and 1+|c|. Clearly, √(1−c²) < 1+|c| holds. For our computations, we took c in the range (−2^{−40}, 2^{−40}), and in this range √(1−c²) ≈ 1 ≈ 1+|c|.

Comparing σ0^{(L)} and σ1^{(L)} with υ/2. This arises in the case of multiple linear cryptanalysis. For simulating the distributions, we took ℓ = 5 and randomly selected the probabilities of p~ in such a way that ϵη ∈ (−2^{−40}, 2^{−40}) for all η = 0, 1, …, 2^5 − 1. The values σ0^{(L)}, σ1^{(L)} and υ/2 were then compared by computing the ratios υ/(2σ0^{(L)}), υ/(2σ1^{(L)}) and σ0^{(L)}/σ1^{(L)}. This experiment was repeated 10 times.

The results showed that σ0^{(L)}/σ1^{(L)} ≈ 1 and υ/(2σ0^{(L)}) ≈ υ/(2σ1^{(L)}). Table 3 gives the values of υ/2, σ0^{(L)} and υ/(2σ0^{(L)}).

Table 3

Values of υ/2, σ0^{(L)} and υ/(2σ0^{(L)}).

υ/2 | σ0^{(L)} | υ/(2σ0^{(L)})
5.98×10^{−9} | 6.35×10^{−10} | 9.42
3.54×10^{−9} | 5.67×10^{−10} | 6.25
2.04×10^{−9} | 5.44×10^{−10} | 3.76
4.18×10^{−8} | 2.62×10^{−9} | 15.92
1.19×10^{−8} | 8.85×10^{−10} | 13.41
1.69×10^{−8} | 1.15×10^{−9} | 14.70
6.06×10^{−9} | 6.32×10^{−10} | 9.60
1.31×10^{−8} | 9.50×10^{−10} | 13.83
1.52×10^{−8} | 1.05×10^{−9} | 14.49
1.16×10^{−8} | 8.74×10^{−10} | 13.27
Table 4

Values of υ/2, σ0^{(D)} and υ/(2σ0^{(D)}).

υ/2 | σ0^{(D)} | υ/(2σ0^{(D)})
0.0071 | 1.56×10^{−7} | 32174.65
0.0070 | 1.51×10^{−7} | 32578.80
0.0070 | 1.51×10^{−7} | 32891.29
0.0066 | 1.60×10^{−7} | 28959.21
0.0074 | 1.44×10^{−7} | 36168.05
0.0076 | 1.62×10^{−7} | 32985.94
0.0077 | 1.72×10^{−7} | 31684.23
0.0071 | 1.44×10^{−7} | 34608.71
0.0073 | 1.53×10^{−7} | 33980.48
0.0074 | 1.50×10^{−7} | 34872.68

Comparing σ0^{(D)} and σ1^{(D)} with υ/2. This arises in the case of multiple differential cryptanalysis. For the simulation we took n = 32, m = 10 and ν = 20, and again ensured that ϵη ∈ (−2^{−40}, 2^{−40}) for all η = 0, 1, …, 20. Random distributions were generated using these parameters as in the case of multiple linear cryptanalysis. The ratios υ/(2σ0^{(D)}), υ/(2σ1^{(D)}) and σ0^{(D)}/σ1^{(D)} were considered. The experiment was again repeated 10 times.

As before, the results showed that υ/(2σ0^{(D)}) ≈ υ/(2σ1^{(D)}) and σ0^{(D)}/σ1^{(D)} ≈ 1. Table 4 gives the values of υ/2, σ0^{(D)} and υ/(2σ0^{(D)}).

The experiment clearly shows that the value of σ0^{(D)} is quite small compared to υ/2. The reason is that for M = 40 and n = 32 the difference M − n = 8 is quite small. We explain this more clearly. For the distributions considered, we have pη = qη + ϵη for all η ≠ 0, where qη = 1/(2^n − 1) ≈ 2^{−n} and ϵη ∈ (−2^{−M}, 2^{−M}). This implies

1 − 2^{−(M−n)} < pη/qη ≈ 1 + ϵη·2^n < 1 + 2^{−(M−n)}.

Therefore, we have

1 − 2^{−(M−n)} < υmin and υmax < 1 + 2^{−(M−n)},

which implies that υ is upper bounded by 2^{−(M−n−1)}, i.e., 0 ≤ υ < 2^{−(M−n−1)}. Therefore, υ is small if 2^{−(M−n−1)} is small. In the present case, we have M = 40 and n = 32, which implies that 2^{−(M−n−1)} = 2^{−7}. Similarly, for multiple linear cryptanalysis υ is upper bounded by 2^{−(M−ℓ−1)}. Previously we had taken ℓ = 10, which implies that 2^{−(M−ℓ−1)} = 2^{−29}. This somewhat explains why the value υ/2 is closer to σ0^{(L)} in the case of multiple linear cryptanalysis than to σ0^{(D)} in the case of multiple differential cryptanalysis.

Comparing σ0^{(Dist)} + σ1^{(Dist)} with υ/2. This is relevant for the distinguisher. The distinguisher is defined for arbitrary probability distributions p~ and q~. For the experimental comparison, we applied the distinguisher to the context of multiple linear cryptanalysis. Here, as before, we chose ℓ = 5 and the ϵη in the same range as for multiple linear cryptanalysis. Unlike the previous cases, here it is required to compute

υ/(2(σ0^{(Dist)} + σ1^{(Dist)})).

As before, the experiment was repeated 10 times and the observations are listed in Table 5.

Overall comparison of approximate data complexities with the upper bounds. The size of the target sub-key was taken to be m = 10 bits, and the block size n = 32. For single linear cryptanalysis, we chose c randomly in the range (−2^{−40}, 2^{−40}). For single differential cryptanalysis, it was assumed that p0 = pw + c, where pw = 1/(2^n − 1) and c was chosen randomly from (−2^{−40}, 2^{−40}). In the cases of multiple linear cryptanalysis and the LLR distinguisher we took ℓ = 5, and for multiple differential cryptanalysis we took ν = 20. In all three cases, the ϵη’s were randomly chosen from (−2^{−40}, 2^{−40}).

As is normally the case, the success probability $P_S$ was fixed to a constant. We have used three different success probabilities, namely $P_S = 1-2^{-5}$, $1-2^{-7}$ and $1-2^{-10}$. The advantage was varied from $a = 2$ to $100$ for all cases other than the LLR distinguisher. For each value of $a$, the ratio of the upper bound on the data complexity to the approximate data complexity was computed, and the minimum and maximum of these values were recorded. The rows of Table 6 report these minima and maxima. For the case of the LLR distinguisher, it is required that $\alpha = \beta$, and hence for our example $a = 5, 7$ and $10$. For each of these values of $a$, we ran the experiment 100 times and recorded the minimum and the maximum. The last row of Table 6 reports these values.

Table 5

Values of $\upsilon/2$, $(\sigma_0^{(\mathrm{Dist})} + \sigma_1^{(\mathrm{Dist})})$ and $\upsilon/(2(\sigma_0^{(\mathrm{Dist})} + \sigma_1^{(\mathrm{Dist})}))$.

υ/2          σ0^(Dist)+σ1^(Dist)   υ/(2(σ0^(Dist)+σ1^(Dist)))
1.33×10^-9   5.43×10^-10           2.45
2.14×10^-8   1.40×10^-9            15.28
4.11×10^-9   5.81×10^-10           7.08
1.86×10^-9   5.45×10^-10           3.41
4.35×10^-9   5.94×10^-10           7.33
4.34×10^-9   5.76×10^-10           7.55
1.83×10^-8   1.22×10^-9            14.96
1.32×10^-9   5.48×10^-10           2.40
1.98×10^-8   1.31×10^-9            15.16
8.10×10^-9   7.13×10^-10           11.35
Table 6

Maximum and minimum values of the ratios of the upper bound to the approximate data complexity for each row of Table 1.

                     P_S = 1-2^-5        P_S = 1-2^-7        P_S = 1-2^-10
Type of attack       Maximum   Minimum   Maximum   Minimum   Maximum   Minimum
Single LC            6.02      1.70      5.21      1.73      4.63      1.76
Single DC            5.09      1.89      4.17      1.84      3.50      1.80
Multiple DC          2.30×10^9 3.05×10^8 2.63×10^9 1.70×10^8 1.90×10^9 2.54×10^8
Multiple LC          200.55    4.43      197.75    4.43      199.06    4.53
LLR distinguisher    2.55      1.01      2.17      0.86      1.82      0.77

From Table 6 it can be observed that other than the case of multiple differential cryptanalysis, the upper bound is not significantly larger than the approximate data complexity. For multiple differential cryptanalysis, the upper bound is significantly greater than the approximate value. To a large extent, the higher value of the upper bound is explained by the differences in the values of υ and the variances as reported in Tables 3, 4 and 5.

For the cases where the approximate data complexities and the upper bounds are close, our conclusion is that it is perhaps better to use the upper bounds as the data complexities of the corresponding attacks. While this will push up the data requirement to some extent, it is based on rigorous analysis and is certain to hold in all cases. For multiple differential cryptanalysis, the gap between the approximate data complexity and the upper bound is fairly large, so no clear conclusion can be drawn. This gap could be due to the approximate value being a significant underestimate or due to the upper bound being an overestimate. At this point we are unable to determine the exact reason, and more work is necessary to settle this point.

10.3 Comparing the two upper bounds for single linear and differential cryptanalysis

Note that our analysis yields two upper bounds on the data complexity of single linear cryptanalysis – one obtained directly by using the Chernoff bound and another by putting $\ell = 1$ in the expression for the data complexity of multiple linear cryptanalysis. Putting $\ell = 1$ in equation (4.1), we get

$$\upsilon^2 = \bigl[\max\{\ln(1+c), \ln(1-c)\} - \min\{\ln(1+c), \ln(1-c)\}\bigr]^2 = \left(\ln\frac{1+c}{1-c}\right)^2,$$
$$\mu_0 = \frac{1}{2}\left[\ln(1-c^2) + c\ln\frac{1+c}{1-c}\right],$$
$$\mu_1 = \frac{1}{2}\ln(1-c^2),$$
$$N = \frac{2\{(a+1)\ln 2 + \ln(1/(1-P_S))\}^2}{c^2}.$$

This needs to be compared with the expression obtained by using the Chernoff bound, i.e.,

$$N = \frac{2\{(a+1)\ln 2 + \sqrt{3(1+|c|)}\,\ln(1/(1-P_S))\}^2}{c^2}.$$

Let us denote $(a+1)\ln 2$ by $x$, $\ln(1/(1-P_S))$ by $y$, the data complexity obtained using the Chernoff bound by $N_C$, and the data complexity obtained using the Hoeffding bound by $N_H$. Then

$$N_H - N_C = \frac{2}{c^2}\Bigl\{(x+y)^2 - \bigl(x + \sqrt{3(1+|c|)}\,y\bigr)^2\Bigr\} = -\frac{2}{c^2}\Bigl\{(2+3|c|)y^2 + 2\bigl(\sqrt{3(1+|c|)}-1\bigr)xy\Bigr\} < 0,$$

where the inequality follows since $x$ and $y$ are greater than zero and $\sqrt{3(1+|c|)} > 1$. Thus we have $N_H < N_C$, which means that the Hoeffding bound gives a better upper bound on the data complexity in the case of single linear cryptanalysis.
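This inequality is easy to confirm numerically. The following sketch assumes the expressions $N_H = 2(x+y)^2/c^2$ and $N_C = 2(x + \sqrt{3(1+|c|)}\,y)^2/c^2$ with $x = (a+1)\ln 2$ and $y = \ln(1/(1-P_S))$, and samples the parameter ranges used in the experiments:

```python
import math
import random

def NH(a, PS, c):
    # Hoeffding-based bound (l = 1 in the multiple linear expression).
    x, y = (a + 1) * math.log(2), math.log(1 / (1 - PS))
    return 2 * (x + y) ** 2 / c ** 2

def NC(a, PS, c):
    # Chernoff-based bound.
    x, y = (a + 1) * math.log(2), math.log(1 / (1 - PS))
    return 2 * (x + math.sqrt(3 * (1 + abs(c))) * y) ** 2 / c ** 2

random.seed(1)
for _ in range(1000):
    a = random.randint(2, 100)
    PS = 1 - 2 ** -random.choice([5, 7, 10])
    c = random.choice([-1, 1]) * random.uniform(2 ** -41, 2 ** -40)
    assert NH(a, PS, c) < NC(a, PS, c)
print("N_H < N_C for all sampled parameters")
```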

Similarly, one obtains two upper bounds on the data complexity of single differential cryptanalysis. Putting $\nu = 1$ in the right-hand side of (6.2), we get

$$\tilde{p} = (1-p_0, p_0), \qquad \tilde{\theta} = (1-p_w, p_w),$$
$$\upsilon^2 = \left[\max\left\{\ln\frac{p_0}{p_w}, \ln\frac{1-p_0}{1-p_w}\right\} - \min\left\{\ln\frac{p_0}{p_w}, \ln\frac{1-p_0}{1-p_w}\right\}\right]^2 = \left(\ln\frac{p_0(1-p_w)}{p_w(1-p_0)}\right)^2,$$
$$D(\tilde{p}\,\|\,\tilde{\theta}) = (1-p_0)\ln\frac{1-p_0}{1-p_w} + p_0\ln\frac{p_0}{p_w} = \ln\frac{1-p_0}{1-p_w} + p_0\ln\frac{p_0(1-p_w)}{p_w(1-p_0)},$$
$$D(\tilde{\theta}\,\|\,\tilde{p}) = \ln\frac{1-p_w}{1-p_0} - p_w\ln\frac{p_0(1-p_w)}{p_w(1-p_0)},$$
$$\bigl(D(\tilde{p}\,\|\,\tilde{\theta}) + D(\tilde{\theta}\,\|\,\tilde{p})\bigr)^2 = (p_0-p_w)^2\left(\ln\frac{p_0(1-p_w)}{p_w(1-p_0)}\right)^2 = (p_0-p_w)^2\upsilon^2,$$
$$N_H = \frac{\{a\ln 2 + \ln(1/(1-P_S))\}^2}{2(p_0-p_w)^2}.$$

This needs to be compared with the expression obtained using the Chernoff bound, i.e.,

$$N_C = \frac{3\{\sqrt{p_w}\,(a+1)\ln 2 + \sqrt{p_0}\,\ln(1/(1-P_S))\}^2}{(p_0-p_w)^2}.$$

Then

$$N_H - N_C = \frac{(x+y)^2 - 6(\sqrt{p_w}\,x + \sqrt{p_0}\,y)^2}{2(p_0-p_w)^2} = \frac{(1-6p_w)x^2 + (1-6p_0)y^2 + 2(1-6\sqrt{p_0 p_w})xy}{2(p_0-p_w)^2}.$$

Now, $1 - 6p_w \geq 0$ holds if and only if $p_w \leq 1/6$, which, since $p_w = 1/(2^n-1)$, is equivalent to $n \geq 3$. Recall that $n$ denotes the block size. Therefore, it is safe to assume that $p_w \leq 1/6$. Similarly, it is also safe to assume that $p_0 \leq 1/6$. These two assumptions give $1 - 6\sqrt{p_0 p_w} \geq 0$. Thus we have

$$N_H - N_C \geq 0,$$

or in other words $N_H \geq N_C$. Therefore, the Chernoff bound gives a better upper bound on the data complexity in the case of single differential cryptanalysis.
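Again this can be checked numerically. The sketch below assumes $N_H = \{a\ln 2 + \ln(1/(1-P_S))\}^2/(2(p_0-p_w)^2)$ and $N_C = 3\{\sqrt{p_w}\,(a+1)\ln 2 + \sqrt{p_0}\,\ln(1/(1-P_S))\}^2/(p_0-p_w)^2$, with parameters in the experimental ranges (block size $n = 32$, so $p_w$ and $p_0$ are far below $1/6$):

```python
import math
import random

def NH(a, PS, p0, pw):
    # Hoeffding-based bound (nu = 1 in the multiple differential expression).
    x, y = a * math.log(2), math.log(1 / (1 - PS))
    return (x + y) ** 2 / (2 * (p0 - pw) ** 2)

def NC(a, PS, p0, pw):
    # Chernoff-based bound.
    x, y = (a + 1) * math.log(2), math.log(1 / (1 - PS))
    return 3 * (math.sqrt(pw) * x + math.sqrt(p0) * y) ** 2 / (p0 - pw) ** 2

random.seed(2)
n = 32
pw = 1 / (2 ** n - 1)
for _ in range(1000):
    a = random.randint(2, 100)
    PS = 1 - 2 ** -random.choice([5, 7, 10])
    p0 = pw + random.uniform(2 ** -60, 2 ** -40)
    assert NH(a, PS, p0, pw) >= NC(a, PS, p0, pw)
print("N_H >= N_C for all sampled parameters")
```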

11 Conclusion

The paper obtains rigorous upper bounds on the data complexities of linear and differential cryptanalysis. No use is made of the central limit theorem to approximate the distribution of a sum of random variables using the normal distribution. Computations show that the obtained upper bounds are not too far away from previously obtained approximate data complexities. Due to the rigorous nature of our analysis, we believe that this approach may be adopted in the future to analyze other techniques for cryptanalysis.

The statistical techniques used to obtain the upper bounds are fairly standard, though to the best of our knowledge they have not been used in this context earlier. We, however, make no claims that the bounds that we obtain cannot be improved. In fact, one of the goals of our work is to stimulate interest in rigorous statistical analysis of attacks on block ciphers. Hopefully, the community will further explore this direction of research since we believe that if something is worth doing, then it is worth doing properly.


Communicated by Josef Pieprzyk


A Concentration inequalities

A.1 Chernoff bounds

We briefly recall some results on tail probabilities of sums of Poisson trials that will be used later. These results can be found in standard texts such as [29, 28] and are usually referred to as the Chernoff bounds.

Theorem A.1.

Let $X_1, X_2, \ldots, X_\lambda$ be a sequence of independent Poisson trials such that $\Pr[X_i = 1] = p_i$ for $1 \leq i \leq \lambda$. Then for $X = \sum_{i=1}^{\lambda} X_i$ and $\mu = E[X] = \sum_{i=1}^{\lambda} p_i$ the following bounds hold:

$$\text{(A.1)}\quad \Pr[X \geq (1+\delta)\mu] < \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu}, \quad \delta > 0,$$
$$\Pr[X \leq (1-\delta)\mu] \leq \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\mu}, \quad 0 < \delta < 1.$$

These bounds can be simplified to the following form:

$$\text{(A.2)}\quad \Pr[X \geq (1+\delta)\mu] \leq e^{-\mu\delta^2/3}, \quad 0 < \delta \leq 1,$$
$$\text{(A.3)}\quad \Pr[X \leq (1-\delta)\mu] \leq e^{-\mu\delta^2/2}, \quad 0 < \delta < 1.$$

Further, if $p_i = \frac{1}{2}$ for $i = 1, \ldots, \lambda$, then the following stronger bounds hold:

$$\text{(A.4)}\quad \Pr[X \geq (1+\delta)\mu] \leq e^{-\delta^2\mu}, \quad \delta > 0,$$
$$\text{(A.5)}\quad \Pr[X \leq (1-\delta)\mu] \leq e^{-\delta^2\mu}, \quad 0 < \delta < 1.$$

A.2 Hoeffding inequality

We briefly recall Hoeffding’s inequality for a sum of independent random variables. The result can be found in standard texts such as [28].

Theorem A.2 (Hoeffding inequality).

Let $X_1, X_2, \ldots, X_\lambda$ be a finite sequence of independent random variables such that for all $i = 1, \ldots, \lambda$ there exist real numbers $a_i, b_i \in \mathbb{R}$ with $a_i < b_i$ and $a_i \leq X_i \leq b_i$. Let $X = \sum_{i=1}^{\lambda} X_i$. Then for any $t > 0$,

$$\text{(A.6)}\quad \Pr[X - E[X] \geq t] \leq \exp\left(-\frac{2t^2}{D_\lambda}\right),$$
$$\text{(A.7)}\quad \Pr[X - E[X] \leq -t] \leq \exp\left(-\frac{2t^2}{D_\lambda}\right),$$
$$\Pr[|X - E[X]| \geq t] \leq 2\exp\left(-\frac{2t^2}{D_\lambda}\right),$$

where $D_\lambda = \sum_{i=1}^{\lambda}(b_i - a_i)^2$.
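A quick Monte Carlo illustration of (A.6) for uniform summands (toy parameters, not drawn from the paper):

```python
import math
import random

# Empirical frequency of {X - E[X] >= t} for X a sum of lam uniforms,
# compared against the Hoeffding bound exp(-2 t^2 / D_lambda).
random.seed(2017)
lam, trials, t = 100, 20_000, 10.0
a, b = 0.0, 1.0                       # each X_i is uniform on [a, b]
D = lam * (b - a) ** 2                # D_lambda = sum_i (b_i - a_i)^2
mean = lam * (a + b) / 2              # E[X]
exceed = sum(
    sum(random.uniform(a, b) for _ in range(lam)) - mean >= t
    for _ in range(trials)
)
bound = math.exp(-2 * t ** 2 / D)     # right-hand side of (A.6)
assert exceed / trials <= bound
print(f"empirical tail = {exceed / trials:.5f}  Hoeffding bound = {bound:.5f}")
```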

B Data complexity of distinguisher using normal approximation

The LLR based test statistic for distinguishing between $\tilde{p}$ and $\tilde{q}$ is taken to be

$$T = \frac{\mathrm{LLR}/N - \mu_1}{\sigma_1/\sqrt{N}}.$$

The following two asymptotic assumptions are usually made:

  1. If the $X_j$'s follow $\tilde{q}$, then $T$ approximately follows the standard normal distribution for sufficiently large $N$.

  2. On the other hand, if the $X_j$'s follow $\tilde{p}$, then $T$ is rewritten as

     $$T = \frac{\sigma_0}{\sigma_1}Z + \frac{\sqrt{N}(\mu_0 - \mu_1)}{\sigma_1},$$

     where

     $$Z = \frac{\mathrm{LLR}/N - \mu_0}{\sigma_0/\sqrt{N}}.$$

     The latter approximately follows, for sufficiently large $N$, the standard normal distribution.

Both the above assumptions involve an error term. The error can be bounded above by using the Berry–Esseen theorem; see [32] for details of this analysis.

The form of the test is determined by the relative values of μ0 and μ1.

Hypothesis Test 6.

If $\mu_0 > \mu_1$: reject $H_0$ if $T \leq t$, where $t$ is in the range $\mu_1 < t < \mu_0$. If $\mu_0 < \mu_1$: reject $H_0$ if $T \geq t$, where $t$ is in the range $\mu_0 < t < \mu_1$.

Let α and β be the probabilities of type-I and type-II errors respectively. Define

$$P_e = \frac{\alpha + \beta}{2}.$$

The goal is to choose a value of $t$ for which $\alpha = \beta$ holds. The analysis of $\alpha$ and $\beta$ proceeds as follows. First suppose $\mu_0 > \mu_1$. Then

$$\alpha = \Pr[\text{Type-I error}] = \Pr[T \leq t \mid H_0 \text{ holds}] = \Phi\left(\frac{\sigma_1 t}{\sigma_0} - \frac{\sqrt{N}(\mu_0 - \mu_1)}{\sigma_0}\right),$$
$$\beta = \Pr[\text{Type-II error}] = \Pr[T > t \mid H_1 \text{ holds}] = 1 - \Phi(t) = \Phi(-t).$$

In this case, $t = \sqrt{N}(\mu_0 - \mu_1)/(\sigma_0 + \sigma_1)$ ensures that $\alpha = \beta$.
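This choice of $t$ is easy to verify numerically under the normal approximation (the values of $\mu_0, \mu_1, \sigma_0, \sigma_1, N$ below are toy values chosen only for illustration):

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf
# Toy parameters with mu0 > mu1 (hypothetical values, for illustration).
mu0, mu1, s0, s1, N = 0.8, 0.2, 1.0, 1.5, 400
t = math.sqrt(N) * (mu0 - mu1) / (s0 + s1)
alpha = Phi((s1 * t - math.sqrt(N) * (mu0 - mu1)) / s0)   # Pr[T <= t | H0]
beta = 1 - Phi(t)                                         # Pr[T >  t | H1]
assert abs(alpha - beta) < 1e-9
print(f"alpha = beta = {alpha:.3e}")
```

Substituting $t$ into the expression for $\alpha$ gives $\Phi(-t)$, which is exactly $\beta$; the check confirms the algebra.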

Now suppose that $\mu_0 < \mu_1$. Proceeding as above shows that choosing $t = \sqrt{N}(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)$ ensures $\alpha = \beta$. So, irrespective of the relative values of $\mu_0$ and $\mu_1$, for

$$t = \frac{\sqrt{N}|\mu_0 - \mu_1|}{\sigma_0 + \sigma_1}$$

the expression for Pe is the following:

$$\text{(B.1)}\quad P_e = \Phi(-t) = \Phi\left(-\frac{\sqrt{N}|\mu_0 - \mu_1|}{\sigma_0 + \sigma_1}\right) = \Phi\left(-\frac{\sqrt{N}\bigl(D(\tilde{p}\,\|\,\tilde{q}) + D(\tilde{q}\,\|\,\tilde{p})\bigr)}{\sigma_0 + \sigma_1}\right).$$

In [2], a second-order Taylor series expansion of the $\ln$ term was used in the expression for the Kullback–Leibler divergence. This resulted in the expression for $P_e$ simplifying to $P_e = \Phi(-\sqrt{N\,C(\tilde{p},\tilde{q})}/2)$, where $C(\tilde{p},\tilde{q})$ is defined to be the capacity between the two probability distributions $\tilde{p}$ and $\tilde{q}$.

From the expression for Pe given by (B.1), it is possible to obtain an expression for the data complexity N required to achieve a desired value of Pe:

$$N = \left(\frac{(\sigma_0 + \sigma_1)\,\Phi^{-1}(1-P_e)}{D(\tilde{p}\,\|\,\tilde{q}) + D(\tilde{q}\,\|\,\tilde{p})}\right)^2.$$
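As an illustration, this expression can be evaluated for a hypothetical two-point pair $\tilde{p}, \tilde{q}$ (the numeric values below are not from the paper; $\sigma_0$ and $\sigma_1$ are taken to be the standard deviations of the per-sample LLR $\ln(p_x/q_x)$ under $\tilde{p}$ and $\tilde{q}$ respectively):

```python
import math
from statistics import NormalDist

# Hypothetical two-point distributions, chosen only to exercise the formula.
p = (0.52, 0.48)
q = (0.50, 0.50)
llr = [math.log(pi / qi) for pi, qi in zip(p, q)]   # per-sample LLR values
Dpq = sum(pi * li for pi, li in zip(p, llr))        # D(p~ || q~) = mu_0
Dqp = -sum(qi * li for qi, li in zip(q, llr))       # D(q~ || p~) = -mu_1
s0 = math.sqrt(sum(pi * (li - Dpq) ** 2 for pi, li in zip(p, llr)))
s1 = math.sqrt(sum(qi * (li + Dqp) ** 2 for qi, li in zip(q, llr)))
Pe = 2 ** -10
N = ((s0 + s1) * NormalDist().inv_cdf(1 - Pe) / (Dpq + Dqp)) ** 2
print(f"N ≈ {N:.0f} samples for P_e = 2^-10")
```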

Acknowledgements

We thank the referee for a careful reading of the paper and for providing comments which have helped improve the work.

References

[1] M. A. Abdelraheem, M. Ågren, P. Beelen and G. Leander, On the distribution of linear biases: Three instructive examples, Advances in Cryptology (Santa Barbara 2012), Lecture Notes in Comput. Sci. 7417, Springer, Heidelberg (2012), 50–67. 10.1007/978-3-642-32009-5_4

[2] T. Baignères, P. Junod and S. Vaudenay, How far can we go beyond linear cryptanalysis?, Advances in Cryptology (Jeju Island 2004), Lecture Notes in Comput. Sci. 3329, Springer, Berlin (2004), 432–450. 10.1007/978-3-540-30539-2_31

[3] T. Baignères, P. Sepehrdad and S. Vaudenay, Distinguishing distributions using Chernoff information, Provable Security, Lecture Notes in Comput. Sci. 6402, Springer, Berlin (2010), 144–165. 10.1007/978-3-642-16280-0_10

[4] T. Baignères and S. Vaudenay, The complexity of distinguishing distributions (invited talk), Information Theoretic Security (Calgary 2008), Lecture Notes in Comput. Sci. 5155, Springer, Berlin (2008), 210–222. 10.1007/978-3-540-85093-9_20

[5] E. Biham, A. Biryukov and A. Shamir, Cryptanalysis of Skipjack reduced to 31 rounds using impossible differentials, J. Cryptology 18 (2005), no. 4, 291–311. 10.1007/s00145-005-0129-3

[6] E. Biham and A. Shamir, Differential cryptanalysis of DES-like cryptosystems, J. Cryptology 4 (1991), no. 1, 3–72. 10.1007/BF00630563

[7] E. Biham and A. Shamir, Differential cryptanalysis of DES-like cryptosystems, Advances in Cryptology (Santa Barbara 1990), Lecture Notes in Comput. Sci. 537, Springer, Berlin (2003), 2–21. 10.1007/3-540-38424-3_1

[8] A. Biryukov, C. De Cannière and M. Quisquater, On multiple linear approximations, Advances in Cryptology (Santa Barbara 2004), Lecture Notes in Comput. Sci. 3152, Springer, Berlin (2004), 1–22. 10.1007/978-3-540-28628-8_1

[9] C. Blondeau, A. Bogdanov and G. Leander, Bounds in shallows and in miseries, Advances in Cryptology (Santa Barbara 2013), Lecture Notes in Comput. Sci. 8042, Springer, Berlin (2013), 204–221. 10.1007/978-3-642-40041-4_12

[10] C. Blondeau and B. Gérard, Multiple differential cryptanalysis: Theory and practice, Fast Software Encryption (Lyngby 2011), Lecture Notes in Comput. Sci. 6733, Springer, Berlin (2011), 35–54. 10.1007/978-3-642-21702-9_3

[11] C. Blondeau, B. Gérard and K. Nyberg, Multiple differential cryptanalysis using LLR and χ² statistics, Security and Cryptography for Networks, Lecture Notes in Comput. Sci. 7485, Springer, Heidelberg (2012), 343–360. 10.1007/978-3-642-32928-9_19

[12] C. Blondeau, B. Gérard and J.-P. Tillich, Accurate estimates of the data complexity and success probability for various cryptanalyses, Des. Codes Cryptogr. 59 (2011), no. 1–3, 3–34. 10.1007/s10623-010-9452-2

[13] A. Bogdanov and E. Tischhauser, On the wrong key randomisation and key equivalence hypotheses in Matsui's algorithm 2, Fast Software Encryption (Singapore 2013), Lecture Notes in Comput. Sci. 8424, Springer, Berlin (2014), 19–38. 10.1007/978-3-662-43933-3_2

[14] B. Collard, F.-X. Standaert and J.-J. Quisquater, Experiments on the multiple linear cryptanalysis of reduced round Serpent, Fast Software Encryption (Lausanne 2008), Lecture Notes in Comput. Sci. 5086, Springer, Berlin (2008), 382–397. 10.1007/978-3-540-71039-4_24

[15] B. Collard, F.-X. Standaert and J.-J. Quisquater, data file, 2008.

[16] I. Dinur and A. Shamir, Cube attacks on tweakable black box polynomials, Advances in Cryptology (Cologne 2009), Lecture Notes in Comput. Sci. 5479, Springer, Berlin (2009), 278–299. 10.1007/978-3-642-01001-9_16

[17] C. Harpes, G. G. Kramer and J. L. Massey, A generalization of linear cryptanalysis and the applicability of Matsui's piling-up lemma, Advances in Cryptology (Saint-Malo 1995), Lecture Notes in Comput. Sci. 921, Springer, Berlin (1995), 24–38. 10.1007/3-540-49264-X_3

[18] M. Hermelin, J. Y. Cho and K. Nyberg, Multidimensional linear cryptanalysis of reduced round Serpent, Information Security and Privacy (Wollongong 2008), Lecture Notes in Comput. Sci. 5107, Springer, Berlin (2008), 203–215. 10.1007/978-3-540-70500-0_15

[19] M. Hermelin, J. Y. Cho and K. Nyberg, Multidimensional extension of Matsui's algorithm 2, Fast Software Encryption (Leuven 2009), Lecture Notes in Comput. Sci. 5665, Springer, Berlin (2009), 209–227. 10.1007/978-3-642-03317-9_13

[20] P. Junod, On the optimality of linear, differential, and sequential distinguishers, Advances in Cryptology (Warsaw 2003), Lecture Notes in Comput. Sci. 2656, Springer, Berlin (2003), 17–32. 10.1007/3-540-39200-9_2

[21] P. Junod and S. Vaudenay, Optimal key ranking procedures in a statistical cryptanalysis, Fast Software Encryption (Lund 2003), Lecture Notes in Comput. Sci. 2887, Springer, Berlin (2003), 235–246. 10.1007/978-3-540-39887-5_18

[22] B. S. Kaliski, Jr. and M. J. B. Robshaw, Linear cryptanalysis using multiple approximations, Advances in Cryptology (Santa Barbara 1994), Lecture Notes in Comput. Sci. 839, Springer, Berlin (1994), 26–39. 10.1007/3-540-48658-5_4

[23] L. R. Knudsen, Truncated and higher order differentials, Fast Software Encryption (Leuven 1994), Lecture Notes in Comput. Sci. 1008, Springer, Berlin (1995), 196–211. 10.1007/3-540-60590-8_16

[24] X. Lai, Higher order derivatives and differential cryptanalysis, Communications and Cryptography (Ascona 1994), Kluwer Int. Ser. Eng. Comput. Sci. 276, Kluwer Academic Publishers, Boston (1994), 227–233. 10.1007/978-1-4615-2694-0_23

[25] G. Leander, On linear hulls, statistical saturation attacks, PRESENT and a cryptanalysis of PUFFIN, Advances in Cryptology (Tallinn 2011), Lecture Notes in Comput. Sci. 6632, Springer, Heidelberg (2011), 303–322. 10.1007/978-3-642-20465-4_18

[26] M. Matsui, Linear cryptanalysis method for DES cipher, Advances in Cryptology (Lofthus 1993), Lecture Notes in Comput. Sci. 765, Springer, Berlin (1993), 386–397. 10.1007/3-540-48285-7_33

[27] M. Matsui, The first experimental cryptanalysis of the data encryption standard, Advances in Cryptology (Santa Barbara 1994), Lecture Notes in Comput. Sci. 839, Springer, Berlin (1994), 1–11. 10.1007/3-540-48658-5_1

[28] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, Cambridge, 2005. 10.1017/CBO9780511813603

[29] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, Cambridge, 1995. 10.1017/CBO9780511814075

[30] S. Murphy, The independence of linear approximations in symmetric cryptanalysis, IEEE Trans. Inform. Theory 52 (2006), no. 12, 5510–5518. 10.1109/TIT.2006.885528

[31] K. Nyberg and M. Hermelin, Multidimensional Walsh transform and a characterization of bent functions, Proceedings of the 2007 IEEE Information Theory Workshop on Information Theory for Wireless Networks, IEEE Press, Solstrand (2007), 83–86. 10.1109/ITWITWN.2007.4318037

[32] S. Samajder and P. Sarkar, Another look at normal approximations in cryptanalysis, J. Math. Cryptol. 10 (2016), no. 2, 69–99. 10.1515/jmc-2016-0006

[33] S. Samajder and P. Sarkar, Can large deviation theory be used for estimating data complexity?, Cryptology ePrint Archive (2016), https://eprint.iacr.org/2016/465.pdf.

[34] A. A. Selçuk, On probability of success in linear and differential cryptanalysis, J. Cryptology 21 (2008), no. 1, 131–147. 10.1007/s00145-007-9013-7

[35] C. Tezcan, The improbable differential attack: Cryptanalysis of reduced round CLEFIA, Progress in Cryptology (Hyderabad 2010), Lecture Notes in Comput. Sci. 6498, Springer, Berlin (2010), 197–209. 10.1007/978-3-642-17401-8_15

[36] D. Wagner, The boomerang attack, Fast Software Encryption (Rome 1999), Lecture Notes in Comput. Sci. 1636, Springer, Berlin (1999), 156–170. 10.1007/3-540-48519-8_12

Received: 2016-5-10
Revised: 2017-8-16
Accepted: 2017-8-24
Published Online: 2017-9-21
Published in Print: 2017-10-1

© 2017 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
