Abstract
This paper is devoted to analyzing the variant of Regev’s learning with errors (LWE) problem in which modular reduction is omitted: namely, the problem (ILWE) of recovering a vector \(\mathbf {s}\in \mathbb {Z}^n\) given polynomially many samples of the form \((\mathbf {a},\langle \mathbf {a},\mathbf {s}\rangle + e)\in \mathbb {Z}^{n+1}\) where \(\mathbf { a}\) and e follow fixed distributions. Unsurprisingly, this problem is much easier than LWE: under mild conditions on the distributions, we show that the problem can be solved efficiently as long as the variance of e is not superpolynomially larger than that of \(\mathbf { a}\). We also provide almost tight bounds on the number of samples needed to recover \(\mathbf {s}\).
Our interest in studying this problem stems from the side-channel attack against the BLISS lattice-based signature scheme described by Espitau et al. at CCS 2017. The attack targets a quadratic function of the secret that leaks in the rejection sampling step of BLISS. The same part of the algorithm also suffers from a linear leakage, but the authors claimed that this leakage could not be exploited due to signature compression: the linear system arising from it turns out to be noisy, and hence key recovery amounts to solving a high-dimensional problem analogous to LWE, which seemed infeasible. However, this noisy linear algebra problem does not involve any modular reduction: it is essentially an instance of ILWE, and can therefore be solved efficiently using our techniques. This allows us to obtain an improved side-channel attack on BLISS, which applies to 100% of secret keys (as opposed to \({\approx }7\%\) in the CCS paper), and is also considerably faster.
1 Introduction
Learning with Errors. Regev’s learning with errors problem (LWE) is the problem of recovering a uniformly random vector \(\mathbf {s}\in (\mathbb {Z}/q\mathbb {Z})^n\) given polynomially many samples of the form \((\mathbf { a}, \langle \mathbf { a},\mathbf { s}\rangle + e\bmod q)\), with \(\mathbf {a}\) uniform in \((\mathbb {Z}/q\mathbb {Z})^n\), and e sampled according to a fixed distribution over \(\mathbb {Z}/q\mathbb {Z}\) (typically a discrete Gaussian). Regev showed [43] that for suitable parameters, this problem is as hard as worst-case lattice problems, and is polynomial-time equivalent to its decision version, which asks to distinguish the distribution of tuples \((\mathbf { a}, \langle \mathbf { a},\mathbf {s}\rangle + e\bmod q)\) as above from the uniform distribution over \((\mathbb {Z}/q\mathbb {Z})^{n+1}\). These results are a cornerstone of modern lattice-based cryptography, which is to a large extent based on LWE and related problems.
Many variants of the LWE problem have been introduced in the literature, mostly with the goal of improving the efficiency of lattice-based cryptography. For example, papers have been devoted to the analysis of LWE when the error e has a non-Gaussian distribution and/or is very small [6, 16, 38], when the secret \(\mathbf { s}\) is sampled from a non-uniform distribution [2, 3, 5, 7, 12], or when the vectors \(\mathbf { a}\) are non-uniform [20, 23]. A long line of research has considered variants of LWE in which auxiliary information is provided about the secret \(\mathbf { s}\) [12, 15, 21, 31]. Extensions of LWE over more general rings have also been extensively studied, starting from the introduction of the Ring-LWE problem [29, 36, 37, 46]. Yet another notable variant of LWE is the learning with rounding (LWR) problem [4, 8, 9], in which the scalar product \(\langle \mathbf { a},\mathbf { s}\rangle \) is partly hidden not by adding some noise e, but by disclosing only its most significant bits.
Recently, further exotic variants have emerged in association with schemes submitted to the NIST postquantum cryptography standardization process. One can mention for example Compact-LWE [33, 34], which has been broken [11, 30, 48]; learning with truncation, considered in pqNTRUSign [24]; and Mersenne variants of Ring-LWE, introduced for ThreeBears [22] and Mersenne–756839 [1].
The ILWE Problem. In this paper, we introduce a simpler variant of LWE in which computations are carried out over \(\mathbb {Z}\) rather than \(\mathbb {Z}/q\mathbb {Z}\), i.e. without modular reduction. More precisely, we consider the problem which we call ILWE (“integer LWE”) of finding a vector \(\mathbf { s}\in \mathbb {Z}^n\) given polynomially many samples of the form \((\mathbf { a},\langle \mathbf { a},\mathbf {s}\rangle + e)\in \mathbb {Z}^{n+1}\), where \(\mathbf { a}\) and e follow fixed distributions on \(\mathbb {Z}\).
This problem may occur more naturally in statistical learning theory or numerical analysis than it does in cryptography; indeed, contrary to LWE, it is usually not hard. It can even be solved efficiently when the error e is much larger than the inner product \(\langle \mathbf {a},\mathbf { s}\rangle \) (but not superpolynomially larger), under relatively mild conditions on the distributions involved.
The fact that standard learning techniques like least squares regression should apply to this problem can be regarded as folklore, and is occasionally mentioned in special cases in the cryptographic literature (see e.g. [20, Sect. 7.6]). The main purpose of this work is to give a completely rigorous treatment of this question, and in particular to analyze the number of samples needed to solve ILWE both in an information-theoretic sense and using concrete algorithms.
ILWE and Side-Channel Attacks on BLISS. Our main motivation for studying the ILWE problem is a side-channel attack against the BLISS lattice-based signature scheme described by Espitau et al. at CCS 2017 [19].
BLISS [17] is one of the most prominent, efficient and widely implemented lattice-based signature schemes, and it has received significant attention in terms of side-channel analysis. Several papers [13, 19, 40] have pointed out that, in available implementations, certain parts of the signing algorithm can leak sensitive information about the secret key via various side-channels like cache timing, electromagnetic emanations and secret-dependent branches. They have shown that this leakage can be exploited for key recovery.
We are in particular interested in the leakage that occurs in the rejection sampling step of BLISS signature generation. Rejection sampling is an essential element of the construction of BLISS and other lattice-based signatures following Lyubashevsky’s “Fiat–Shamir with aborts” framework [35]. Implementing it efficiently in a scheme using Gaussian distributions, as is the case for BLISS, is not an easy task, however, and as observed by Espitau et al., the optimization used in BLISS turns out to leak two functions of the secret key via side-channels: an exact, quadratic function, as well as a noisy, linear function.
The attack proposed by Espitau et al. relies only on the quadratic leakage, and as a result uses very complex and computationally costly techniques from algorithmic number theory (a generalization of the Howgrave-Graham–Szydlo algorithm for solving norm equations). In particular, not only does the main, polynomial-time part of their algorithm take over a CPU month for standard BLISS parameters, but technical reasons related to the hardness of factoring make their attack applicable only to a small fraction of BLISS secret keys (around \(7\%\); these are keys satisfying a certain smoothness condition). They note that using the linear leakage instead would be much simpler if the linear function were exactly known, but this cannot be done due to its noisy nature: recovering the key then becomes a high-dimensional noisy linear algebra problem analogous to LWE, which should therefore be hard.
However, the authors missed an important difference between that linear algebra problem and LWE: the absence of modular reduction. The problem can essentially be seen as an instance of ILWE instead, and our analysis thus shows that it is easy to solve. This results in a much more computationally efficient attack taking advantage of the leakage in BLISS rejection sampling, which moreover applies to all secret keys.
Our Contributions. We propose a detailed theoretical analysis of the ILWE problem and show how it can be applied to the side-channel attack on BLISS. We also provide numerical simulations showing that our proposed algorithms behave in a way consistent with the theoretical predictions.
On the theoretical side, our first contribution is to prove that, in an information-theoretic sense, solving the ILWE problem requires at least \(m=\varOmega \big ((\sigma _e/\sigma _a)^2\big )\) samples from the ILWE distribution when the error e has standard deviation \(\sigma _e\), and the coefficients of the vectors \(\mathbf { a}\) in samples have standard deviation \(\sigma _a\). We show this by estimating the statistical distance between the distributions arising from two distinct secret vectors \(\mathbf { s}\) and \(\mathbf { s}'\). In particular, the ILWE problem is hard when \(\sigma _e\) is superpolynomially larger than \(\sigma _a\), but can be easy otherwise, including when \(\sigma _e\) exceeds \(\sigma _a\) by a large polynomial factor.
We then provide and analyze concrete algorithms for solving the problem in that case. Our main focus is least squares regression followed by rounding. Roughly speaking, we show that this approach solves the ILWE problem with m samples when \(m \ge C\cdot \big (\sigma _e/\sigma _a\big )^2\log n\) for some constant C (provided that m is also a constant factor larger than n, to ensure that the noise-free version of the corresponding linear algebra problem has a unique solution, and that the covariance matrix of the vectors \(\mathbf { a}\) is well-controlled). Our result applies to a very large class of distributions for \(\mathbf { a}\) and e including bounded distributions and discrete Gaussians. It relies on subgaussian concentration inequalities.
Interestingly, ILWE can be interpreted as a bounded distance decoding problem in a certain lattice in \(\mathbb {Z}^n\) (which is very far from random), and the least squares approach coincides with Babai’s rounding algorithm for the approximate closest vector problem (CVP) when seen through that lens. As a side contribution, we also show that even with a much stronger CVP algorithm (including an exact CVP oracle), one cannot improve the number of samples necessary to recover \(\mathbf { s}\) by more than a constant factor. And on another side note, we also consider alternate algorithms to least squares when very few samples are available (so that the underlying linear algebra system is not even full-rank), but the secret vector is known to be sparse. In that case, compressed sensing techniques using linear programming [14] can solve the problem efficiently.
After this theoretical analysis, we concretely examine the noisy linear algebra problem arising from the linear part of the BLISS rejection sampling leakage, and show that it strongly resembles an ILWE problem, which allows us to estimate the number of side-channel traces needed to recover the secret key.
Simulation results both for the vanilla ILWE problem and the BLISS attack are consistent with the theoretical predictions (only with better constants). In particular, we obtain a much more efficient attack on BLISS than the one in [19], which moreover applies to \(100\%\) of possible secret keys. The only drawback is that our attack requires a larger number of traces (around 20000 compared to 512 in [19] for BLISS–I parameters), and even that is to a large extent counterbalanced by the fact that we can easily handle errors in the values read off from side-channel traces, whereas Espitau et al. need all their leakage values to be exact.
2 Preliminaries
2.1 Notation
For \(r \in \mathbb {R}\), we denote by \(\lceil r \rfloor \) the nearest integer to r (rounding down for half-integers), and by \(\lfloor r \rfloor \) the largest integer less than or equal to r. For a vector \(\mathbf { x} = (x_1,\dots ,x_n)\in \mathbb {R}^n\), the p-norm \(\Vert \mathbf { x}\Vert _p\) of \(\mathbf { x}\), \(p\in [1,\infty )\), is given by \(\Vert \mathbf { x}\Vert _p = \big ( |x_1|^p + \cdots + |x_n|^p\big )^{1/p}\), and the infinity norm by \(\Vert \mathbf { x}\Vert _\infty = \max \big (|x_1|,\dots ,|x_n|\big )\). For a matrix \(A\in \mathbb {R}^{m\times n}\), the operator norm \(\Vert A\Vert _p^\text {op}\) of A with respect to the p-norm, \(p\in [1,\infty ]\), is given by:

$$ \Vert A\Vert _p^\text {op} = \sup _{\mathbf { x}\ne \mathbf { 0}} \frac{\Vert A\mathbf { x}\Vert _p}{\Vert \mathbf { x}\Vert _p}. $$
For any random variable X, we denote by \(\mathbb {E}[X]\) its expectation and by \({{\mathrm{Var}}}(X) = \mathbb {E}[X^2] - \mathbb {E}[X]^2\) its variance. We write \(X\sim \chi \) to denote that X follows the distribution \(\chi \). If \(\chi \) is a discrete distribution over some set S, then for any \(s\in S\), we denote by \(\chi (s)\) the probability that a sample from \(\chi \) is equal to s. In particular, if \(f:S\rightarrow \mathbb {R}\) is any function and \(X\sim \chi \), we have:

$$ \mathbb {E}\big [f(X)\big ] = \sum _{s\in S} f(s)\,\chi (s). $$
Similarly, the statistical distance \(\varDelta (\chi ,\chi ')\) of two distributions \(\chi ,\chi '\) over the set S is:

$$ \varDelta (\chi ,\chi ') = \frac{1}{2}\sum _{s\in S} \big |\chi (s) - \chi '(s)\big |. $$
Let \(\rho (x) = \exp (-\pi x^2)\) for all \(x\in \mathbb {R}\). We define \(\rho _{c,\sigma }(x) = \rho \big ((x-c)/\sigma \big )\) as the Gaussian function of parameters \(c,\sigma \). For any subset \(S\subset \mathbb {R}\) such that the sum converges, we let:

$$ \rho _{c,\sigma }(S) = \sum _{x\in S} \rho _{c,\sigma }(x). $$
The discrete Gaussian distribution \(D_{c,\sigma }\) centered at c and of parameter \(\sigma \) is the distribution on \(\mathbb {Z}\) defined by

$$ D_{c,\sigma }(x) = \frac{\rho _{c,\sigma }(x)}{\rho _{c,\sigma }(\mathbb {Z})} $$
for all \(x\in \mathbb {Z}\). We omit the subscript c in \(\rho _{c,\sigma }\) and \(D_{c,\sigma }\) when \(c=0\).
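For concreteness, the following minimal Python sketch (ours, not from the paper) samples \(D_{c,\sigma }\) by tabulating \(\rho _{c,\sigma }\) over a truncated support; the rapid decay of \(\rho \) makes the truncation error negligible.

```python
import numpy as np

def sample_discrete_gaussian(sigma, c=0.0, tail=12, size=None):
    """Sample from D_{c,sigma}, with rho_{c,sigma}(x) = exp(-pi (x-c)^2 / sigma^2),
    by tabulating the probability mass function on [c - tail*sigma, c + tail*sigma]."""
    lo, hi = int(np.floor(c - tail * sigma)), int(np.ceil(c + tail * sigma))
    support = np.arange(lo, hi + 1)
    rho = np.exp(-np.pi * (support - c) ** 2 / sigma ** 2)
    return np.random.choice(support, size=size, p=rho / rho.sum())
```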
2.2 LWE over the Integers
It is possible to define a variant of the LWE problem “over the integers”, i.e. without modular reduction. We call this problem ILWE (“integer-LWE”), and define it as follows. The problem arising from the scalar product leakage in the BLISS rejection sampling is essentially of that form.
Definition 2.1
(ILWE Distribution). For any vector \(\mathbf { s}\in \mathbb {Z}^n\) and any two probability distributions \(\chi _a, \chi _e\) over \(\mathbb {Z}\), the ILWE distribution \(\mathscr {D}_{\mathbf { s},\chi _a, \chi _e}\) associated with those parameters (which we will simply denote \(\mathscr {D}_{\mathbf { s}}\) for short when \(\chi _a,\chi _e\) are clear) is the probability distribution over \(\mathbb {Z}^n\times \mathbb {Z}\) defined as follows: samples from \(\mathscr {D}_{\mathbf { s},\chi _a, \chi _e}\) are of the form

$$ \big (\mathbf { a},\ b = \langle \mathbf { a},\mathbf { s}\rangle + e\big ), \qquad \mathbf { a}\leftarrow \chi _a^n,\quad e\leftarrow \chi _e, $$

where \(\mathbf { a}\) and e are sampled independently.
Definition 2.2
(ILWE Problem). The ILWE problem is the computational problem parametrized by \(n,m,\chi _a,\chi _e\) in which, given m samples \(\{ (\mathbf { a}_i,b_i) \}_{1\le i\le m}\) from a distribution of the form \(\mathscr {D}_{\mathbf {s},\chi _a, \chi _e}\) for some \(\mathbf { s}\in \mathbb {Z}^n\), one is asked to recover the vector \(\mathbf { s}\).
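A generator for ILWE instances is then immediate; the sketch below (our own naming, reusing sample_discrete_gaussian from above) instantiates \(\chi _a\) and \(\chi _e\) as centered discrete Gaussians.

```python
import numpy as np

def ilwe_samples(s, m, sigma_a, sigma_e):
    """Draw m samples (A, b = A s + e) from the ILWE distribution
    D_{s, chi_a, chi_e} with chi_a = D_{sigma_a} and chi_e = D_{sigma_e}."""
    n = len(s)
    A = sample_discrete_gaussian(sigma_a, size=(m, n))
    e = sample_discrete_gaussian(sigma_e, size=m)
    return A, A @ np.asarray(s) + e
```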
2.3 Subgaussian Probability Distributions
In this paper, the distributions \(\chi _a\), \(\chi _e\) we consider will usually have mean zero and be rapidly decreasing. More precisely, we will assume that those distributions are subgaussian. The notion of a subgaussian distribution was introduced by Kahane in [27], and can be defined as follows.
Definition 2.3
A random variable X over \(\mathbb {R}\) is said to be \(\tau \)-subgaussian for some \(\tau >0\) if the following bound holds for all \(s\in \mathbb {R}\):

$$ \mathbb {E}\big [\exp (sX)\big ] \le \exp \Big (\frac{s^2\tau ^2}{2}\Big ). \qquad (2.1) $$
A \(\tau \)-subgaussian probability distribution is defined in the same way.
This section collects useful facts about subgaussian random variables; most of them are well-known, and presented mostly in the interest of a self-contained and consistent presentation (as definitions of related notions tend to vary slightly from one reference to the next).
For a subgaussian random variable X, there is a minimal \(\tau \) such that X is \(\tau \)-subgaussian. This \(\tau \) is sometimes called the subgaussian moment of the random variable (or of its distribution).
As expressed in the next lemma, subgaussian distributions always have mean zero, and their variance is bounded by \(\tau ^2\).
Lemma 2.4
A \(\tau \)-subgaussian random variable X satisfies:

$$ \mathbb {E}[X] = 0 \qquad \text {and}\qquad {{\mathrm{Var}}}(X) \le \tau ^2. $$
Proof
For s around zero, we have:

$$ \mathbb {E}\big [\exp (sX)\big ] = 1 + s\,\mathbb {E}[X] + \frac{s^2}{2}\,\mathbb {E}[X^2] + o(s^2). $$
Since, on the other hand, \(\exp (s^2\tau ^2/2) = 1 + \frac{s^2}{2} \tau ^2 + o(s^2)\), the result follows immediately from (2.1). \(\square \)
Many usual distributions over \(\mathbb {Z}\) or \(\mathbb {R}\) are subgaussian. This is in particular the case for Gaussian and discrete Gaussian distributions, as well as all bounded probability distributions with mean zero.
Lemma 2.5
The following distributions are subgaussian.

(i) The centered normal distribution \(\mathcal {N}(0,\sigma ^2)\) is \(\sigma \)-subgaussian.

(ii) The centered discrete Gaussian distribution \(D_{\sigma }\) of parameter \(\sigma \) is \(\frac{\sigma }{\sqrt{2\pi }}\)-subgaussian for all \(\sigma \ge 0.283\).

(iii) The uniform distribution \(\mathscr {U}_{\alpha }\) over the integer interval \([-\alpha ,\alpha ]\cap \mathbb {Z}\) is \(\frac{\alpha }{\sqrt{2}}\)-subgaussian for \(\alpha \ge 3\).

(iv) More generally, any distribution over \(\mathbb {R}\) of mean zero and supported over a bounded interval [a, b] is \(\big (\frac{b-a}{2}\big )\)-subgaussian.

Moreover, in the cases (i)–(iii) above, the quotient \(\tau \ge 1\) between the subgaussian moment and the standard deviation satisfies (i) \(\tau = 1\); (ii) \(\tau < \sqrt{2}\) assuming \(\sigma \ge 1.85\); (iii) \(\tau \le \sqrt{3/2}\), respectively.
Proof
See the full version of this paper [10]. \(\square \)
The main property of subgaussian distributions is that they satisfy a very strong tail bound.
Lemma 2.6
Let X be a \(\tau \)-subgaussian distribution. For all \(t>0\), we have

$$ \Pr [X > t] \le \exp \Big ( -\frac{t^2}{2\tau ^2} \Big ). \qquad (2.2) $$
Proof
Fix \(t>0\). For all \(s>0\) we have, by Markov’s inequality:

$$ \Pr [X > t] \le \Pr \big [ \exp (sX) \ge \exp (st) \big ] \le e^{-st}\,\mathbb {E}\big [\exp (sX)\big ] $$
since the exponential is positive. Using the fact that X is \(\tau \)-subgaussian, we get:

$$ \Pr [X > t] \le \exp \Big ( \frac{s^2\tau ^2}{2} - st \Big ), $$
and the right-hand side is minimal for \(s=t/\tau ^2\), which exactly gives (2.2). \(\square \)
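The bound (2.2) is easy to check numerically. For instance, a standard normal variable is 1-subgaussian by Lemma 2.5(i), so the following quick sketch (ours, not from the paper) compares its empirical tail to the bound:

```python
import numpy as np

# Empirical tail of N(0,1) versus the subgaussian bound exp(-t^2/2):
rng = np.random.default_rng(0)
X = rng.standard_normal(10**7)
for t in (1.0, 2.0, 3.0):
    print(f"t={t}: Pr[X>t] ~ {np.mean(X > t):.5f} <= {np.exp(-t**2 / 2):.5f}")
```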
The following result states that a linear combination of independent subgaussian random variables is again subgaussian.
Lemma 2.7
Let \(X_1, \dots , X_n\) be independent random variables such that \(X_i\) is \(\tau _i\)-subgaussian. For all \(\mu _1,\dots ,\mu _n\in \mathbb {R}\), the random variable \(X = \mu _1 X_1 + \cdots + \mu _n X_n\) is \(\tau \)-subgaussian with:

$$ \tau ^2 = \mu _1^2\tau _1^2 + \cdots + \mu _n^2\tau _n^2. $$
Proof
Since the \(X_i\)’s are independent, we have, for all \(s\in \mathbb {R}\):

$$ \mathbb {E}\big [\exp (sX)\big ] = \prod _{i=1}^n \mathbb {E}\big [\exp (s\mu _i X_i)\big ]. $$
Now, since \(X_i\) is \(\tau _i\)-subgaussian, we have

$$ \mathbb {E}\big [\exp (s\mu _i X_i)\big ] \le \exp \Big ( \frac{s^2\mu _i^2\tau _i^2}{2} \Big ) $$
for all i. Therefore:

$$ \mathbb {E}\big [\exp (sX)\big ] \le \exp \Big ( \frac{s^2\tau ^2}{2} \Big ) $$
with \(\tau ^2 = \mu _1^2 \tau _1^2 + \cdots + \mu _n^2 \tau _n^2\) as required. \(\square \)
The previous result shows that the notion of a subgaussian random variable has a natural extension to higher dimensions.
Definition 2.8
A random vector \(\mathbf { x}\) in \(\mathbb {R}^n\) is called a \(\tau \)-subgaussian random vector if for all vectors \(\mathbf { u}\in \mathbb {R}^n\) with \(\Vert \mathbf { u}\Vert _2=1\), the inner product \(\langle \mathbf { u}, \mathbf { x}\rangle \) is a \(\tau \)-subgaussian random variable.
It clearly follows from Lemma 2.7 that if \(X_1,\dots ,X_n\) are independent \(\tau \)-subgaussian random variables, then the random vector \(\mathbf { x} = (X_1,\dots ,X_n)\) is \(\tau \)-subgaussian. In particular, if \(\chi \) is a \(\tau \)-subgaussian distribution, then a random vector \(\mathbf { x}\sim \chi ^n\) is \(\tau \)-subgaussian. A nice feature of subgaussian random vectors is that the image of such a random vector under any linear transformation is again subgaussian.
Lemma 2.9
Let \(\mathbf { x}\) be a \(\tau \)-subgaussian random vector in \(\mathbb {R}^n\), and \(A\in \mathbb {R}^{m\times n}\). Then the random vector \(\mathbf { y} = A\mathbf { x}\) is \(\tau '\)-subgaussian, with \(\tau ' = \Vert A^T\Vert _2^\text {op}\cdot \tau \).
Proof
Fix a unit vector \(\mathbf { u}_0\in \mathbb {R}^m\). We want to show that the random variable \(\langle \mathbf { u}_0, \mathbf { y}\rangle \) is \(\tau '\)-subgaussian. To do so, first observe that:

$$ \langle \mathbf { u}_0, \mathbf { y}\rangle = \langle \mathbf { u}_0, A\mathbf { x}\rangle = \langle A^T\mathbf { u}_0, \mathbf { x}\rangle = \mu \,\langle \mathbf { u}, \mathbf { x}\rangle , $$
where \(\mu = \Vert A^T \mathbf { u}_0\Vert _2\), and \(\mathbf { u} = \frac{1}{\mu }A^T \mathbf { u}_0\) is a unit vector of \(\mathbb {R}^n\). Since \(\mathbf { x}\) is \(\tau \)-subgaussian, we know that the inner product \(\langle \mathbf { u}, \mathbf { x}\rangle \) is a \(\tau \)-subgaussian random variable. As a result, by Lemma 2.7 in the trivial case of a single variable, we obtain that \(\langle \mathbf { u}_0, \mathbf { y}\rangle = \mu \langle \mathbf { u}, \mathbf { x}\rangle \) is \(\big (|\mu |\tau \big )\)-subgaussian. But by definition of the operator norm, \(|\mu |\le \Vert A^T\Vert _2^\text {op}\), and the result follows. \(\square \)
3 Information-Theoretic Analysis
A first natural question one can ask regarding the ILWE problem is how hard it is in an information-theoretic sense. In other words, given two vectors \(\mathbf { s}, \mathbf { s}'\in \mathbb {Z}^n\), how close are the ILWE distributions \(\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}\) associated to \(\mathbf { s}\) and \(\mathbf { s}'\), or equivalently, how many samples do we need to distinguish between those distributions?
In this section, we show that, at least when the error distribution \(\chi _e\) is either uniform or Gaussian, the statistical distance between \(\mathscr {D}_{\mathbf { s}}\) and \(\mathscr {D}_{\mathbf { s}'}\) admits a bound of the form \(O\big (\frac{\sigma _a}{\sigma _e}\Vert \mathbf { s}-\mathbf { s}'\Vert \big )\). In particular, distinguishing between those distributions with constant success probability requires

$$ m = \varOmega \left( \frac{1}{\Vert \mathbf { s}-\mathbf { s}'\Vert _2^2}\Big (\frac{\sigma _e}{\sigma _a}\Big )^2 \right) $$
samples, and the distributions are statistically indistinguishable when \(\sigma _e\) is superpolynomially larger than \(\sigma _a\). To see this, we first give a relatively simple expression for the statistical distance.
Lemma 3.1
The statistical distance between \(\mathscr {D}_{\mathbf { s}}\) and \(\mathscr {D}_{\mathbf { s}'}\) is given by:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) = \mathbb {E}\Big [ \varDelta \big (\chi _e,\ \chi _e - \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big ) \Big ], $$
where \(\chi _e + t\) denotes the translation of \(\chi _e\) by the constant t, and the expectation is taken over \(\mathbf { a} \leftarrow \chi _a^n\).
Proof
By definition of the statistical distance, we have:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) = \frac{1}{2}\sum _{(\mathbf { a},b)\in \mathbb {Z}^n\times \mathbb {Z}} \big |\mathscr {D}_{\mathbf { s}}(\mathbf { a},b) - \mathscr {D}_{\mathbf { s}'}(\mathbf { a},b)\big |. $$
Now to sample from \(\mathscr {D}_{\mathbf { s}}\), one first samples \(\mathbf { a}\) according to \(\chi _a^n\), then independently samples e according to \(\chi _e\), and returns \((\mathbf { a},b)\) with \(b = \langle \mathbf { a},\mathbf { s}\rangle + e\). Therefore:

$$ \mathscr {D}_{\mathbf { s}}(\mathbf { a},b) = \chi _a^n(\mathbf { a})\cdot \chi _e\big (b - \langle \mathbf { a},\mathbf { s}\rangle \big ), $$
and similarly for \(\mathscr {D}_{\mathbf { s}'}\). Thus, we can write:

$$ \begin{aligned} \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) &= \frac{1}{2}\sum _{\mathbf { a},b} \chi _a^n(\mathbf { a})\,\big |\chi _e\big (b-\langle \mathbf { a},\mathbf { s}\rangle \big ) - \chi _e\big (b-\langle \mathbf { a},\mathbf { s}'\rangle \big )\big | \\ &= \frac{1}{2}\sum _{\mathbf { a}} \chi _a^n(\mathbf { a}) \sum _{x\in \mathbb {Z}} \big |\chi _e(x) - \chi _e\big (x + \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big )\big |, \end{aligned} $$
where the last equality is obtained with the change of variables \(x = b - \langle \mathbf {a},\mathbf {s}\rangle \). We now observe that the expression

$$ \frac{1}{2}\sum _{x\in \mathbb {Z}} \big |\chi _e(x) - \chi _e\big (x + \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big )\big | $$
is exactly the statistical distance \(\varDelta (\chi _e, \chi _e - \langle \mathbf { a},\mathbf { s} - \mathbf { s}'\rangle )\), and therefore we do obtain:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) = \mathbb {E}\Big [ \varDelta \big (\chi _e,\ \chi _e - \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big ) \Big ] $$
as required. \(\square \)
Thus, we can bound the statistical distance \(\varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'})\) using a bound on the statistical distance between \(\chi _e\) and a translated distribution \(\chi _e+t\). We provide such a bound when \(\chi _e\) is either uniform in a centered integer interval, or a discrete Gaussian distribution.
Lemma 3.2
Suppose that \(\chi _e\) is either the uniform distribution \(\mathscr {U}_\alpha \) in \([-\alpha ,\alpha ]\cap \mathbb {Z}\) for some positive integer \(\alpha \), or the centered discrete Gaussian distribution \(D_{\sigma }\) with parameter \(\sigma \ge 1.60\). In either case, let \(\sigma _e = \sqrt{\mathbb {E}[\chi _e^2]}\) be the standard deviation of \(\chi _e\). We then have the following bound for all \(t\in \mathbb {Z}\):

$$ \varDelta (\chi _e, \chi _e + t) \le C\,\frac{|t|}{\sigma _e}, $$
where \(C=1/\sqrt{12}\) in the uniform case and \(C=1/\sqrt{2}\) in the discrete Gaussian case.
Proof
See the full version of this paper [10]. \(\square \)
Combining Lemmas 3.1 and 3.2, we obtain a bound of the form announced at the beginning of this section.
Theorem 3.3
Suppose that \(\chi _e\) is as in the statement of Lemma 3.2. Then, for any two vectors \(\mathbf {s},\mathbf {s}'\in \mathbb {Z}^n\), the statistical distance between \(\mathscr {D}_ {\mathbf {s}}\) and \(\mathscr {D}_{\mathbf { s}'}\) is bounded as:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) \le C\,\frac{\sigma _a}{\sigma _e}\,\Vert \mathbf { s}-\mathbf { s}'\Vert _2, $$
where C is the constant appearing in Lemma 3.2.
Proof
Lemma 3.1 gives:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) = \mathbb {E}\Big [ \varDelta \big (\chi _e,\ \chi _e - \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big ) \Big ], $$
and according to Lemma 3.2, the statistical distance on the right-hand side is bounded as:

$$ \varDelta \big (\chi _e,\ \chi _e - \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big ) \le \frac{C}{\sigma _e}\,\big |\langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle \big |. $$
It follows that:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) \le \frac{C}{\sigma _e}\,\mathbb {E}\big [ |\langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle | \big ] \le \frac{C}{\sigma _e}\sqrt{\mathbb {E}\big [ \langle \mathbf { a},\mathbf { s}-\mathbf { s}'\rangle ^2 \big ]}, $$
where the second inequality is a consequence of the Cauchy–Schwarz inequality. Now, for any \(\mathbf { u}\in \mathbb {Z}^n\), we can write:

$$ \mathbb {E}\big [ \langle \mathbf { a},\mathbf { u}\rangle ^2 \big ] = \sum _{i,j} u_i u_j\,\mathbb {E}[a_i a_j] = \sigma _a^2\,\Vert \mathbf { u}\Vert _2^2, $$
since \(\mathbb {E}[a_i a_j] = \sigma _a^2\delta _{ij}\). As a result:

$$ \varDelta (\mathscr {D}_{\mathbf { s}}, \mathscr {D}_{\mathbf { s}'}) \le C\,\frac{\sigma _a}{\sigma _e}\,\Vert \mathbf { s}-\mathbf { s}'\Vert _2, $$
as required. \(\square \)
As discussed in the beginning of this section, this shows that distinguishing between \(\mathscr {D}_{\mathbf {s}}\) and \(\mathscr {D}_{\mathbf { s}'}\) requires \(\varOmega \left( \frac{1}{\Vert \mathbf { s}-\mathbf {s}'\Vert ^2} \Big (\frac{\sigma _e}{\sigma _a}\Big )^2\right) \) samples. In particular, recovering \(\mathbf { s}\) (which implies distinguishing \(\mathscr {D}_{\mathbf { s}}\) from all \(\mathscr {D}_{\mathbf { s}'}\) for \(\mathbf { s}'\ne \mathbf { s}\)) requires

$$ m = \varOmega \Big ( \big (\sigma _e/\sigma _a\big )^2 \Big ) \qquad (3.1) $$
samples. In what follows, we will describe efficient algorithms that actually recover \(\mathbf {s}\) from only slightly more samples than this lower bound.
Remark 3.4
Contrary to the results of the next section, which will apply to arbitrary subgaussian distributions, we cannot establish an analogue of Lemma 3.2 using only a bound on the tail of the distribution \(\chi _e\). For example, if \(\chi _e\) is supported over \(2\mathbb {Z}\), then \(\varDelta (\chi _e,\chi _e+t) = 1\) for any odd t! One would presumably need an assumption on the small-scale regularity of \(\chi _e\) to extend the result.
4 Solving the ILWE Problem
We now turn to describing efficient algorithms to solve the ILWE problem. We are given m samples \((\mathbf { a}_i, b_i)\) from the ILWE distribution \(\mathscr {D}_{\mathbf { s}}\), and try to recover \(\mathbf { s}\in \mathbb {Z}^n\). Since \(\mathbf { s}\) can a priori be any vector, we, of course, need at least n samples to recover it; indeed, even without any noise, fewer samples can at best reveal an affine subspace on which \(\mathbf { s}\) lies, but not its actual value. We are thus interested in the regime when \(m\ge n\).
The equation for \(\mathbf { s}\) can then be written in matrix form:

$$ \mathbf { b} = A\mathbf { s} + \mathbf { e}, \qquad (4.1) $$
where \(A\in \mathbb {Z}^{m\times n}\) is distributed according to \(\chi _a^{m\times n}\), \(\mathbf { e}\in \mathbb {Z}^m\) is distributed as \(\chi _e^m\), \(A, \mathbf { b}\) are known and \(\mathbf { e}\) is unknown.
The idea to find \(\mathbf { s}\) will be to use simple statistical inference techniques to find an approximate solution \(\tilde{\mathbf { s}}\in \mathbb {R}^n\) of the noisy linear system (4.1) and to simply round that solution coefficient by coefficient to get a candidate \(\lceil \tilde{\mathbf { s}} \rfloor = (\lceil \tilde{s}_1 \rfloor ,\ldots ,\lceil \tilde{s}_n \rfloor )\) for \(\mathbf { s}\). If we can establish the bound:

$$ \Vert \mathbf { s} - \tilde{\mathbf { s}}\Vert _\infty < \frac{1}{2}, \qquad (4.2) $$
or, a fortiori, the stronger bound \(\Vert \mathbf { s} - \tilde{\mathbf { s}}\Vert _2 < 1/2\), then it follows that \(\lceil \tilde{\mathbf { s}} \rfloor = \mathbf { s}\) and the ILWE problem is solved.
The main technique we propose to use is least squares regression. Under the mild assumption that both \(\chi _a\) and \(\chi _e\) are subgaussian distributions, we will show that the corresponding \(\tilde{\mathbf { s}}\) satisfies the bound (4.2) with high probability when m is sufficiently large. Moreover, the number m of samples necessary to establish those bounds, and hence solve ILWE, is only a \(\log n\) factor larger than the information-theoretic minimum given in (3.1) (with the additional constraint that m should be a constant factor larger than n, to ensure that \(A^TA\) is invertible and that A has well-controlled singular values).
We also briefly discuss lattice reduction as well as compressed sensing techniques based on linear programming. We show that even an exact-CVP oracle cannot significantly improve upon the \(\log n\) factor of the least squares method. On the other hand, if the secret is known to be very sparse, compressed sensing techniques can recover the secret even in cases when \(m < n\), where the least squares method is not applicable.
4.1 Least Squares Method
The first approach we consider to obtain an estimator \(\tilde{\mathbf { s}}\) of \(\mathbf { s}\) is the linear, unconstrained least squares method: \(\tilde{\mathbf { s}}\) is chosen as a vector in \(\mathbb {R}^n\) minimizing the squared Euclidean norm \(\Vert \mathbf { b} - A \tilde{\mathbf { s}}\Vert _2^2\). In particular, the gradient vanishes at \(\tilde{\mathbf { s}}\), which means that \(\tilde{\mathbf {s}}\) is simply a solution to the linear system:

$$ (A^TA)\,\tilde{\mathbf {s}} = A^T\mathbf { b}. $$
As a result, we can compute \(\tilde{\mathbf { s}}\) in polynomial time (at most \(O(mn^2)\)) and it is uniquely defined if and only if \(A^T A\) is invertible.
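Concretely, the estimator and the rounding step fit in a few lines of numpy. This is our own sketch, assuming m is large enough that \(A^TA\) is invertible (which Theorem 4.2 below quantifies):

```python
import numpy as np

def round_least_squares(A, b):
    """Recover an ILWE secret: compute the least squares estimator
    s~ = (A^T A)^{-1} A^T b and round it coordinate-wise to integers.
    np.linalg.lstsq is numerically safer than forming A^T A explicitly."""
    s_tilde, *_ = np.linalg.lstsq(A.astype(float), b.astype(float), rcond=None)
    # np.rint rounds halves to even rather than down, but an exactly
    # half-integer coordinate of s~ is a measure-zero event here.
    return np.rint(s_tilde).astype(np.int64)
```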
It is intuitively clear that \(A^T A\) should be invertible when m is large. Indeed, one can write that matrix as:

$$ A^TA = \sum _{i=1}^m \mathbf { a}_i\mathbf { a}_i^T, $$
where the \(\mathbf { a}_i\)’s are the independent identically distributed rows of A, so the law of large numbers shows that \(\frac{1}{m} A^T A\) converges almost surely to \(\mathbb {E}\big [\mathbf { a} \mathbf { a}^T\big ]\) as \(m\rightarrow +\infty \), where \(\mathbf { a}\) is a random variable in \(\mathbb {Z}^n\) sampled from \(\chi _a^n\). We have:

$$ \mathbb {E}\big [\mathbf { a}\mathbf { a}^T\big ] = \sigma _a^2 I_n, $$
and therefore we expect \(A^T A\) to be close to \(m\sigma _a^2 I_n\) for large m.
Making this heuristic argument rigorous is not entirely straightforward, however. Assuming some tail bounds on the distribution \(\chi _a\), concentration of measure results can be used to prove that, with high probability, the smallest eigenvalue \(\lambda _{\min }(A^T A)\) is not much smaller than \(m\sigma _a^2\) (and in particular \(A^T A\) is invertible) for m sufficiently large, with a concrete bound on m. This type of bound on the smallest eigenvalue is exactly what we will need in the rest of our analysis.
More precisely, when \(\chi _a\) is bounded, one can apply a form of the so-called Matrix Chernoff inequality, such as [47, Corollary 5.2]. However, we would prefer a result that applies to e.g. discrete Gaussian distributions as well, so we only assume a subgaussian tail bound for \(\chi _a\). Such a result can be derived from the following lemma due to Hsu et al. [26, Lemma 2] (for simplicity, we specialize their statement to \(\epsilon _0=1/4\) and to the case of jointly independent vectors).
Lemma 4.1
Let \(\chi \) be a \(\tau \)-subgaussian distribution of variance 1 over \(\mathbb {R}\), and consider m random vectors \(\mathbf { x}_1,\dots ,\mathbf { x}_m\) in \(\mathbb {R}^n\) sampled independently according to \(\chi ^n\). For any \(\delta \in (0,1)\), we have:

$$ \Pr \left[ \Big \Vert \frac{1}{m}\sum _{i=1}^m \mathbf { x}_i\mathbf { x}_i^T - I_n \Big \Vert _2^\text {op} > \varepsilon (\delta ,m) \right] \le \delta , $$
where the error bound \(\varepsilon (\delta ,m)\) is given by:

$$ \varepsilon (\delta ,m) = 4\tau ^2\left( \sqrt{\frac{8\big (n\log 9 + \log (2/\delta )\big )}{m}} + \frac{n\log 9 + \log (2/\delta )}{m} \right) . $$
Using this lemma, one can indeed show that for \(\chi _a\) subgaussian, \(\lambda _{\min }(A^T A)\) is within an arbitrarily small relative error of \(m\sigma _a^2\) with probability \(1-2^{-\eta }\) for \(m=\varOmega (n+\eta )\) (and similarly for \(\lambda _{\max }\)).
Theorem 4.2
Suppose that \(\chi _a\) is \(\tau _a\)-subgaussian, and let \(\tau =\tau _a/\sigma _a\). Let A be an \(m\times n\) random matrix sampled from \(\chi _a^{m\times n}\). There exist constants \(C_1, C_2\) such that for all \(\alpha \in (0,1)\) and \(\eta \ge 1\), if \(m \ge (C_1 n + C_2 \eta )\cdot (\tau ^4/\alpha ^2)\) then

$$ \Pr \Big [ (1-\alpha )\,m\sigma _a^2 \le \lambda _{\min }\big (A^TA\big ) \le \lambda _{\max }\big (A^TA\big ) \le (1+\alpha )\,m\sigma _a^2 \Big ] \ge 1 - 2^{-\eta }. \qquad (4.3) $$
Furthermore, one can choose \(C_1 = 2^8\log 9\) and \(C_2 = 2^9\log 2\).
Proof
Let \(\mathbf { a}_i\) be the i-th row of A, and \(\mathbf { x}_i = \frac{1}{\sigma _a}\mathbf { a}_i\). Then the coefficients of \(\mathbf { x}_i\) follow a \(\tau \)-subgaussian distribution of variance 1, and every coefficient of any of the \(\mathbf {x}_i\) is independent from all the others, so the \(\mathbf {x}_i\)’s satisfy the hypotheses of Lemma 4.1. Now:

$$ \frac{1}{m\sigma _a^2}\,A^TA = \frac{1}{m}\sum _{i=1}^m \mathbf { x}_i\mathbf { x}_i^T. $$
Therefore, Lemma 4.1 shows that:

$$ \Pr \left[ \Big \Vert \frac{1}{m\sigma _a^2}A^TA - I_n \Big \Vert _2^\text {op} > \varepsilon (\delta ,m) \right] \le \delta , $$
with \(\varepsilon (\delta ,m)\) defined as above. Thus, to obtain (4.3), it suffices to take m such that \(\varepsilon (2^{-\eta },m) \le \alpha \).
The value \(\varepsilon (\delta ,m)\) can be written as \(4\tau ^2\cdot (\sqrt{8\rho }+\rho )\) where \(\rho =\big (\log 9\cdot n + \log (2/\delta )\big )/m\). For the choice of m in the statement of the theorem, we necessarily have \(\rho < 1\) since \(\sigma _a \le \tau _a\), and hence \(\tau ^4 \ge 1\). As a result, \(\varepsilon (\delta ,m) \le 16\tau ^2\cdot \sqrt{\rho }\). Thus, to obtain the announced result, it suffices to choose:

$$ m \ge \frac{2^8\tau ^4}{\alpha ^2}\big (\log 9\cdot n + \log 2\cdot (\eta +1)\big ), $$

which holds under the assumption \(m \ge (C_1 n + C_2\eta )\cdot \tau ^4/\alpha ^2\) with \(C_1 = 2^8\log 9\) and \(C_2 = 2^9\log 2\), using \(\eta \ge 1\),
which concludes the proof. \(\square \)
Remark 4.3
The ratio \(\tau \) between the subgaussian moment \(\tau _a\) of \(\chi _a\) and the actual standard deviation \(\sigma _a\) is typically small (e.g. 1 for Gaussians, \(\sqrt{3}\) for uniform distributions in a centered interval, etc.), so it isn’t the important factor in the theorem.
The asymptotic bound saying that \(m=\varOmega \big ((n+\eta )/\alpha ^2\big )\) suffices to ensure that \(\lambda _{\min }(A^T A)\) is within a factor \(\alpha \) of the limit \(m\sigma _a^2\) is a satisfactory result, but the implied constant in our theorem is admittedly rather large. This is an artifact of our reliance on Hsu et al.’s lemma. A more refined analysis is carried out by Litvak et al. in [32], and can in principle be used to reduce the constant \(C_1\) in our theorem to \(1+o(1)\) for sufficiently large n. The authors omit concrete constants, however, and making [32, Theorem 3.1] explicit is nontrivial.
From now on, let us suppose that the assumptions of Theorem 4.2 are satisfied for some \(\alpha \in (0,1)\), and \(\eta \) equal to the “security parameter”. In particular, \(A^T A\) is invertible with overwhelming probability, and we can thus write:

$$ \tilde{\mathbf { s}} = (A^TA)^{-1}A^T\mathbf { b}. $$
As discussed in the beginning of this section, we would like to bound the distance between the estimator \(\tilde{\mathbf {s}}\) and the actual solution \(\mathbf { s}\) of the ILWE problem in the infinity norm, so as to obtain an inequality of the form (4.2). Since by definition \(\mathbf { b} = A\mathbf {s} + \mathbf { e}\), we have:

$$ \tilde{\mathbf {s}} - \mathbf { s} = (A^TA)^{-1}A^T(A\mathbf { s}+\mathbf { e}) - \mathbf { s} = (A^TA)^{-1}A^T\mathbf { e} = M\mathbf { e}, $$
where M is the matrix \((A^T A)^{-1}\cdot A^T\). Now suppose that all the coefficients of \(\mathbf { e}\) are \(\tau _e\)-subgaussian. Since they are also independent, the vector \(\mathbf { e}\) is a \(\tau _e\)-subgaussian random vector in the sense of Definition 2.8. Therefore, it follows from Lemma 2.9 that \(\tilde{\mathbf {s}} - \mathbf { s} = M\mathbf {e}\) is \(\tilde{\tau }\)-subgaussian, where:

$$ \tilde{\tau } = \Vert M^T\Vert _2^\text {op}\cdot \tau _e = \frac{\tau _e}{\sqrt{\lambda _{\min }(A^TA)}}, $$

since \(MM^T = (A^TA)^{-1}\).
As a result, under the hypotheses of Theorem 4.2, \(\tilde{\mathbf {s}} - \mathbf { s}\) is a \(\frac{\tau _e}{\sigma _a\sqrt{(1-\alpha )m}}\)-subgaussian random vector, except with probability at most \(2^{-\eta }\) on the randomness of the matrix A.
This bound on the subgaussian moment can be used to derive a bound with high probability on the infinity norm as follows.
Lemma 4.4
Let \(\mathbf { v}\) be a \(\tau \)-subgaussian random vector in \(\mathbb {R}^n\). Then:

$$ \Pr \big [ \Vert \mathbf { v}\Vert _\infty > t \big ] \le 2n\exp \Big ( -\frac{t^2}{2\tau ^2} \Big ) \qquad \text {for all } t>0. $$
Proof
If we write \(\mathbf { v} = (v_1,\dots ,v_n)\), we have \(\Vert \mathbf { v}\Vert _\infty = \max (v_1,\dots ,v_n,-v_1,\dots ,-v_n)\). Therefore, the union bound shows that:

$$ \Pr \big [ \Vert \mathbf { v}\Vert _\infty > t \big ] \le \sum _{i=1}^n \Big ( \Pr [v_i > t] + \Pr [-v_i > t] \Big ). \qquad (4.4) $$
Now each of the random variables \(v_1,\dots ,v_n,-v_1,\dots ,-v_n\) can be written as the scalar product of \(\mathbf { v}\) with a unit vector of \(\mathbb {R}^n\). Therefore, they are all \(\tau \)-subgaussian. If X is one of them, the subgaussian tail bound of Lemma 2.6 shows that \(\Pr [ X > t ] \le \exp \big (-\frac{t^2}{2\tau ^2}\big )\). Combined with (4.4), this gives the desired result. \(\square \)
This is all we need to establish a sufficient condition for the least squares approach to return the correct solution to the ILWE problem with good probability.
Theorem 4.5
Suppose that \(\chi _a\) is \(\tau _a\)-subgaussian and \(\chi _e\) is \(\tau _e\)-subgaussian, and let \((A,\mathbf { b} = A\mathbf { s} + \mathbf { e})\) be the data constructed from m samples of the ILWE distribution \(\mathscr {D}_{\mathbf { s},\chi _a,\chi _e}\), for some \(\mathbf { s} \in \mathbb {Z}^n\). There exist constants \(C_1, C_2 > 0\) (the same as in the hypotheses of Theorem 4.2) such that for all \(\eta \ge 1\), if:

$$ m \ge \max \Big ( 4\big (C_1 n + C_2 \eta \big )\frac{\tau _a^4}{\sigma _a^4},\ \ 32\,\frac{\tau _e^2}{\sigma _a^2}\log (2n) \Big ), $$
then the least squares estimator \(\tilde{\mathbf { s}} = (A^T A)^{-1} A^T\mathbf { b}\) satisfies \(\Vert \mathbf { s}-\tilde{\mathbf { s}}\Vert _\infty < 1/2\), and hence \(\lceil \tilde{\mathbf { s}} \rfloor = \mathbf { s}\), with probability at least \(1-\frac{1}{2n}-2^{-\eta }\).
Proof
Applying Theorem 4.2 with \(\alpha =1/2\) and the same constants \(C_1,C_2\) as introduced in the statement of that theorem, we obtain that for \(m \ge \frac{\tau _a^4}{\sigma _a^4}(4C_1 n + 4C_2 \eta )\), we have

$$ \Pr \Big [ \lambda _{\min }\big (A^TA\big ) \ge \frac{m\sigma _a^2}{2} \Big ] \ge 1 - 2^{-\eta }. $$
Therefore, except with probability at most \(2^{-\eta }\), we have \(\lambda _{\min }\big (A^T A\big ) \ge m\sigma _a^2/2\). We now assume that this condition is satisfied.
We have shown above that \(\tilde{\mathbf { s}}-\mathbf { s}\) is a \(\tilde{\tau }\)-subgaussian random vector with \(\tilde{\tau } = \tau _e/\sqrt{\lambda _{\min }(A^T A)}\). Applying Lemma 4.4 with \(t=1/2\), we therefore have:

$$ \Pr \Big [ \Vert \tilde{\mathbf { s}}-\mathbf { s}\Vert _\infty \ge \frac{1}{2} \Big ] \le 2n\exp \Big ( -\frac{1}{8\tilde{\tau }^2} \Big ) \le 2n\exp \Big ( -\frac{m\sigma _a^2}{16\tau _e^2} \Big ). $$
Thus, if we assume that \(m\ge 32\frac{\tau _e^2}{\sigma _a^2}\log (2n)\), it follows that:

$$ \Pr \Big [ \Vert \tilde{\mathbf { s}}-\mathbf { s}\Vert _\infty \ge \frac{1}{2} \Big ] \le 2n\exp \big ( -2\log (2n) \big ) = \frac{1}{2n}. $$
This concludes the proof. \(\square \)
In the typical case when \(\tau _a\) and \(\tau _e\) are no more than a constant factor larger than \(\sigma _a\) and \(\sigma _e\), Theorem 4.5 with \(\eta = \log (2n)\) says that there are constants \(C, C'\) such that whenever

$$ m \ge C\cdot n \qquad \text {and}\qquad m \ge C'\cdot \Big (\frac{\sigma _e}{\sigma _a}\Big )^2\log n, \qquad (4.6) $$
one can solve the ILWE problem with m samples with probability at least \(1-1/n\) by rounding the least squares estimator. The first condition ensures that \(A^T A\) is invertible and to control its eigenvalues: a condition of that form is clearly unavoidable to have a well-defined least squares estimator. On the other hand, the second condition gives a lower bound of the form (3.1) on the required number of samples; we see that this bound is only a factor \(\log n\) worse than the information-theoretic lower bound, which is quite satisfactory.
We also note that the cost of this approach is equal to the complexity of computing \((A^T A)^{-1} A^T \mathbf { b}\), hence at most \(O(n^2 \cdot m)\). This is quite efficient in practice (see Sect. 6 for concrete timings). Moreover, arithmetic operations can be implemented using standard floating point instructions, since the almost scalar nature of \(A^T A\) ensures that the computations are numerically very stable.
4.2 An Exact-CVP Oracle Will Not Help
One can interpret this approach to solving ILWE by computing a least squares estimator and rounding it as an application of Babai’s rounding algorithm for the closest vector problem (CVP).
More precisely, consider the sublattice \(L = A^T A \cdot \mathbb {Z}^n\) of \(\mathbb {Z}^n\), which is full-rank when \(A^T A\) is invertible (i.e. m large enough). Then, the ILWE problem can be seen as the problem of recovering the lattice vector \(\mathbf { v} = A^T A\mathbf { s}\in L\) given the close vector \(A^T \mathbf { b} = \mathbf v + A^T\mathbf { e}\) (which is essentially an instance of bounded distance decoding in L). Closeness in this setting is best measured in terms of the infinity norm. Now, since for large m, the matrix \(A^T A\) is almost scalar, and hence the corresponding lattice basis of L is somehow already reduced, one can try to solve this problem by applying a CVP algorithm like Babai rounding directly on this basis. It is easy to see that this approach is identical to our least squares approach.
One could ask whether applying another CVP algorithm such as Babai’s nearest plane algorithm could allow solving the problem with asymptotically fewer samples (e.g. reduce the \(\log n\) factor in (4.6)). The answer is no. In fact, a much stronger result holds: one cannot improve Condition (4.6) using that strategy even given access to an exact-CVP oracle for any p-norm, \(p\in [2,\infty ]\). Given such an oracle, the lattice vector \(\mathbf { v}\) (and hence the secret \(\mathbf { s}\)) can be recovered uniquely if and only if the noise vector \(A^T\mathbf { e}\) lies in a ball centered at \(\mathbf { v}\) and of radius half the first minimum of L in the p-norm, \(\lambda _1^{(p)}(L) = \min _{\mathbf { x}\in L\setminus \{\mathbf {0}\}} \Vert \mathbf { x}\Vert _p\), that is:

$$ \Vert A^T\mathbf { e}\Vert _p < \frac{\lambda _1^{(p)}(L)}{2}. \qquad (4.7) $$
To take advantage of this condition, we need to get sufficiently precise estimates of both sides.
Estimation of the First Minimum. Due to the quasi-scalar shape of the matrix \(A^TA\), one can accurately estimate \(\lambda _1^{(p)}(L)\). Indeed, \(A^TA\) has a low orthogonality defect, so that it is in a sense already reduced. Hence, the shortest vector of this basis constitutes a very good approximation of the shortest vector of L.
Lemma 4.6
Suppose that \(\chi _a\) is \(\tau _a\)-subgaussian, and let \(\tau =\tau _a/\sigma _a\). Let A be an \(m\times n\) random matrix sampled from \(\chi _a^{m\times n}\). Let L be the lattice generated by the rows of the matrix \(A^TA\). There exist constants \(C_1, C_2\) (the same as in Theorem 4.2) such that for all \(\alpha \in (0,1)\), \(p\ge 2\) and \(\eta \ge 1\), if \(m \ge (C_1 n + C_2 \eta )\cdot (\tau ^4/\alpha ^2)\) then

$$ \Pr \big [ \lambda _1^{(p)}(L) \le (1+\alpha )\,m\sigma _a^2 \big ] \ge 1 - 2^{-\eta }. $$
Proof
Remark first that by norm equivalence in finite dimension, for any \(\mathbf { x}\in \mathbb {R}^n\) and \(p\ge 2\) we have \(\Vert \mathbf { x}\Vert _p \le \Vert \mathbf { x}\Vert _2\), so that \(\lambda _1^{(p)} (L)\le \lambda _1^{(2)}(L)\), this bound being actually sharp. Without loss of generality it then suffices to prove the result in the 2-norm. From Theorem 4.2, we can assert that except with probability at most \(2^{-\eta }\), \(\Vert A^TA\Vert _2^\text {op} \le m\sigma _a^2(1+\alpha )\); for any integral vector \(\mathbf { x}\in \mathbb {Z}^n\) we therefore have by definition of the operator norm:

$$ \Vert A^TA\,\mathbf { x}\Vert _2 \le \Vert A^TA\Vert _2^\text {op}\cdot \Vert \mathbf { x}\Vert _2 \le (1+\alpha )\,m\sigma _a^2\,\Vert \mathbf { x}\Vert _2. $$
In particular, for any \(\mathbf { x}\in \mathbb {Z}^n\) of unit 2-norm, \(\lambda _1^{(2)}(L) \le \Vert A^TA\mathbf { x}\Vert _2 \le (1+\alpha )m\sigma _a^2\). \(\square \)
Estimation of the \({\varvec{p}}\)-norm of \({\varvec{A}}^{\varvec{T}}\mathbf { e}\). Suppose that \(\chi _e\) is a centered Gaussian distribution of standard deviation \(\sigma _e\). The distribution of \(A^T\mathbf {e}\) for \(\mathbf { e} \sim \chi _e^m\) is then a Gaussian distribution of covariance matrix \(\sigma _e^2 A^T A \approx m\sigma _a^2 \sigma _e^2 I_n\). We deal with the cases \(p<\infty \) and \(p=\infty \) separately.

- Case \(p<\infty \): The expected p-th power of the p-norm of \(A^T\mathbf { e}\) satisfies:

  $$ \mathbb {E}\Big [ \Vert A^T\mathbf { e}\Vert _p^p \Big ] = n\mathbb {E}[|x|^p] = n(2m)^{p/2}\sigma _e^{p}\sigma _a^{p} \cdot \frac{\varGamma \left( \frac{p}{2}+\frac{1}{2} \right) }{\sqrt{\pi }}, $$

  where x is drawn from the centered Gaussian distribution of variance \(m\sigma _e^2\sigma _a^2\), and \(\varGamma \) is Euler's Gamma function. But by the partial converse of Jensen's inequality for norms due to Stadje [44], we have:

  $$ \mathbb {E}\Big [ \Vert A^T\mathbf { e}\Vert _p^p \Big ] \le 2^p\varGamma \left( \frac{p}{2}+\frac{1}{2}\right) {\sqrt{\pi }}^{(p-1)}\mathbb {E}\Big [ \Vert A^T\mathbf { e}\Vert _p \Big ]^p $$

  so that:

  $$ n^{1/p}\sigma _e\sigma _a \sqrt{\frac{m}{2\pi }}\le \mathbb {E}\Big [ \Vert A^T\mathbf { e}\Vert _p \Big ]. $$

- Case \(p=\infty \): The estimate is obtained from the order statistics of Gaussian distributions (see e.g. [42]):

  $$ C_\infty \sigma _e\sigma _a\sqrt{m\log n} \le \mathbb {E}\Big [ \Vert A^T\mathbf { e}\Vert _\infty \Big ], $$

  where \(C_\infty = \frac{3}{2}\left( 1-\frac{1}{e}\right) -\frac{1}{\sqrt{2\pi }} \approx 0.23\).
Now that we have access to the expected value of the random variable \(\Vert A^T\mathbf { e}\Vert _p\), we are going to use the concentration of its distribution around its expected value. Explicitly, by the random version of Dvoretzky’s theorem proven in [39], there exist absolute constants \(K,c>0\) such that for any \(0<\varepsilon <1\):

$$ \Pr \Big [ \big | \Vert A^T\mathbf { e}\Vert _p - \mathbb {E}\big [\Vert A^T\mathbf { e}\Vert _p\big ] \big | > \varepsilon \,\mathbb {E}\big [\Vert A^T\mathbf { e}\Vert _p\big ] \Big ] \le K e^{-c\,\beta (n,p,\varepsilon )}, \qquad (4.9) $$

where \(\beta (n,p,\varepsilon )\) is an explicit function of n, p and \(\varepsilon \) involving a fixed absolute constant \(0<c_0<1\) (see [39] for its precise expression).
Summing Up. Taking \(\varepsilon = 1/2\) in (4.9) ensures that, except with probability \(Ke^{-c\beta (n,p,1/2)}\),

$$ \frac{1}{2}\,\mathbb {E}\big [\Vert A^T\mathbf { e}\Vert _p\big ] \le \Vert A^T\mathbf { e}\Vert _p \le \frac{3}{2}\,\mathbb {E}\big [\Vert A^T\mathbf { e}\Vert _p\big ]. \qquad (4.10) $$
For any fixed p, the probability can be made as small as desired for large enough n. We can therefore assume that (4.10) occurs with probability at least \(1-\delta \) for some small \(\delta >0\).
In that case, Condition (4.7) asserts that if \(\mathbb {E}\Big [ \Vert A^T\mathbf { e}\Vert _p \Big ] > \lambda _1^{(p)}(L)\) then \(\mathbf { s}\) can’t be decoded uniquely in L. Now using the result of Lemma 4.6 with \(\alpha =1/2\) and the previous estimates, we know that this is the case when:

$$ m < \frac{2\,n^{2/p}}{9\pi }\Big (\frac{\sigma _e}{\sigma _a}\Big )^2 $$
when \(p<\infty \), and

$$ m < \frac{4C_\infty ^2}{9}\Big (\frac{\sigma _e}{\sigma _a}\Big )^2\log n $$
otherwise. In both cases, it follows that we must have \(m = \varOmega \big ((\sigma _e/\sigma _a)^2\log n\big )\) for the CVP algorithm to output the correct secret with probability \(>\delta \). Thus, this approach cannot improve upon the least squares bound of Theorem 4.5 by more than a constant factor.
4.3 Sparse Secret and Compressed Sensing
Up until this point, we have supposed that the number m of samples we have access to is greater than the dimension n. Indeed, without additional information on the secret \(\mathbf { s}\), this condition is necessary to get a well-defined solution to the ILWE problem even without noise.
Suppose however that the secret \(\mathbf { s}\) is known to be sparse, with only a small number \(S\ll n\) of non-zero coefficients. Even if the positions of these non-zero coefficients are not known, knowledge of the sparsity S may help in determining the secret, possibly even with fewer than n samples (though of course more than S samples are necessary!). Such a recovery is made possible by compressed sensing techniques, epitomized by the results of Candes and Tao in [14]. The idea is once again to find an estimator \(\tilde{\mathbf {s}} \) such that the infinity norm \(\Vert \tilde{\mathbf {s}}-\mathbf {s}\Vert _\infty \) is small enough to fully recover the secret \(\mathbf { s}\) from it. This can be done with the Dantzig selector introduced in [14], efficiently computable as a solution \(\tilde{\mathbf {s}} = (\tilde{s}_1,\dots ,\tilde{s}_n)\) of the following linear program with 2n unknowns \(\tilde{s}_i, \tilde{u}_i\), \(1\le i\le n\):

$$ \begin{aligned} \text {minimize}\quad &\sum _{i=1}^n \tilde{u}_i\\ \text {subject to}\quad &-\tilde{u}_i \le \tilde{s}_i \le \tilde{u}_i \quad \text {and}\quad \big | \big (A^T(\mathbf { b} - A\tilde{\mathbf {s}})\big )_i \big | \le \lambda \qquad (1\le i\le n), \end{aligned} \qquad (4.11) $$

where \(\lambda \) is a threshold parameter, chosen of order \(\sigma _a\sigma _e\sqrt{2m\log n}\) following [14].
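This LP is straightforward to hand to an off-the-shelf solver. The sketch below (our own formulation, with the threshold \(\lambda \) left as a parameter) encodes the variables as \(x = (\tilde{\mathbf {s}}, \tilde{\mathbf {u}})\in \mathbb {R}^{2n}\) and uses scipy:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(A, b, lam):
    """Solve: minimize sum(u) subject to -u <= s <= u and
    ||A^T (b - A s)||_inf <= lam, then round s coordinate-wise."""
    m, n = A.shape
    AtA, Atb = A.T @ A, A.T @ b
    I, Z = np.eye(n), np.zeros((n, n))
    A_ub = np.block([[ I,  -I],    #  s_i - u_i <= 0
                     [-I,  -I],    # -s_i - u_i <= 0
                     [ AtA, Z],    #  (A^T A s)_i <= (A^T b)_i + lam
                     [-AtA, Z]])   # -(A^T A s)_i <= lam - (A^T b)_i
    b_ub = np.concatenate([np.zeros(2 * n), Atb + lam, lam - Atb])
    c = np.concatenate([np.zeros(n), np.ones(n)])      # minimize sum(u)
    bounds = [(None, None)] * n + [(0, None)] * n      # s free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return np.rint(res.x[:n]).astype(np.int64)
```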
In the case when the distributions \(\chi _e\) and \(\chi _a\) are Gaussian distributions of respective standard deviations \(\sigma _e\) and \(\sigma _a\), the quality of the output of the program defined by (4.11) is quantified as follows.
Theorem 4.7
(adapted from [14]). Suppose \(\mathbf {s} \in \mathbb {Z}^n\) is any S-sparse vector such that \(S\log (m\sigma _a^2/n) \le m\). Then with large probability, \(\tilde{\mathbf {s}}\) obeys the relation

$$ \Vert \tilde{\mathbf {s}} - \mathbf {s}\Vert _2^2 \le C_1^2\cdot 2\log n\cdot S\cdot \frac{\sigma _e^2}{m\sigma _a^2} \qquad (4.12) $$
for some constant \(C_1 \approx 4\).
Hence as before, if \(\Vert \tilde{\mathbf {s}}-\mathbf {s}\Vert _2^2\le 1/4\), we have \(\Vert \tilde{\mathbf {s}}-\mathbf {s}\Vert _\infty \le 1/2\) and one can then decode the coefficients of \(\mathbf { s}\) by rounding \(\tilde{\mathbf { s}}\). This is satisfied with high probability as soon as:

$$ m \ge 8C_1^2\, S\,\Big (\frac{\sigma _e}{\sigma _a}\Big )^2\log n. $$
Since we aim at solving the ILWE problem in the parsimonious sample setting, where \(m < n\), we deduce that the compressed sensing methodology can be successfully applied when

$$ S \le \frac{m}{8C_1^2\log n}\cdot \Big (\frac{\sigma _a}{\sigma _e}\Big )^2. \qquad (4.13) $$
Let us discuss the practicality of this approach with regards to the parameters of the ILWE problem. First of all, note that in order to make Condition (4.13) non-vacuous, one needs \(\sigma _e\) and \(\sigma _a\) to satisfy:

$$ \frac{8C_1^2\log n}{m} \le \Big (\frac{\sigma _a}{\sigma _e}\Big )^2 < \frac{8C_1^2\,n\log n}{m}, $$
where the lower bound follows from the fact that S is a positive integer, and the upper bound from the observation that the right-hand side of (4.13) must be smaller than n to be of any interest compared to the trivial bound \(S\le n\). Practically speaking, this means that this approach is only interesting when the ratio \(\sigma _e/\sigma _a\) is relatively small; concrete bounds are provided in Table 1 for various sparsity levels and dimensions ranging from 128 to 2048.
We note that the required sparsity is much more stringent than what proposed BLISS parameters provide, for example. Moreover, the complexity of this linear programming based approach is worse than that of least squares regression. However, it is the only applicable method when fewer than n samples are available.
5 Application to the Side-Channel Attack of BLISS
5.1 BLISS Signatures and Rejection Sampling Leakage
The BLISS signature scheme [17] is a lattice-based signature scheme based on the Ring Learning With Errors (RLWE) assumption. Its signing algorithm is recalled in Fig. 1.
The Rejection Sampling. The BLISS signature scheme follows the “Fiat–Shamir with aborts” paradigm of Lyubashevsky [35]. In particular, signature generation involves a rejection sampling step (Step 8 of function Sign in Fig. 1) which is essential for security: in order to ensure that the distribution of signatures is independent of the secret key \(\mathbf { s} = (\mathbf { s}_1,\mathbf {s}_2)\), a signature candidate \(\big (\mathbf { z}=(\mathbf { z}_1,\mathbf { z}_2),\mathbf { c}\big )\) should be kept with probability

$$ \frac{1}{M\exp \big (-\Vert \mathbf { s}\mathbf { c}\Vert ^2/(2\sigma ^2)\big )\cosh \big (\langle \mathbf { z},\mathbf { s}\mathbf { c}\rangle /\sigma ^2\big )}. $$
Since it would be impractical to directly compute this expression involving transcendental functions with sufficient precision, all existing implementations of BLISS [18, 41, 45] rely instead on the iterated Bernoulli trials technique described in [17, Sect. 6]. A signature \((\mathbf { z},\mathbf { c})\) is kept if the function calls \(\textsc {SampleBernExp}(x_{\exp })\) and \(\textsc {SampleBernCosh}(x_{\cosh })\) both return 1, where functions \(\textsc {SampleBernExp}\) and \(\textsc {SampleBernCosh}\) are described in Fig. 2 and the values \(x_{\exp }, x_{\cosh }\) are given respectively by \(x_{\exp } = 2\sigma ^2\log M - \Vert \mathbf { s}\mathbf { c}\Vert ^2\) and \(x_{\cosh }= 2\cdot \langle \mathbf {z},\mathbf {s}\mathbf {c}\rangle \).
BLISS signing algorithm. The hash function H is modeled as a random oracle with values in the set of polynomials in \(\mathcal {R}\) with 0/1 coefficients and Hamming weight \(\kappa \). See [17] for details regarding notation like \(\zeta \), \(\lceil \cdot \rfloor _d\) and p not discussed in this paper.
Side-Channel Leakage of the Rejection Sampling. Based on their description in Fig. 2, it is clear that \(\textsc {SampleBernExp}\) and \(\textsc {SampleBernCosh}\) do not run in constant time. In fact, they iterate over the bits of their input, and part of the code is executed when the bit is 1 and skipped over when the bit is 0. As a result, as observed by Espitau et al. [19, Sect. 3], the inputs \(x_{\exp }, x_{\cosh }\) of these functions can be read off directly on a trace of power consumption or electromagnetic emanations, in much the same way as naive square-and-multiply implementations of RSA leak the secret exponent via simple power analysis [28, Sect. 3.1]. As a result, side-channel analysis allows an attacker to reliably recover the squared norm \(\Vert \mathbf { s} \mathbf { c}\Vert ^2 = \Vert \mathbf { s}_1\mathbf { c}\Vert ^2 + \Vert \mathbf { s}_2\mathbf { c}\Vert ^2\) and the scalar product \(\langle \mathbf {z},\mathbf {s}\mathbf {c}\rangle = \langle \mathbf {z}_1,\mathbf {s}_1\mathbf {c}\rangle + \langle \mathbf {z}_2,\mathbf {s}_2\mathbf {c}\rangle \) from generated signatures.
Espitau et al. show that the norm leakage can be leveraged in practice to recover the secret key from a little over \(\bar{n}\) signature traces, where \(\bar{n}\) is the extension degree of the ring \(\mathcal {R}\) (\(\bar{n}=512\) for the most common parameters). However, the recovery technique is mathematically quite involved and computationally costly (it is based on the Howgrave-Graham–Szydlo solution to cyclotomic norm equations [25], and takes over a month of CPU time for typical parameters). More importantly, it has the major drawback of relying on the ability to factor this norm and thus only applying to “weak” signing keys satisfying a certain semismoothness condition (around \(7\%\) of BLISS secret keys).
It is natural to think that the scalar product leakage, which is linear rather than quadratic in the secret key, is a more attractive target to attack. And indeed, Espitau et al. point out that in a simplified version of BLISS where \(\mathbf { z}_2\) is returned in full as part of signatures, it is very easy to recover the secret key from about \(2\bar{n}\) side-channel traces using elementary linear algebra. However, in the actual BLISS scheme, the element \(\mathbf { z}_2\) is returned in a compressed form \(\mathbf { z}_2^\dag \), so that the linear system arising from scalar product leakage is noisy. Solving this linear system amounts to solving a problem analogous to LWE [43] in dimension about \(2\bar{n}\), which leads Espitau et al. to conclude that this approach is unlikely to be helpful. In doing so, however, they overlook a crucial difference between standard LWE and the problem that actually arises in this way, namely the lack of modular reduction.
Sampling algorithms for the distributions \(\mathcal {B}_{\exp (-x/2\sigma ^2)}\) and \(\mathcal {B}_{1/\cosh (x/\sigma ^2)}\). The values \(c_i = \exp (-2^i/f)\), with \(f = 2\sigma ^2\), are precomputed, and the \(x_i\)’s are the bits in the binary expansion of \(x = \sum _{i=0}^{\ell -1}2^i x_i\). BLISS uses \(x = K - \Vert \mathbf {s}\mathbf {c}\Vert ^2\) for the input to the exponential sampler, and \(x = 2 \langle \mathbf {z},\mathbf {s}\mathbf {c}\rangle \) for the input to the \(\cosh \) sampler.
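To see why these samplers leak, consider a direct Python transcription of the exponential sampler (a sketch of the algorithm of [17] as described above, assuming the precomputed table stores \(c_i = \exp (-2^i/f)\)): the Bernoulli trial is executed exactly for the non-zero bits of x, so a power or EM trace reveals x bit by bit.

```python
import math
import random

def sample_bern_exp(x, f, ell):
    """Return 1 with probability exp(-x/f) for 0 <= x < 2^ell.
    Work is performed only for the set bits of x: this bit-dependent
    control flow is the side-channel leakage exploited in [19]."""
    c = [math.exp(-2**i / f) for i in range(ell)]  # normally precomputed
    for i in range(ell):
        if (x >> i) & 1:
            if random.random() >= c[i]:  # Bernoulli trial with bias c[i]
                return 0
    return 1
```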
5.2 Description of the Attack
As we have mentioned already, recovering the secret \(\mathbf { s}\in \mathbb {Z}^{2\bar{n}} = \mathbb {Z}^n\) from the linear leakage \(\langle \mathbf { z}, \mathbf { s}\mathbf { c}\rangle \) essentially amounts to an instance of the ILWE problem. We now describe more precisely in what sense. To do so, we need to write this inner product in terms of the known ring elements \((\mathbf { c},\mathbf {z}_1,\mathbf {z}_2^\dag )\) that appear in the signature on the one hand, and unknown elements on the other hand. This can be done as follows:

$$ \langle \mathbf { z},\mathbf { s}\mathbf { c}\rangle = \langle \mathbf { z}_1,\mathbf { s}_1\mathbf { c}\rangle + \langle 2^d\mathbf { z}_2^\dag ,\mathbf { s}_2\mathbf { c}\rangle + \langle \mathbf { z}_2 - 2^d\mathbf { z}_2^\dag ,\mathbf { s}_2\mathbf { c}\rangle = \langle \mathbf { a},\mathbf { s}\rangle + e, $$
where we let:

$$ \mathbf { a} = \big (\mathbf { z}_1\mathbf { c}^*,\ 2^d\mathbf { z}_2^\dag \mathbf { c}^*\big )\in \mathbb {Z}^{2\bar{n}}, \qquad \mathbf { s} = (\mathbf { s}_1,\mathbf { s}_2), \qquad e = \big \langle \mathbf { z}_2-2^d\mathbf { z}_2^\dag ,\ \mathbf { s}_2\mathbf { c}\big \rangle . $$
The vector \(\mathbf { a}\) can be computed from the signature, and is therefore known to the side-channel attacker, whereas e is some unknown value. In these expressions, \(\mathbf { c}^*\) is the conjugate of \(\mathbf { c}\) with respect to the inner product (i.e. the matrix of multiplication by \(\mathbf { c}\) in the polynomial basis of \(\mathbb {Z}[x]/(x^{\bar{n}}+1)\) is the transpose of that of \(\mathbf { c}\)).
Now the rejection sampling ensures that the coefficients of \(\mathbf { z}_1\) are independent and distributed according to a discrete Gaussian D of standard deviation \(\sigma \). On the other hand, \(\mathbf { c}\) is a random vector with coefficients in \(\{0,1\}\) and exactly \(\kappa \) non-zero coefficients; thus, \(\mathbf { c}^*\) has a similar shape, possibly up to the sign of coefficients. It follows that the coefficients of \(\mathbf { z}_1\mathbf {c}^*\) are all linear combinations with \(\pm 1\) coefficients of exactly \(\kappa \) independent samples from D, and the signs clearly do not affect the resulting distribution.
Therefore, if we denote by \(\chi _a\) the distribution \(D^{*\kappa }\) obtained by summing \(\kappa \) independent samples from D, the coefficients of \(\mathbf { z}_1\mathbf {c}^*\) follow \(\chi _a\). It is not exactly correct that \(\mathbf { z}_1\mathbf {c}^*\) as a whole follows \(\chi _a^{\bar{n}}\) (as its coefficients are not rigorously independent), but we will heuristically ignore that subtlety and pretend it does. Note that \(\chi _a\) is a distribution of variance:

$$ \sigma _a^2 = \kappa \sigma ^2. $$
We have not precisely described how the BLISS signature compression works, but roughly speaking, \(\mathbf { z}_2^\dag \) is essentially obtained by keeping the \((\log q - d)\) most significant bits of \(\mathbf { z}_2\), and therefore the distribution of \(2^d \mathbf { z}_2^\dag \) is close to that of \(\mathbf { z}_2\). The distributions cannot coincide exactly, since all the coefficients of \(2^d \mathbf { z}_2^\dag \) are multiples of \(2^d\) while this normally does not happen for \(\mathbf { z}_2\), but the difference will not matter much for our purposes, and we will therefore heuristically assume that the entire vector \(\mathbf { a}\) is distributed as \(\chi _a^n\).
We now turn our attention to the noise value e, which we write as \(\langle \mathbf { w},\mathbf { u}\rangle \) with \(\mathbf { w} = \mathbf { z}_2-2^d\mathbf { z}_2^\dag \) and \(\mathbf { u} = \mathbf s_2\mathbf { c}\). Now, \(\mathbf { w}\) is obtained as the difference between \(\mathbf { z}_2\) and \(2^d\mathbf { z}_2^\dag \), where again the latter is roughly speaking obtained by zeroing out the d least significant bits of \(\mathbf { z}_2\) in a centered way. We can therefore heuristically expect that the coefficients of \(\mathbf { w}\) are distributed uniformly in \([-2^{d-1}, 2^{d-1}]\cap \mathbb {Z}\), i.e. \(\mathbf { w} \sim \mathscr {U}_{\alpha }^n\) with \(\alpha =2^{d-1}\). In particular, these coefficients have variance \(\alpha (\alpha +1)/3 \approx 2^{2d}/12\).
As for \(\mathbf { u}\), its coefficients are obtained as signed sums of \(\kappa \) coefficients of \(\mathbf { s}_2\) (the signs do not matter, since the distribution of those coefficients is symmetric). Now \(\mathbf { s}_2\) itself (ignoring the constant coefficient, which is shifted by 1) is a random vector with \(\delta _1\bar{n}\) coefficients equal to \(\pm 2\), \(\delta _2\bar{n}\) coefficients equal to \(\pm 4\), and all its other coefficients equal to zero. This is a somewhat complicated distribution to describe, but we do not make a large approximation by pretending that all the coefficients are sampled independently from the set \(\{-4,-2,0,2,4\}\) with probabilities \(\delta _2/2,\delta _1/2,(1-\delta _1-\delta _2),\delta _1/2\) and \(\delta _2/2\) respectively. Under that approximation, the coefficients of \(\mathbf { u}\) have variance \(\kappa \cdot (4\delta _1 + 16\delta _2)\).
Write \(\mathbf { u} = (u_1,\dots ,u_{\bar{n}})\) and \(\mathbf { w} = (w_1,\dots ,w_{\bar{n}})\), so that \(e = \sum _{i=1}^{\bar{n}} w_i u_i\). Under the heuristic approximations above, since \(\mathbf { w}\) and \(\mathbf { u}\) are independent and their coefficients have mean zero, the error e follows a certain bounded distribution \(\chi _e\) of variance \(\sigma _e^2\) given by:
\[ \sigma _e^2 = \sum _{i=1}^{\bar{n}} \mathbf {Var}(w_i)\,\mathbf {Var}(u_i) = \bar{n}\cdot \frac{\alpha (\alpha +1)}{3}\cdot \kappa \,(4\delta _1 + 16\delta _2). \]
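A quick numerical check of this estimate, under the same heuristic sampling assumptions (uniform \(\mathbf {w}\), coefficients of \(\mathbf {u}\) as sums of \(\kappa \) independent draws from the approximate distribution of the coefficients of \(\mathbf {s}_2\)), might look as follows:

```python
import numpy as np

rng = np.random.default_rng(3)
nbar, d, kappa = 512, 10, 23
delta1, delta2 = 0.3, 0.0  # BLISS-I-like densities
alpha = 1 << (d - 1)
vals = np.array([-4, -2, 0, 2, 4])
probs = [delta2 / 2, delta1 / 2, 1 - delta1 - delta2, delta1 / 2, delta2 / 2]
trials = 5_000
w = rng.integers(-alpha, alpha + 1, (trials, nbar))
u = np.zeros((trials, nbar), dtype=np.int64)
for _ in range(kappa):  # each coefficient of u: sum of kappa draws
    u += rng.choice(vals, (trials, nbar), p=probs)
e = (w * u).sum(axis=1)
est = nbar * (alpha * (alpha + 1) / 3) * kappa * (4 * delta1 + 16 * delta2)
print(e.var(), est)  # empirical vs. heuristic sigma_e^2
```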
With these various approximations, recovering \(\mathbf { s}\) from the leakage becomes exactly an instance of the ILWE problem with distributions \(\chi _a\) and \(\chi _e\), where each side-channel trace provides one sample. It should therefore be feasible to recover the full secret key by least-squares regression using \(m = O\big ((\sigma _e/\sigma _a)^2\log n\big )\) traces.
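As an illustration (not the attack code itself), here is a minimal least-squares recovery sketch on a synthetic ILWE instance, with rounded continuous Gaussians standing in for the discrete Gaussian distributions and a deliberately generous sample count:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma_a, sigma_e = 64, 100.0, 2000.0
# Conservative sample count m = C' * (sigma_e/sigma_a)^2 * log n,
# with C' = 32 as in the Theorem 4.5-style bound discussed in Sect. 6.1.
m = int(32 * (sigma_e / sigma_a) ** 2 * np.log(n))
s = rng.integers(-2, 3, n)                   # small synthetic secret
A = np.rint(rng.normal(0, sigma_a, (m, n)))  # rounded-Gaussian stand-in
b = A @ s + np.rint(rng.normal(0, sigma_e, m))
s_hat = np.rint(np.linalg.lstsq(A, b, rcond=None)[0]).astype(np.int64)
print(np.array_equal(s_hat, s))              # True with high probability
```

Rounding the real-valued least-squares solution to the nearest integer vector succeeds once the per-coordinate estimation error, of order \(\sigma _e/(\sigma _a\sqrt{m})\), drops well below 1/2.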
5.3 Experimental Distributions
The description of the previous section made a number of heuristic approximations which we know cannot be precisely satisfied in practice. In order to validate those approximations nonetheless, we have carried out numerical simulations comparing in particular our estimates for the standard deviations \(\sigma _a\) and \(\sigma _e\) of the distributions of \(\mathbf { a}\) and e with the standard deviations obtained from the actual rejection sampling leakage in BLISS.
These simulations were carried out in Python using the numpy package. We used 10,000 ILWE samples arising from side-channel leakage for each BLISS parameter set. Results are collected in Table 2; experimental values for \(\sigma _a\) are provided separately for the two halves \((\mathbf { a}_1,\mathbf { a}_2)\) of the vector \(\mathbf { a}\), which, as we have seen, are computed differently. As we can see, the experimental values match the heuristic estimates quite closely overall.
6 Numerical Simulations
In this section, we present simulation results for recovering ILWE secrets using linear regression, first for plain ILWE instances, and then for ILWE instances arising from BLISS side-channel leakage, as described in Sect. 5.2, leading to BLISS secret key recovery. These results are based on simulated leakage data rather than actual side-channel traces. However, we note that the leakage scenario for BLISS is essentially identical to the one described in [19] (namely, an SPA/SEMA setting where each trace reveals the exact value of a certain function of the secret key, in our case the linear function given by the inner product), and was therefore experimentally validated in that paper.
6.1 Plain ILWE
Recall that the ILWE problem is parametrized by \(n,m \in \mathbb {Z}\) and probability distributions \(\chi _a\) and \(\chi _e\). Samples are computed as \(\mathbf {b} = A\mathbf {s} + \mathbf {e}\), where \(\mathbf {s} \in \mathbb {Z}^n\), \(\mathbf {b} \in \mathbb {Z}^m\), \(A \in \mathbb {Z}^{m \times n}\) with entries drawn from \(\chi _a\), and \(\mathbf {e} \in \mathbb {Z}^m\) with entries drawn from \(\chi _e\). Choosing \(\chi _a\) and \(\chi _e\) as discrete Gaussian distributions with standard deviations \(\sigma _a\) and \(\sigma _e\) respectively, we investigated the number of samples m required to recover ILWE secret vectors \(\mathbf {s} \in \mathbb {Z}^n\) for various concrete values of \(n, \sigma _a\), and \(\sigma _e\). We sampled sparse secret vectors \(\mathbf {s}\) uniformly at random from the set of vectors with a prescribed number of entries set to \(\pm 1\), a prescribed number of entries set to \(\pm 2\), and the rest zero, mirroring the shape of BLISS secret keys.
We present two types of experimental results for plain ILWE. In our first experiment, we began by estimating the number of samples m required to recover the secret perfectly with good probability, for different values of \(n, \sigma _a\), and \(\sigma _e\). Then, fixing m, we measured the probability of recovering \(\mathbf {s}\) over the random choices of \(\mathbf {s}\), A and e. Our results are displayed in Table 3.
In our second experiment, we investigated the distribution of the minimum value of m required to recover the secret perfectly, over the random choices of \(\mathbf {s}\), A, and \(\mathbf {e}\), when the linear regression method was run to completion. In other words, for fixed \(n, \sigma _a\), and \(\sigma _e\), we generated more and more samples until the secret could be perfectly recovered. Our results for \(\sigma _e = 2000\) are plotted in Fig. 3; additional results and notes may be found in the full version of this paper [10]. Each figure plots the dimension n against the mean number of samples m required to recover the secret, for \(\sigma _a = 100\), 200, and 500, where ‘mean’ refers to the interquartile mean. The error bars show the upper and lower quartiles for the number of samples required.
The results of our second experiment are consistent with the theoretical results given in Sect. 4.1. According to (4.6), we require
\[ m \ge C'\cdot \Big (\frac{\sigma _e}{\sigma _a}\Big )^2 \log n \]
samples in order to recover the secret correctly. The dimension n on the horizontal axis of each graph is plotted on a logarithmic scale; theory therefore predicts that we should observe a straight line, which the graphs confirm.
The gradient of each graph corresponds to the constant \(C'\) giving the number of samples required for secret recovery in practice. Note that in this case, where \(\chi _a\) and \(\chi _e\) follow discrete Gaussian distributions, Theorem 4.5 gives \(C' = 32\) for a small failure probability of \(\frac{1}{2n}\). However, in this experiment, we are likely to succeed much sooner, with a smaller number of samples: in any particular trial, we are likely to recover the secret as soon as m is large enough that the success probability reaches one half. This explains why the gradient is much lower than the value given by Theorem 4.5. Computing the gradients of the lines of best fit and dividing by \((\sigma _e / \sigma _a)^2\) gives an estimate of the observed value of the constant \(C'\). See the full version of this paper [10] for details.
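For concreteness, the slope-to-constant conversion amounts to a simple log-linear fit; below is a minimal numpy sketch, with hypothetical placeholder measurements standing in for the interquartile means read off the graphs:

```python
import numpy as np

# Hypothetical measurements: dimension n vs. mean samples m to recovery.
ns = np.array([128, 256, 512, 1024])
ms = np.array([9800, 11300, 12900, 14400])  # placeholder values only
sigma_a, sigma_e = 100.0, 2000.0
# Fit m = slope * log(n) + intercept; the slope is C' * (sigma_e/sigma_a)^2.
slope, intercept = np.polyfit(np.log(ns), ms, 1)
C_prime = slope / (sigma_e / sigma_a) ** 2  # observed constant C'
print(C_prime)
```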
6.2 BLISS Side-Channel Attack
Having obtained an instance of the ILWE problem from BLISS side-channel leakage as described in Sect. 5.2, we used linear regression to recover BLISS secret keys. We performed several trials. For each trial, we generated ILWE samples using side-channel leakage until we could recover the secret key. For BLISS–0, we simply used regression to recover the entire secret key. For BLISS–I and BLISS–II, we usually ran into memory issues before being able to successfully recover the entire secret key. However, we noticed that in practice, we could recover the first half of the secret key correctly using far fewer samples. Since the two halves of the secret key are related by the public key, this is sufficient to compute the entire secret key. Therefore, for BLISS–I and BLISS–II, we stopped generating samples as soon as the least-squares estimator correctly recovered the first half of the secret.
For these two scenarios, we obtain the results displayed in Table 4, which gives the range, quartiles, and interquartile mean of the number of samples required. Typical timings for the side-channel attacks, using SageMath on a laptop with a 2.60 GHz processor, are displayed in Table 5. Timings are on the order of seconds to minutes; by comparison, some of the attacks from [19] may take hours, or even days, of CPU time.
References
Aggarwal, D., Joux, A., Prakash, A., Santha, M.: A new public-key cryptosystem via Mersenne numbers. Cryptology ePrint Archive, Report 2017/481 (2017). http://eprint.iacr.org/2017/481
Albrecht, M.R.: On dual lattice attacks against small-secret LWE and parameter choices in HElib and SEAL. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017, part II. LNCS, vol. 10211, pp. 103–129. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56614-6_4
Albrecht, M.R., Faugère, J.-C., Fitzpatrick, R., Perret, L.: Lazy modulus switching for the BKW algorithm on LWE. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 429–445. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54631-0_25
Alwen, J., Krenn, S., Pietrzak, K., Wichs, D.: Learning with rounding, revisited. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, part I. LNCS, vol. 8042, pp. 57–74. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_4
Applebaum, B., Cash, D., Peikert, C., Sahai, A.: Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 595–618. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_35
Arora, S., Ge, R.: New algorithms for learning in presence of errors. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011, part I. LNCS, vol. 6755, pp. 403–415. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22006-7_34
Bai, S., Galbraith, S.D.: Lattice decoding attacks on binary LWE. In: Susilo, W., Mu, Y. (eds.) ACISP 2014. LNCS, vol. 8544, pp. 322–337. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08344-5_21
Banerjee, A., Peikert, C., Rosen, A.: Pseudorandom functions and lattices. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 719–737. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_42
Bogdanov, A., Guo, S., Masny, D., Richelson, S., Rosen, A.: On the hardness of learning with rounding over small modulus. In: Kushilevitz, E., Malkin, T. (eds.) TCC 2016, part I. LNCS, vol. 9562, pp. 209–224. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49096-9_9
Bootle, J., Delaplace, C., Espitau, T., Fouque, P.A., Tibouchi, M.: LWE without modular reduction and improved side-channel attacks against BLISS. Cryptology ePrint Archive, Report 2018/822 (2018). http://eprint.iacr.org/2018/822. Full version of this paper
Bootle, J., Tibouchi, M., Xagawa, K.: Cryptanalysis of compact-LWE. In: Smart, N.P. (ed.) CT-RSA 2018. LNCS, vol. 10808, pp. 80–97. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76953-0_5
Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehlé, D.: Classical hardness of learning with errors. In: Boneh, D., Roughgarden, T., Feigenbaum, J. (eds.) 45th ACM STOC, pp. 575–584. ACM Press, June 2013
Groot Bruinderink, L., Hülsing, A., Lange, T., Yarom, Y.: Flush, gauss, and reload – a cache attack on the BLISS lattice-based signature scheme. In: Gierlichs, B., Poschmann, A.Y. (eds.) CHES 2016. LNCS, vol. 9813, pp. 323–345. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53140-2_16
Candès, E., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35(6), 2313–2351 (2007)
Dodis, Y., Goldwasser, S., Tauman Kalai, Y., Peikert, C., Vaikuntanathan, V.: Public-key encryption schemes with auxiliary inputs. In: Micciancio, D. (ed.) TCC 2010. LNCS, vol. 5978, pp. 361–381. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11799-2_22
Döttling, N., Müller-Quade, J.: Lossy codes and a new variant of the learning-with-errors problem. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 18–34. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_2
Ducas, L., Durmus, A., Lepoint, T., Lyubashevsky, V.: Lattice signatures and bimodal Gaussians. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, part I. LNCS, vol. 8042, pp. 40–56. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_3
Ducas, L., Lepoint, T.: BLISS: Bimodal Lattice Signature Schemes, June 2013. http://bliss.di.ens.fr/bliss-06-13-2013.zip. (proof-of-concept implementation)
Espitau, T., Fouque, P.A., Gérard, B., Tibouchi, M.: Side-channel attacks on BLISS lattice-based signatures: exploiting branch tracing against strongSwan and electromagnetic emanations in microcontrollers. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) ACM CCS 2017, pp. 1857–1874. ACM Press, October/November 2017
Galbraith, S.D.: Space-efficient variants of cryptosystems based on learning with errors. On-Line (2012). https://www.math.auckland.ac.nz/~sgal018/compact-LWE.pdf
Goldwasser, S., Kalai, Y.T., Peikert, C., Vaikuntanathan, V.: Robustness of the learning with errors assumption. In: Yao, A.C.C. (ed.) ICS 2010, pp. 230–240. Tsinghua University Press, January 2010
Hamburg, M.: Post-Quantum Cryptography Proposal: ThreeBears (2017). https://csrc.nist.gov/projects/post-quantum-cryptography/round-1-submissions
Herold, G., May, A.: LP solutions of vectorial integer subset sums – cryptanalysis of Galbraith’s binary matrix LWE. In: Fehr, S. (ed.) PKC 2017, part I. LNCS, vol. 10174, pp. 3–15. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54365-8_1
Hoffstein, J., Pipher, J., Whyte, W., Zhang, Z.: A signature scheme from learning with truncation. Cryptology ePrint Archive, Report 2017/995 (2017). http://eprint.iacr.org/2017/995
Howgrave-Graham, N., Szydlo, M.: A method to solve cyclotomic norm equations \(f * \bar{f}\). In: Buell, D. (ed.) ANTS 2004. LNCS, vol. 3076, pp. 272–279. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24847-7_20
Hsu, D., Kakade, S., Zhang, T.: Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electron. Commun. Probab. 17(14), 1–13 (2012)
Kahane, J.P.: Propriétés locales des fonctions à séries de Fourier aléatoires. Stud. Math. 19, 1–25 (1960)
Kocher, P.C., Jaffe, J., Jun, B., Rohatgi, P.: Introduction to differential power analysis. J. Cryptogr. Eng. 1(1), 5–27 (2011)
Langlois, A., Stehlé, D.: Worst-case to average-case reductions for module lattices. Des. Codes Cryptogr. 75(3), 565–599 (2015)
Li, H., Liu, R., Pan, Y., Xie, T.: Cryptanalysis of Compact-LWE submitted to NIST PQC project. Cryptology ePrint Archive, Report 2018/020 (2018). https://eprint.iacr.org/2018/020
Ling, S., Phan, D.H., Stehlé, D., Steinfeld, R.: Hardness of k-LWE and applications in traitor tracing. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014, part I. LNCS, vol. 8616, pp. 315–334. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44371-2_18
Litvak, A., Pajor, A., Rudelson, M., Tomczak-Jaegermann, N.: Smallest singular value of random matrices and geometry of random polytopes. Adv. Math. 195(2), 491–523 (2005)
Liu, D.: Compact-LWE for lightweight public key encryption and leveled IoT authentication. In: Pieprzyk, J., Suriadi, S. (eds.) ACISP 2017, part I. LNCS, vol. 10342. Springer, Heidelberg (2017)
Liu, D., Li, N., Kim, J., Nepal, S.: Compact-LWE (2017). https://csrc.nist.gov/projects/post-quantum-cryptography/round-1-submissions
Lyubashevsky, V.: Fiat-Shamir with aborts: applications to lattice and factoring-based signatures. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 598–616. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10366-7_35
Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 1–23. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_1
Lyubashevsky, V., Peikert, C., Regev, O.: A toolkit for ring-LWE cryptography. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 35–54. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_3
Micciancio, D., Peikert, C.: Hardness of SIS and LWE with small parameters. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, part I. LNCS, vol. 8042, pp. 21–39. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_2
Paouris, G., Valettas, P., Zinn, J.: Random version of Dvoretzky’s theorem in \(\ell _p^n\). Stoch. Process. Their Appl. 127(10), 3187–3227 (2017)
Pessl, P., Bruinderink, L.G., Yarom, Y.: To BLISS-B or not to be: attacking strongSwan’s implementation of post-quantum signatures. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) ACM CCS 2017, pp. 1843–1855. ACM Press, October/November 2017
Pöppelmann, T., Oder, T., Güneysu, T.: High-performance ideal lattice-based cryptography on 8-bit ATxmega microcontrollers. In: Lauter, K.E., Rodríguez-Henríquez, F. (eds.) LATINCRYPT 2015. LNCS, vol. 9230, pp. 346–365. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-319-22174-8_19
van Handel, R.: Probability in high dimension. Technical report, Princeton University (2014)
Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: Gabow, H.N., Fagin, R. (eds.) 37th ACM STOC, pp. 84–93. ACM Press, May 2005
Stadje, W.: An inequality for \(\ell _p\)-norms with respect to the multivariate normal distribution. J. Math. Anal. Appl. 102(1), 149–155 (1984)
Steffen, A., et al.: strongSwan: the Open Source IPsec-Based VPN Solution (version 5.5.2), March 2017. https://www.strongswan.org/
Stehlé, D., Steinfeld, R., Tanaka, K., Xagawa, K.: Efficient public key encryption based on ideal lattices. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 617–635. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10366-7_36
Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)
Xiao, D., Yu, Y.: Cryptanalysis of Compact-LWE and related lightweight public key encryption. Secur. Commun. Netw. 2018 (2018)
Acknowledgments
This work has been supported in part by the European Union’s H2020 Programme under grant agreement number ERC-669891.