
1 Introduction

The Learning With Errors (LWE) problem [40] has become central to the security of several cryptosystems. Most notably, Kyber (a public-key encryption scheme) and Dilithium (a signature scheme) have been selected by NIST for Post-Quantum Cryptography (PQC) standardization and rely on algebraic versions of LWE for their security proofs. Other advanced cryptographic primitives such as FHE can be built from LWE [15]. This makes LWE security estimates critical for the future of PQC. The search \({{\,\textrm{LWE}\,}}\) problem asks to recover the secret \(\textbf{s}\) given \((\textbf{A},\textbf{b})\) where \(\textbf{b}=\textbf{A}\textbf{s}+\textbf{e}\), \(\textbf{A}\) is a matrix chosen uniformly at random and \(\textbf{e}\) has small entries (more details in Sect. 2.1).

There are two main approaches to attacking the LWE problem: the so-called primal and dual attacks. In this paper, we focus exclusively on dual attacks, which have recently attracted interest due to significant claimed improvements in their complexity. Both primal and dual attacks rely on the BKZ lattice reduction algorithm [43] to obtain short vectors in lattices. The fundamental idea of dual attacks is to use short vectors in the dual of the lattice to detect whether points are close to the lattice or not, an idea that can be traced back to [5]. This allows us to solve the distinguishing \({{\,\textrm{LWE}\,}}\) problem, where one is asked to detect whether a sample comes from an LWE distribution or a uniform distribution [35]. In conjunction with a guessing step, this allows one to recover part of the secret by trying several values until we get a point close to the lattice. By repeating this operation several times, we can solve the search \({{\,\textrm{LWE}\,}}\) problem.

Originally, the main limiting factor (on the complexity) of dual attacks was the need to compute one short vector (a very expensive operation) for every few LWE samples (more details in Sect. 3) and to compute a score for each secret guess. Since then, a series of improvements have found their way into these attacks. First, a series of works on lattice sieving have shown [13, 36, 38] that those algorithms produce not only one but in fact exponentially many short vectors “for free”. [11] suggested that this idea could be used in dual attacks but it appears that [23] was the first paper to try to analyze it. Independently, [7] used a “re-randomization” technique to produce many short vectors from a single BKZ reduced basis. All those techniques claim to reduce the complexity of attacks although the correctness relies on an unproven assumption about the quality of those many short vectors. Then [25] noted that instead of computing the score for each secret guess separately, all the scores can be computed at once using a discrete Fourier transform (DFT), essentially reducing the cost to that of a single guess. Following this work, a technical report by the MATZOV group [32] has claimed further improvements by the use of a “modulus switching” technique that significantly reduces the size of the DFT. Two recent works have modified this attack to include a quantum [10] and a lattice-coding [16] speed-up.

One issue with the papers above is that the number of statistical assumptions that are necessary to justify the correctness of the algorithms has grown significantly, notably in [32]. While certain assumptions could probably be justified (almost) formally, others are subject to more controversy [20]. In particular, the most controversial aspect of [25, 32] is that the attack only uses a few LWE samples and that all the (exponentially many) short vectors are derived from those samples, which are therefore not statistically independent. When using a small number of LWE samples, the problem becomes very close to the Bounded Distance Decoding problem, which has been extensively studied. The status of [7] is unclear: it computes exponentially many short vectors from exponentially many samples, but the ratio of the number of short vectors to the number of samples is also exponential, so the issue of statistical independence remains, although it does not seem as problematic. This makes it unclear whether an argument like that of [20] applies to such a case.

The purpose of this paper is to encourage a more rigorous analysis of dual attacks on LWE to better understand under what set of parameters they provably work. We note in that regard that a recently accepted paper at TCC 2023 [33] has focused on similar problems in statistical decoding/“dual attacks” in coding theory. The authors claim in the conclusion that at least part of their results apply to lattice dual attacks. We believe that it would indeed be interesting to see what this approach yields for lattices; however, we point out that the notion of dual attack that the authors have in mind looks quite different from the one in this paper. In short, and in our notation, the “dual attack” of [33] would be akin to splitting \(\textbf{A}\) horizontally instead of vertically. This splitting would no longer correspond to a decomposition of \(L_q(\textbf{A})\) as \(L_q(\textbf{A}_\textrm{guess})+L_q(\textbf{A}_\textrm{dual})\) and therefore looks incompatible with existing works on dual attacks on LWE. Furthermore, our understanding of [33] is that generating parity check vectors \(\textbf{h}\) corresponds to generating many short dual vectors in \(L_q^\perp (\textbf{A})\), independently of the splitting of \(\textbf{A}\). This is completely at odds with lattice dual attacks where we split \(\textbf{A}\) to generate dual vectors in \(L_q^\perp (\textbf{A}_\textrm{dual})\), which is much cheaper. Overall it looks like [33] might be a completely different kind of dual attack. See [39, Appendix A] for more details.

1.1 Contributions

The main contribution of this paper is to provide a completely formal, non-asymptotic analysis of a simplified dual attack. To simplify the presentation, we do not include elements such as the guessing complexity and modulus switching, so as to focus on the most controversial element, namely the fact that the attack only uses m LWE samples (with m not much bigger than the dimension n of the samples) and that all the short vectors are derived from those m samples.

Our approach completely departs from the existing statistics-based attacks and is instead rooted in geometry. This allows us to obtain a relatively short proof and leverage existing results on the geometry of lattices.

One of the most important technical contributions of this paper is to make completely clear (Theorem 5) under what choice of parameters the attack works, without any statistical assumption. As far as we are aware, no other dual attack has been formally analyzed in this way. We believe that this is important since virtually all algorithms in the literature rely on statistical assumptions that clearly cannot hold for all parameter regimes but without a proper analysis, it is impossible to tell when and why they hold.

We also provide some new results on random q-ary lattices in a similar spirit to that of Siegel, Rogers and Macbeath [31, 41, 44]. This allows us to obtain some sharper bounds on \(\lambda _1\) for random q-ary lattices and show that the Gaussian Heuristic is quite tight for such lattices. This heuristic is usually considered valid for “random” lattices and has been extensively tested. To the best of our knowledge, the only formal analysis of \(\lambda _1\) for random q-ary lattices is in [47, Lemma 7.9.2], which only analyzes the expected value and therefore provides a much weaker bound on \(\lambda _1\). We refer to Sect. 2.3 for more details.

Finally, we give a quantum version of our algorithm to speed up the computation. The algorithm is inspired by [10] and reuses some technical lemmas to speed up the computation of sums of cosines that appear in the algorithm. Similarly to our classical algorithm, we prove that our quantum algorithm is correct without relying on any heuristics.

1.2 Comparison with [20]’s Contradictory Regime

A recent paper [20] has claimed that virtually all recent dual attacks rely on an incorrect statistical assumption and that they are, therefore, probably incorrect. The authors do so by formalizing what they claim to be the key statistical assumption of those papers, and showing that, for the parameter regime of the attacks, it falls into what they call the “contradictory regime”, a regime where this assumption can be proven not to hold.

As a byproduct of our analysis, we are able to compare the regime in which our analysis works with the contradictory regime of [20]. Interestingly, the two are essentially complementary with a small gap in between. This suggests that our analysis and that of [20] are quite tight and provide an almost complete characterization of when dual attacks work in our simplified setting. However, we nuance this conclusion by noting that the statistical model used in [20] to argue about the contradiction does not seem to match what happens in our algorithm. We refer to Sect. 6 for more details.

1.3 Organisation of the Paper

In Sect. 2, we introduce the various technical elements that are necessary to analyse the dual attack. In Sect. 3, we first present a basic dual attack whose purpose is to introduce the reader to the ideas of dual attacks without overwhelming them with technical details. This dual attack is very naive and computes one short vector per tuple of m LWE samples, in the spirit of [5]. We emphasize that this attack and Theorem 4 are not new but that our analysis is significantly simpler than in previous papers. In Sect. 4, we introduce our simplified dual attack in the spirit of [32] and formally analyse its correctness without assumption. We provide some rough estimates on the complexity of our attack on Kyber using a Markov chain Monte Carlo discrete Gaussian sampler. In Sect. 5, we give a quantum version of the algorithm from Sect. 4 and prove its correctness. In Sect. 6, we compare our regime with that of [20]. Finally, in Sect. 7, we describe what we believe is the main obstacle to developing a formal analysis of the full algorithm in [32].

2 Preliminaries

We denote vectors and matrices in bold case. We denote by \(\textbf{x}^T\) the transpose of the (column) vector \(\textbf{x}\), which is therefore a row vector. We denote by \(\textbf{I}_n\) the identity matrix of size \(n\times n\). For any vector \(\textbf{x}\in \mathbb {R}^n\), we denote by \(\left\| \textbf{x}\right\| \) its Euclidean norm. We denote by \(\left\langle \textbf{x},\textbf{y}\right\rangle \) the scalar product between two vectors \(\textbf{x}\) and \(\textbf{y}\). For any function \(f:\mathbb {R}^n\rightarrow \mathbb {C}\), we denote by \(\widehat{f}\) its Fourier transform over \(\mathbb {R}^n\) defined by \( \widehat{f}(\textbf{x})=\int _{\mathbb {R}^n}f(\textbf{y})e^{-2i\pi \left\langle \textbf{x},\textbf{y}\right\rangle }\,\textrm{d}\textbf{y}. \) For any \(n\in \mathbb {N}\) and \(R>0\), we denote by \(B_n(R)\) (resp. \(\overline{B}_n(R)\)) the open (resp. closed) ball of radius R in \(\mathbb {R}^n\). We also let \(B^\mathbb {Z}_n(R)=B_n(R)\cap \mathbb {Z}^n\) be the set of integer points in this ball, and similarly for \(\overline{B}^\mathbb {Z}_n(R)\). For any two distributions P and Q, we denote by \({{\,\mathrm{\textrm{d}_{\textrm{TV}}}\,}}(P,Q)\) the statistical distance (or total variation distance) between P and Q. For any finite set X, we denote by \({{\,\mathrm{\mathcal {U}}\,}}(X)\) the uniform distribution over X.

2.1 LWE

Let \(n,m,q\in \mathbb {N}\) and let \(\chi _e\) be a distribution over \(\mathbb {Z}_q\), which we call the noise distribution. For every vector \(\textbf{s}\in \mathbb {Z}_q^n\), we denote by \({{\,\textrm{LWE}\,}}(m,\textbf{s},\chi _e)\) the probability distribution on \(\mathbb {Z}_q^{m\times n}\times \mathbb {Z}_q^m\) obtained by sampling a matrix \(\textbf{A} \in \mathbb {Z}_q^{m\times n}\) uniformly at random, sampling a vector \(\textbf{e} \in \mathbb {Z}_q^m\) according to \(\chi _e^m\), and outputting \((\textbf{A},\textbf{b})\) where \(\textbf{b} {:}{=}\textbf{A}\textbf{s}+\textbf{e}\). This is the “matrix form” for the LWE distribution where each pair \((\textbf{A},\textbf{b})\) encodes m LWE samples \(\textbf{b}_i=\left\langle \textbf{A}_i,\textbf{s}\right\rangle +\textbf{e}_i\) in the sense of [40]. We have chosen this formalism because it is simpler for dual attacks. The value of m is typically on the order of n and depends on the cryptosystem.
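To fix ideas, here is a minimal Python sketch of this matrix form. The parameters are illustrative, and the error is drawn as a rounded continuous Gaussian, which is only a stand-in for the discrete Gaussian noise distribution \(D_{\mathbb {Z}_q,\sigma _e}\) used later in the paper.

```python
import numpy as np

n, m, q, sigma_e = 10, 20, 3329, 3.0
rng = np.random.default_rng(0)

def lwe_sample():
    """One LWE instance in matrix form: (A, b) with b = A s + e mod q.
    The secret s and error e are returned only for experiments."""
    s = rng.integers(0, q, size=n)        # toy choice of secret
    A = rng.integers(0, q, size=(m, n))   # uniform matrix A
    e = np.rint(rng.normal(0, sigma_e, size=m)).astype(int)  # stand-in for D_{Z_q,sigma_e}
    b = (A @ s + e) % q
    return A, b, s, e

A, b, s, e = lwe_sample()
assert np.array_equal(b, (A @ s + e) % q)
```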

The search LWE problem is to find \(\textbf{s}\) given oracle access to a sampler for \({{\,\textrm{LWE}\,}}(m,\textbf{s},\chi _e)\). The decision LWE problem is to decide, given oracle access to either \({{\,\textrm{LWE}\,}}(m,\textbf{s},\chi _e)\) or \({{\,\mathrm{\mathcal {U}}\,}}(\mathbb {Z}_q^{m\times n}\times \mathbb {Z}_q^m)\), which one it is. In practical scenarios, the attacker may not have access to the sampler but rather only possess a limited number of LWE samples. In this case, the search LWE problem asks, given those LWE samples, to recover \(\textbf{s}\) if possible.

The LWE secret \(\textbf{s}\) is usually generated according to a distribution \(\chi _s\) over \(\mathbb {Z}_q^n\). One can therefore, in principle, analyse the success probability of an algorithm for search/decision LWE on a distribution \({{\,\textrm{LWE}\,}}(m,\textbf{s},\chi _e)\) where \(\textbf{s}\) is drawn from \(\chi _s\). In this paper, we will not need to make any assumption on the distribution of the secret since our algorithms work for every secret.

2.2 Discrete Gaussian Distribution

Let \(n\in \mathbb {N}\) and \(s>0\). For any \(\textbf{x}\in \mathbb {R}^n\), we let \(\rho _s(\textbf{x}){:}{=}e^{-\pi \left\| \textbf{x}\right\| ^2/s^2}\). As usual, we extend \(\rho _s\) to sets by \(\rho _s(X)=\sum _{\textbf{x}\in X}\rho _s(\textbf{x})\) for any set X. For any lattice \(L\subset \mathbb {R}^n\), we denote the discrete Gaussian distribution over L by \(D_{L,s}(\textbf{x})=\frac{\rho _s(\textbf{x})}{\rho _s(L)}\) for any \(\textbf{x}\in L\). We denote \(D_{L,1}\) by \(D_L\) for simplicity.

In general, the smaller s is, the harder it is to construct a sampler for \(D_{L,s}\). The notion of smoothing parameter [34] captures the idea that sampling for a value of s above this threshold is significantly easier than sampling below, because the distribution looks more like a continuous Gaussian. There are many algorithms to sample above the smoothing parameter [14, 24, 28], including a time-space trade-off [3]. Sampling below the smoothing parameter is much more challenging and usually inefficient [4]. At the extreme, sampling for sufficiently small values of s allows one to solve the Shortest Vector Problem (SVP) [4], which is known to be NP-hard under randomized reductions [6]. The Markov chain Monte Carlo (MCMC) based algorithm of [46] works for all values of s, but its complexity significantly depends on s. We will use this algorithm in this paper.
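As a point of reference, a naive sampler for \(D_{\mathbb {Z},s}\) by rejection on a truncated support takes a few lines of Python. This is of course not the MCMC algorithm of [46], merely an illustration of the distribution itself; the truncation is justified by Gaussian tail bounds such as Lemma 2 below.

```python
import math, random

def sample_dgauss_Z(s, tailcut=12):
    """Naive rejection sampler for D_{Z,s} with rho_s(x) = exp(-pi x^2 / s^2).
    The support is truncated to [-t, t]; the discarded mass is negligible."""
    t = math.ceil(tailcut * s)
    while True:
        x = random.randint(-t, t)
        if random.random() < math.exp(-math.pi * x * x / (s * s)):
            return x

samples = [sample_dgauss_Z(3.0) for _ in range(10_000)]
print(sum(samples) / len(samples))  # ~0; the empirical variance is ~ s^2/(2*pi)
```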

Theorem 1

([46, Theorem 1, (8), (23) and (24)]). There is an algorithm that given a basis \(\textbf{B}\) of a lattice \(L\subset \mathbb {R}^n\), any \(\varepsilon >0\) and any \(s>0\), returns a sample according to some distribution \(\mathcal {D}_{L,s,\varepsilon }\) such that \({{\,\mathrm{\textrm{d}_{\textrm{TV}}}\,}}(\mathcal {D}_{L,s,\varepsilon },D_{L,s})\leqslant \varepsilon \). This algorithm runs in time \(\ln \left( \tfrac{1}{\varepsilon }\right) \cdot \tfrac{1}{\varDelta }\cdot \textsf{poly}\!\!\left( n\right) \) where \(\frac{1}{\varDelta }=\frac{\prod _{i=1}^n\rho _{s/\left\| \widetilde{\textbf{b}}_i\right\| }(\mathbb {Z})}{\rho _{s}(L)}\) and \(\widetilde{\textbf{b}}_1,\ldots ,\widetilde{\textbf{b}}_n\) are the Gram-Schmidt vectors of \(\textbf{B}\).

For any \(q\in \mathbb {N}\), we denote by \(D_{\mathbb {Z}_q^n,s}\) the modular discrete Gaussian distribution over \(\mathbb {Z}_q^n\) defined by \(D_{\mathbb {Z}_q^n,s}(\textbf{x})=\frac{\rho _s(\textbf{x}+q\mathbb {Z}^n)}{\rho _s(\mathbb {Z}^n)}\) for any \(\textbf{x}\in \mathbb {Z}_q^n\). We define the periodic Gaussian function \(f_{L,s}:\mathbb {R}^n\rightarrow \mathbb {R}\) by \(f_{L,s}(\textbf{t})=\frac{\rho _s(L+\textbf{t})}{\rho _s(L)}\). We have \(f_{L/s,1}(\textbf{t}/s)=f_{L,s}(\textbf{t})\). In the following, we denote \(f_{L,1}\) as \(f_{L}\).
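For intuition, the periodic Gaussian \(f_{L,s}\) is easy to evaluate numerically in dimension one (\(L=\mathbb {Z}\)) by truncating both sums; the sketch below (with a hypothetical truncation bound K) shows that it peaks on the lattice and decays with the distance to it, as quantified later in Lemma 7.

```python
import math

def f_Z(t, s, K=30):
    """f_{Z,s}(t) = rho_s(Z + t) / rho_s(Z), with both sums truncated to |k| <= K."""
    rho = lambda x: math.exp(-math.pi * (x / s) ** 2)
    return sum(rho(t + k) for k in range(-K, K + 1)) / \
           sum(rho(k) for k in range(-K, K + 1))

for t in [0.0, 0.1, 0.25, 0.5]:   # t is also the distance of t to the lattice Z
    print(t, f_Z(t, s=0.5))       # decreasing in the distance to the lattice
```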

Lemma 1

([17, Lemma 2.14]). For any lattice \(L\subset \mathbb {R}^n\), \(s>0\) and \(\textbf{x}\in \mathbb {R}^n\), \(f_{L,s}(\textbf{x})\geqslant \rho _s(\textbf{x})\).

Lemma 2

([12, Lemma 7], see also [45, Theorem 1.3.4]). For any lattice \(L\subset \mathbb {R}^n\), \(\textbf{x}\in \mathbb {R}^n\), \(s>0\) and \(u\geqslant 1/\sqrt{2\pi }\), \( \rho _s((L-\textbf{x})\setminus B_n(us\sqrt{n})) \leqslant \left( u\sqrt{2\pi e}e^{-\pi u^2}\right) ^{n} \rho _s(L). \)

Corollary 1

([45, Corollary 1.3.5]). For any lattice \(L\subset \mathbb {R}^n\), \(\textbf{t}\in \mathbb {R}^n\), \(s>0\) and \(r\geqslant \delta {:}{=}s\sqrt{n/2\pi }\), \( \rho _s((L-\textbf{t})\setminus B_n(r)) \leqslant \rho _s(r-\delta )\rho _s(L). \)

Lemma 3

([5, Claim 4.1]). For any lattice L and \(s>0\), we have \(\widehat{f_{L,s}}=D_{\widehat{L},1/s}\) which is a probability measure over the dual lattice \(\widehat{L}\).

2.3 Lattices

We denote by \( \widehat{L}=\!\!\left\{ \textbf{x}\in {{\,\textrm{span}\,}}(L):\forall \textbf{y}\in L,\, \left\langle \textbf{y},\textbf{x}\right\rangle \in \mathbb {Z}\right\} \) the dual of a lattice \(L\subset \mathbb {R}^n\). We denote by \(L^*=L\setminus \!\!\left\{ \textbf{0}\right\} \) the set of nonzero vectors of a lattice L. We denote by \(\lambda _1(L)\) the length of a shortest nonzero vector in L.

Let \(n\in \mathbb {N}\), \(1\leqslant k\leqslant n\) and q be a prime power. We say that a lattice L is an n-dimensional q-ary lattice if \(q\mathbb {Z}^n\subseteq L\subseteq \mathbb {Z}^n\). Given a matrix \(\textbf{A}\in \mathbb {Z}^{n\times k}\), we consider the following n-dimensional q-ary lattices:

$$\begin{aligned} L_q(\textbf{A}) &=\!\!\left\{ \textbf{x}\in \mathbb {Z}^{n}:\exists \textbf{s}\in \mathbb {Z}^{k},\,\textbf{A}\textbf{s}=\textbf{x}\bmod q\right\} ,\\ L_q^\perp (\textbf{A}) &=\!\!\left\{ \textbf{x}\in \mathbb {Z}^{n}:\textbf{A}^T\textbf{x}=\textbf{0}\bmod q\right\} . \end{aligned}$$

We refer the reader to [22, 47, Section 2.5.1] or [35] for more details on those constructions. Note that, equivalently, we can write \(L_q(\textbf{A})=\textbf{A}\mathbb {Z}_q^{k}+q\mathbb {Z}^{n}\). It is well-known that for any q-ary lattice L, there exists \(\textbf{A}\) and \(\textbf{B}\) such that \(L=L_q(\textbf{A})=L_q^\perp (\textbf{B})\), and that \(\widehat{L_q^\perp (\textbf{A})}=\frac{1}{q}L_q(\textbf{A})\). Furthermore \(\det (L_q(\textbf{A}))=q^{n-{{\,\textrm{rk}\,}}{\textbf{A}}}\geqslant q^{n-k}\) and therefore \(\det (L_q^\perp (\textbf{A}))=q^{{{\,\textrm{rk}\,}}{\textbf{A}}}\leqslant q^{k}\). Finally, since \(\mathbb {Z}_q\) is a field, a random matrix \(\textbf{A}\in \mathbb {Z}_q^{n\times k}\) has full rank (equal to k) with probability at least \(1-kq^{k-1-n}\). We will consider the distributions \(\mathcal {L}_{n,k,q}\) and \(\mathcal {L}^\perp _{n,k,q}\) of q-ary lattices defined over the set of integer lattices by

$$\begin{aligned} \mathcal {L}_{n,k,q}&:\text { the distribution of }L_q(\textbf{A})\text { where }\textbf{A}\sim {{\,\mathrm{\mathcal {U}}\,}}(\mathbb {Z}_q^{n\times k}),\\ \mathcal {L}^\perp _{n,k,q}&:\text { the distribution of }L_q^\perp (\textbf{A})\text { where }\textbf{A}\sim {{\,\mathrm{\mathcal {U}}\,}}(\mathbb {Z}_q^{n\times k}). \end{aligned}$$

In other words, the distribution is obtained by taking a matrix \(\textbf{A}\in \mathbb {Z}_q^{n\times k}\) with uniform and i.i.d. entries, and looking at the q-ary lattice generated by \(\textbf{A}\); and similarly for the orthogonal version. Note that contrary to the Loeliger ensemble \(\mathbb {L}_{n,k,q,1}\), we do not have the rescaling factor \(q^{1-k/n}\), see e.g. [47, Definition 7.9.2]. It will be more convenient to use \(\mathcal {L}^\perp _{n,k,q}\) for proofs, but we often want to apply the results to \(\mathcal {L}_{n,k,q}\). Whenever neither k nor \(n-k\) are too small, those two distributions are very close. The following lemma was inspired by [19, Lemma 2] which does not contain any proof.

Lemma 4

([39, Appendix C.1]). Let \(n\in \mathbb {N}\), \(1\leqslant k\leqslant n\) and q be a prime power. Then \({{\,\mathrm{\textrm{d}_{\textrm{TV}}}\,}}(\mathcal {L}^\perp _{n,k,q},\mathcal {L}_{n,k,q})\leqslant \textsf{poly}\!\!\left( n,k\right) q^{-\min (k,n-k)}\).

Those distributions satisfy good uniformity properties when q goes to infinity. In particular, the following theorem shows that we can compute statistical properties of lattices sampled according to \(\mathcal {L}^\perp _{n,k,q}\). The first part of this theorem is close to [30, Theorem 1]. This result is in some sense the q-ary version of the result by Siegel on random (real) lattices and its generalization by Rogers and Macbeath [31, 41, 44].

Theorem 2

([39, Appendix C.2]). Let \(n\in \mathbb {N}\), \(1\leqslant k\leqslant n\) and q be a prime power. Let \(1\leqslant p\leqslant n\) and \(f:(\mathbb {Z}_q^n)^p\rightarrow \mathbb {R}\), then

$$\begin{aligned} \mathop {\mathbb {E}}_{L\sim \mathcal {L}^\perp _{n,k,q}}\left[ \sum _{\textbf{x}_1,\ldots ,\textbf{x}_p\in L\bmod q}f(\textbf{x}_1,\ldots ,\textbf{x}_p)\right] =\sum _{\textbf{x}_1,\ldots ,\textbf{x}_p\in \mathbb {Z}_q^n}q^{-k\cdot r(\textbf{x}_1,\ldots ,\textbf{x}_p)}f(\textbf{x}_1,\ldots ,\textbf{x}_p) \end{aligned}$$

where \(r(\mathbf {x_1},\ldots ,\mathbf {x_p}):={{\,\textrm{rk}\,}}_{\mathbb {Z}_q^n}(\textbf{x}_1,\ldots ,\textbf{x}_p)\) is the rank of the \(\textbf{x}_i\bmod q\) over \(\mathbb {Z}_q^n\).

We can apply this theorem to bound the expected number of lattice points in a ball, and therefore obtain bounds on \(\lambda _1\).

Theorem 3

([39, Appendix C.3]). Let \(n\in \mathbb {N}\), \(1\leqslant k\leqslant n\) and q be a prime power. For any \(0<r\leqslant q\),

$$\begin{aligned} \mathop {\mathbb {E}}_{L\sim \mathcal {L}^\perp _{n,k,q}}\left[ \left| (L\setminus \!\!\left\{ \textbf{0}\right\} )\cap B_n(r)\right| \right] =\frac{|B^\mathbb {Z}_n(r)|-1}{q^{k}}. \end{aligned}$$

In particular, if \(|B^\mathbb {Z}_n(r)|\leqslant q^{n-k}\), then \(\Pr _{L\sim \mathcal {L}_{n,k,q}}\left[ \lambda _1(L)<r\right] \leqslant |B^\mathbb {Z}_n(r)|\cdot q^{k-n}+\textsf{poly}\!\!\left( n,k\right) q^{-\min (k,n-k)}\) by Markov's inequality and Lemma 4.

Recall that the Gaussian heuristic says that for a “random” lattice L, \(\lambda _1(L)\) is approximately

$$ {{\,\textrm{GH}\,}}(L){:}{=}\left( \frac{{{\,\textrm{vol}\,}}(B_n)}{\det (L)}\right) ^{-1/n} =\frac{\det (L)^{1/n}\varGamma (1+\tfrac{n}{2})^{1/n}}{\sqrt{\pi }} \approx \det (L)^{1/n}\sqrt{\frac{n}{2\pi e}}. $$

This heuristic is usually considered valid for “random” lattices and has been extensively tested. Up to our knowledge, the only formal analysis of \(\lambda _1\) for random q-ary lattice is in [47, Lemma 7.9.2] which only analyzes the expected value and not the variance. The following corollary shows that this heuristic is indeed very sharp for random q-ary lattices.
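Concretely, \({{\,\textrm{GH}\,}}(L)\) is straightforward to evaluate in log space. The sketch below, for illustrative parameters, compares the exact expression with the \(\sqrt{n/2\pi e}\) approximation for a full-rank q-ary lattice with \(\det (L)=q^{n-k}\).

```python
import math

def gaussian_heuristic(logdet, n):
    """GH(L) = det(L)^{1/n} * Gamma(1 + n/2)^{1/n} / sqrt(pi), from log det(L)."""
    return math.exp(logdet / n + math.lgamma(1 + n / 2) / n) / math.sqrt(math.pi)

n, k, q = 512, 256, 3329                 # lattice from L_{n,k,q}: det = q^{n-k}
exact = gaussian_heuristic((n - k) * math.log(q), n)
approx = math.sqrt(n / (2 * math.pi * math.e)) * q ** (1 - k / n)
print(exact, approx)                     # the two agree to within ~1%
```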

Corollary 2

(Informal, [39, Appendix C.4]). Let \(n\in \mathbb {N}\), \(1\leqslant k\leqslant n\) and q be a prime power. Let \(\alpha \in [0,1]\) and \(r=q^{1-k/n}{{\,\textrm{vol}\,}}(B_n)^{-1/n}\). Under the assumption that \(|B_n^\mathbb {Z}(\alpha r)|\approx {{\,\textrm{vol}\,}}(B_n(\alpha r))\), which holds when \(\alpha r\gg \sqrt{n}\), we have

$$\begin{aligned} \Pr _{L\sim \mathcal {L}_{n,k,q}}\left[ \lambda _1(L)<\alpha r\right] \lessapprox \alpha ^n. \end{aligned}$$

Since \(r\approx {{\,\textrm{GH}\,}}(L)\) for lattices from \(\mathcal {L}_{n,k,q}\), this says that \(\lambda _1(L)\) falls below \(\alpha \cdot {{\,\textrm{GH}\,}}(L)\) with probability at most roughly \(\alpha ^n\).

Lemma 5

(The Pointwise Approximation Lemma, [5, Lemma 1.3], modified). Let \(L\subset \mathbb {R}^n\) be a lattice, and \(h:\mathbb {R}^n\rightarrow \mathbb {R}\) an L-periodic function whose Fourier series \(\hat{h}\) is a probability measure over \(\widehat{L}\). Let \(N\in \mathbb {N}\), \(\delta >0\) and \(X\subseteq \mathbb {R}^n\) a finite set. Let \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\) be a list of vectors in the dual lattice chosen randomly and independently from the distribution \(\hat{h}\). Then with probability at least \(1-|X|2^{-\varOmega (N\delta ^2)}\), \(h_W(\textbf{x}){:}{=}\frac{1}{N}\sum _{i=1}^N \cos (2\pi \left\langle \textbf{w}_i,\textbf{x}\right\rangle )\) satisfies \(|h_W(\textbf{x})-h(\textbf{x})|\leqslant \delta \) for all \(\textbf{x}\in L+X\).

Proof

The proof is the one in [5] with the following modifications. Let \(\delta >0\). Fix \(\textbf{x}\in \mathbb {R}^n\). Since \(\hat{h}\) is a probability measure over \(\widehat{L}\), the expectation of \(\cos (2\pi \left\langle \textbf{w}_i,\textbf{x}\right\rangle )\) over \(\textbf{w}_i\sim \hat{h}\) is exactly \(h(\textbf{x})\), so Hoeffding’s inequality guarantees that the mean of N samples is outside a window of \(\delta \) around this expectation with probability at most \(2^{-\varOmega (N\delta ^2)}\). Since h is periodic over the lattice L, it suffices to check that the inequality that we want holds for all \(\textbf{x}\in X\). Hence, by a union bound, the probability that the approximation is within a window \(\delta \) of the correct expectation for all \(\textbf{x}\in X\) simultaneously is at least \(1-|X|2^{-\varOmega (N\delta ^2)}\).    \(\square \)
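Lemma 5 is easy to test numerically in dimension one: for \(L=\mathbb {Z}\) (so \(\widehat{L}=\mathbb {Z}\)) and \(h=f_{\mathbb {Z},1/s}\), Lemma 3 gives \(\hat{h}=D_{\mathbb {Z},s}\), and the empirical average of cosines converges to h. A sketch, using naive rejection sampling for \(D_{\mathbb {Z},s}\) and truncated sums for the periodic Gaussian:

```python
import math, random

rho = lambda x, s: math.exp(-math.pi * (x / s) ** 2)

def sample_dgauss_Z(s, tailcut=12):       # naive sampler for D_{Z,s}
    t = math.ceil(tailcut * s)
    while True:
        x = random.randint(-t, t)
        if random.random() < rho(x, s):
            return x

def f_Z(t, s, K=30):                      # periodic Gaussian f_{Z,s}(t)
    return sum(rho(t + k, s) for k in range(-K, K + 1)) / \
           sum(rho(k, s) for k in range(-K, K + 1))

s, N = 2.0, 50_000
W = [sample_dgauss_Z(s) for _ in range(N)]   # W ~ (hat h)^N = D_{Z,s}^N
h_W = lambda x: sum(math.cos(2 * math.pi * w * x) for w in W) / N
for x in [0.0, 0.1, 0.3, 0.5]:
    print(x, h_W(x), f_Z(x, 1 / s))          # h_W(x) is within ~O(1/sqrt(N)) of h(x)
```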

2.4 Short Vector Sampling

For the purpose of this paper, we will only need to know that there is a way to sample relatively short vectors (SV) in a lattice and we will treat such an algorithm as a black box. Since such an algorithm would typically be parametrized (see below), we introduce an integer parameter \(\beta \) to capture this fact.

Black Box 1

For any integers \(n\leqslant m\), \(\beta \) and prime power q, there exists a deterministic algorithm \(\mathcal {B}\) and two functions \(T_\textrm{SV}\) and \(\ell _\textrm{SV}\) such that when \(\mathcal {B}\) is given \(\textbf{A}\in \mathbb {Z}_q^{m\times n}\), it returns a nonzero vector in \(L_q^\perp (\textbf{A})\) in time \(T_\textrm{SV}(m,\beta ,q^n)\) and \(\mathbb {E}\!\left[ \left\| \mathcal {B}(\textbf{A})\right\| \right] \leqslant \ell _\textrm{SV}(m,\beta ,q^n)\), where the expectation is taken over a uniformly random \(\textbf{A}\).

One way to implement this black box is to use lattice reduction algorithms such as BKZ: they provide a very flexible way to take a basis of a lattice and compute relatively short vectors in this lattice. Since the literature on this topic is quite extensive and there are many cost models associated to that task, we refer the reader to e.g. [25] for more details. For simplicity, we assume that the algorithm is deterministic but we could make it probabilistic by adding random coins to the input of the algorithm and take those into account in the expected value. In the case of BKZ, the parameter \(\beta \) is the block size.
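Before any reduction, one needs a basis of \(L_q^\perp (\textbf{A})\) to feed to BKZ. The following is a minimal Python sketch of the standard construction, assuming q prime (the paper allows prime powers) and that the top \(n\times n\) block of \(\textbf{A}\) is invertible modulo q, which holds with high probability; otherwise one permutes the rows first. The BKZ reduction step itself is not shown.

```python
import numpy as np

def inv_mod(M, q):
    """Inverse of a square integer matrix over Z_q (q prime), by Gauss-Jordan."""
    k = M.shape[0]
    Wk = np.concatenate([M % q, np.eye(k, dtype=int)], axis=1)
    for c in range(k):
        p = next(r for r in range(c, k) if Wk[r, c] % q != 0)  # pivot (assumes invertibility)
        Wk[[c, p]] = Wk[[p, c]]
        Wk[c] = (Wk[c] * pow(int(Wk[c, c]), -1, q)) % q
        for r in range(k):
            if r != c:
                Wk[r] = (Wk[r] - Wk[r, c] * Wk[c]) % q
    return Wk[:, k:]

def dual_basis(A, q):
    """Row basis of L_q^perp(A) for A in Z_q^{m x n}: each row r satisfies
    r A = 0 mod q. With A = [A1; A2] and A1 invertible mod q, the rows are
    q*e_i (i <= n) and [-A2 A1^{-1} mod q | e_j]; the determinant is q^n."""
    m, n = A.shape
    C = (-A[n:] @ inv_mod(A[:n], q)) % q
    B = np.zeros((m, m), dtype=int)
    B[:n, :n] = q * np.eye(n, dtype=int)
    B[n:, :n] = C
    B[n:, n:] = np.eye(m - n, dtype=int)
    return B

# sanity check on a random instance:
# A = np.random.default_rng(0).integers(0, 3329, size=(20, 10))
# assert np.all((dual_basis(A, 3329) @ A) % 3329 == 0)
```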

3 Basic Dual Attack

In this section, we present a basic dual attack whose purpose is to introduce the reader to the ideas of dual attacks without overwhelming them with technical details. This dual attack is very naive and assumes that we have access to essentially an unlimited number of samples. It computes one short vector per tuple of m LWE samples, in the spirit of [5]. We emphasize that this attack and Theorem 4 are not new but that our analysis is significantly simpler than in previous papers.

Fix \(\textbf{s}\in \mathbb {Z}_q^n\) an unknown secret and \((\textbf{A},\textbf{b})\) some LWE samples. Recall that \(\textbf{b}=\textbf{A}\textbf{s}+\textbf{e}\) for some unknown \(\textbf{e}\in \mathbb {Z}_q^m\). We split the secret \(\textbf{s}\) into two parts \(\textbf{s}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) and \(\textbf{s}_\textrm{dual}\in \mathbb {Z}_q^{n_\textrm{dual}}\) where \(n=n_\textrm{guess}+n_\textrm{dual}\). The matrix \(\textbf{A}\in \mathbb {Z}_q^{m\times n}\) is correspondingly split into two parts:

$$\begin{aligned} \textbf{A}=\begin{bmatrix} \textbf{A}_\textrm{guess}& \textbf{A}_\textrm{dual}\end{bmatrix}, \qquad \textbf{s}=\begin{bmatrix} \textbf{s}_\textrm{guess}\\ \textbf{s}_\textrm{dual}\end{bmatrix}. \end{aligned}$$
(1)

Therefore, \( \textbf{b}=\textbf{A}_\textrm{guess}\textbf{s}_\textrm{guess}+\textbf{A}_\textrm{dual}\textbf{s}_\textrm{dual}+\textbf{e}. \) The algorithm now makes a guess \(\tilde{\textbf{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) on the value of \(\textbf{s}_\textrm{guess}\) and tries to check whether this guess is correct. Consider the lattice

$$\begin{aligned} L_q^\perp (\textbf{A}_\textrm{dual})=\!\!\left\{ \textbf{x}\in \mathbb {Z}^m:\textbf{x}^T\textbf{A}_\textrm{dual}=\textbf{0}\bmod q\right\} . \end{aligned}$$
(2)

By the inequalities of Sect. 2.3, we have that \( \det (L_q^\perp (\textbf{A}_\textrm{dual}))\leqslant q^{n_\textrm{dual}}. \) Check that for any \(\textbf{x}\in L_q^\perp (\textbf{A}_\textrm{dual})\),

$$ \textbf{x}^T\textbf{b} =\textbf{x}^T\textbf{A}_\textrm{guess}\textbf{s}_\textrm{guess}+\textbf{x}^T\textbf{A}_\textrm{dual}\textbf{s}_\textrm{dual}+\textbf{x}^T\textbf{e} =\textbf{x}^T\textbf{A}_\textrm{guess}\textbf{s}_\textrm{guess}+\textbf{x}^T\textbf{e} \pmod q. $$

Therefore, \( \textbf{x}^T(\textbf{b}-\textbf{A}_\textrm{guess}\tilde{\textbf{s}}_\textrm{guess}) =\textbf{x}^T\textbf{A}_\textrm{guess}(\textbf{s}_\textrm{guess}-\tilde{\textbf{s}}_\textrm{guess})+\textbf{x}^T\textbf{e} \pmod q. \) The main observation is now that:

  • if the guess is correct (\(\tilde{\textbf{s}}_\textrm{guess}=\textbf{s}_\textrm{guess}\)) then \(\textbf{x}^T(\textbf{b}-\textbf{A}_\textrm{guess}\tilde{\textbf{s}}_\textrm{guess})=\textbf{x}^T\textbf{e}\pmod q\) follows roughly a modular Gaussian distribution,

  • if the guess is incorrect (\(\tilde{\textbf{s}}_\textrm{guess}\ne \textbf{s}_\textrm{guess}\)) then it follows a uniform distribution because \(\textbf{x}\ne \textbf{0}\) and \(\textbf{A}\) was chosen uniformly at random.

A crucial ingredient in the reasoning above is the length of \(\textbf{x}\). Indeed, the scalar product \(\textbf{x}^T\textbf{e}\) will follow a modular Gaussian whose deviation is proportional to \(\left\| \textbf{x}\right\| \). This is where the BKZ lattice reduction algorithm usually comes in: from a basis of \(L_q^\perp (\textbf{A}_\textrm{dual})\), we compute a short vector \(\textbf{x}\) using Black box 1.

The algorithm for this attack is described in Algorithm 1. We group many LWE samples in N tuples of m samples which we write in matrix form. We then compute one dual vector for each tuple of m LWE samples as explained above. In this attack, the value of m can be chosen arbitrarily and there usually is an optimal value of m that can be computed based on the complexity of computing a short vector, i.e. it depends on the specific instantiation of Black box 1.

While this kind of attack is already known to be correct, we reprove it for several reasons. First, we are not satisfied with the informal treatment of the proof in the literature. Second, our proof does not use any assumption, whereas most papers in the literature use the Central Limit Theorem or approximate sums of Gaussians as a Gaussian at some point (see [39, Section 2.4]). Figure 1 gives a high level view of the variables involved and their dependencies.

Theorem 4

([39, Appendix B]). Let \(n,m,\beta \) be integers, q be a prime power, \(n_\textrm{guess}+n_\textrm{dual}=n\), \(\textbf{s}\in \mathbb {Z}_q^n\), \(\sigma _e>0\) and \(N\in \mathbb {N}\). Let \(0<\delta <\varepsilon \) where \(\varepsilon {:}{=}\exp \left( -\pi \sigma _e^2\ell _\textrm{SV}(m,\beta ,q^{n_\textrm{dual}})^2/q^2\right) \) and \(\ell _\textrm{SV}\) comes from Black box 1. Let \((\textbf{A}^{(1)},\textbf{b}^{(1)})\), \(\ldots \), \((\textbf{A}^{(N)},\textbf{b}^{(N)})\) be samples from \({{\,\textrm{LWE}\,}}(m,\textbf{s},D_{\mathbb {Z}_q,\sigma _e})\), then Algorithm 1 on \((m,n_\textrm{guess},n_\textrm{dual},q,\delta ,N,(\textbf{A}^{(i)},\textbf{b}^{(i)})_i)\) runs in time \(\textsf{poly}\!\!\left( m,n\right) \cdot (N\cdot T_\textrm{SV}(m,\beta ,q^n)+q^{n_\textrm{guess}})\) and returns \(\textbf{s}_\textrm{guess}\) with probability at least \( 1-\exp \left( -\frac{N(\varepsilon -\delta )^2}{2}\right) -(q^{n_\textrm{guess}}-1)\exp \left( -\frac{N\delta ^2}{2}\right) \) over the choice of the \((\textbf{A}^{(i)},\textbf{b}^{(i)})\).

Remark 1

As expected, we recover the well-known fact that for the attack to succeed with constant probability, we can take \(\delta =\varepsilon /2\) and then we need at least \(N=\frac{8n_\textrm{guess}\log (q)+\varOmega (1)}{\varepsilon ^2}\) samples. Furthermore, a careful look at the proof shows that Black box 1 can be weakened even further to only require an inequality on the moment-generating function of \(\left\| \mathcal {B}(\textbf{A})\right\| ^2\).

Algorithm 1. Basic dual attack: one short dual vector is computed for each of the N tuples of m LWE samples, a score is computed for every guess \(\mathbf {\tilde{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\), and the best-scoring guess is returned (see Theorem 4 for the role of \(\delta \)).
Fig. 1. Conceptual representation of the variables involved in Algorithm 1.
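Since Algorithm 1 is short, we reproduce its logic as a Python sketch, with an argmax in place of the threshold test parameterized by \(\delta \). The short-vector oracle is a placeholder for Black box 1: the trivial vector \(q\textbf{e}_1\) used here is always in \(L_q^\perp (\textbf{A}_\textrm{dual})\) but makes all scores equal, so a real instantiation must substitute a genuinely short vector (e.g. from BKZ).

```python
import itertools, math
import numpy as np

def short_dual_vector(A_dual, q):
    """Black box 1 placeholder: a nonzero vector of L_q^perp(A_dual).
    q*e_1 is valid but useless in practice; replace with a BKZ-based oracle."""
    x = np.zeros(A_dual.shape[0], dtype=int)
    x[0] = q
    return x

def basic_dual_attack(samples, q, n_guess):
    """Algorithm 1 (sketch): one short dual vector per tuple of m LWE samples,
    then one score per candidate s_guess; the argmax is returned."""
    pre = []
    for A, b in samples:                          # samples = [(A^(1), b^(1)), ...]
        A_guess, A_dual = A[:, :n_guess], A[:, n_guess:]
        pre.append((short_dual_vector(A_dual, q), A_guess, b))
    best, best_score = None, -math.inf
    for s_tilde in itertools.product(range(q), repeat=n_guess):
        s_t = np.array(s_tilde)
        S = sum(math.cos(2 * math.pi * ((x @ (b - Ag @ s_t)) % q) / q)
                for x, Ag, b in pre)
        if S > best_score:
            best, best_score = s_tilde, S
    return np.array(best)
```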

4 Modern Dual Attack

The main limitation of the basic dual attack is the requirement to compute one short vector for each tuple of m LWE samples. Looking at Fig. 1, this is necessary to ensure the statistical independence of the variables that go into the distinguisher. However, computing a short vector is an expensive operation that we have to repeat many times. Another issue is that the attack requires an exponential number of LWE samples, something which is not always realistic.

As explained in the introduction, a series of works has progressively introduced the idea of generating all short vectors from a limited number of LWE samples, i.e. a single \((\textbf{A},\textbf{b})\). This is the case in [7, 23, 25], and [32] and it dramatically reduces the complexity of the attack. Unfortunately, the statistical analysis of these attacks has been lacking in the literature: [7, 23] and [25] offer no real proof of correctness to speak of. Only [32] tries to provide a complete proof of correctness, which is very detailed, but has to rely on statistical assumptions. Those assumptions have been called into question [20], and more importantly are extremely difficult to verify. Stepping back, we believe that the reason for this situation is that they try to analyse their attacks using a similar proof strategy to that of our basic dual attack (Sect. 3). However, the basic dual attack requires the independence of many variables to work. Since those variables become dependent in their attack, these papers inevitably have to assume or prove that non-independent quantities are “independent enough”.

In this section, we start completely from scratch: we design and analyze without any assumption a modern dual attack. Our proof scheme is completely different from the basic one and shows that those attacks do work. The main outcome of this proof is that we can finally understand the constraints on the various parameters that are necessary for the attack to work.

4.1 Intuition

Fix \(\textbf{s}\in \mathbb {Z}_q^n\) an unknown secret and \((\textbf{A},\textbf{b})\) some LWE samples. Recall that \(\textbf{b}=\textbf{A}\textbf{s}+\textbf{e}\) for some unknown \(\textbf{e}\in \mathbb {Z}_q^m\). As in the basic dual attack, we split the secret \(\textbf{s}\) into two parts \(\textbf{s}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) and \(\textbf{s}_\textrm{dual}\in \mathbb {Z}_q^{n_\textrm{dual}}\) where \(n=n_\textrm{guess}+n_\textrm{dual}\). The matrix \(\textbf{A}\in \mathbb {Z}_q^{m\times n}\) is correspondingly split into two parts:

$$\begin{aligned} \textbf{A}=\begin{bmatrix} \textbf{A}_\textrm{guess}& \textbf{A}_\textrm{dual}\end{bmatrix}, \qquad \textbf{s}=\begin{bmatrix} \textbf{s}_\textrm{guess}\\ \textbf{s}_\textrm{dual}\end{bmatrix}. \end{aligned}$$
(3)

The algorithm now makes a guess \(\tilde{\textbf{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) on the value of \(\textbf{s}_\textrm{guess}\) and tries to check whether this guess is correct. Check that

$$\begin{aligned} \textbf{b}-\textbf{A}_\textrm{guess}\cdot \tilde{\textbf{s}}_\textrm{guess}=\textbf{A}_\textrm{guess}\cdot (\textbf{s}_\textrm{guess}-\tilde{\textbf{s}}_\textrm{guess}) +\textbf{A}_\textrm{dual}\cdot \textbf{s}_\textrm{dual}+\textbf{e}. \end{aligned}$$
(4)

Consider the lattice

$$\begin{aligned} L_q^\perp (\textbf{A}_\textrm{dual})=\!\!\left\{ \textbf{x}\in \mathbb {Z}^m:\textbf{x}^T\textbf{A}_\textrm{dual}=\textbf{0}\bmod q\right\} . \end{aligned}$$
(5)

Fix \(N\in \mathbb {N}\) and \(s>0\), and let \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\in L_q^\perp (\textbf{A}_\textrm{dual})^N\) be sampled according to \(D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}^N\). For any \(\textbf{x}\in \mathbb {R}^m\), define

$$\begin{aligned} g_{W}(\textbf{x})=\frac{1}{N}\sum _{j=1}^{N}\cos (2\pi \left\langle \textbf{x},\textbf{w}_j\right\rangle /q) \end{aligned}$$
(6)

for all \(\textbf{x}\in \mathbb {R}^m\). We will evaluate \(g_W\) at \(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \tilde{\textbf{s}}_\textrm{guess}\) for all \(\tilde{\textbf{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) and keep the highest value. We now explain the intuition for this. Let \(L=L_q(\textbf{A}_\textrm{dual})\) to simplify notation. Recall that in Sect. 2.2, we have defined the standard periodic Gaussian function \( f_{L,1/s}(\textbf{x})=\frac{\rho _{1/s}(\textbf{x}+L)}{\rho _{1/s}(L)} \) for any \(\textbf{x}\in \mathbb {R}^m\) and \(s>0\). The important fact is that for large N, with high probability on the choice of the \(\textbf{w}_j\), \(g_W\) and \(f_{L,1/s}\) are close at every integer vector (Lemma 6). This fact essentially comes from [5]. Therefore, it suffices to analyse the behaviour of \(f_{L,1/s}\). For this, we rely on standard Gaussian tail bounds (Lemma 7) to get that for any \(s>0\) and \(\textbf{x}\in \mathbb {R}^m\), we essentially have

$$\begin{aligned} f_{L,1/s}(\textbf{x})\approx \rho _{1/s}({{\,\textrm{dist}\,}}(\textbf{x},L)). \end{aligned}$$
(7)

In other words, \(f_{L,1/s}\) measures the distance to the lattice L.

We are now ready to see what makes the attack work. The intuition is that for most choices of \(\textbf{A}\) and \(\textbf{e}\), for all \(\tilde{\textbf{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\setminus \!\!\left\{ \textbf{s}_\textrm{guess}\right\} \),

$$\begin{aligned} {{\,\textrm{dist}\,}}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \textbf{s}_\textrm{guess},L) < {{\,\textrm{dist}\,}}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \tilde{\textbf{s}}_\textrm{guess},L) \end{aligned}$$
(8)

and therefore

$$ f_{L,1/s}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \textbf{s}_\textrm{guess}) >f_{L,1/s}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \tilde{\textbf{s}}_\textrm{guess}) $$

and the same will be true for \(g_W\), which means that the algorithm will correctly output \(\textbf{s}_\textrm{guess}\). This is the main idea of our analysis but making it formal requires some care. The first step (Lemma 8) is to show that essentially

$$\begin{aligned} \text {if }2\left\| \textbf{e}\right\| < \lambda _1(L_q(\textbf{A})) \text { then }f_{L,\tfrac{1}{s}}(\textbf{e})>f_{L,\tfrac{1}{s}}(\textbf{e}+\textbf{x}) \text { for all }\textbf{x}\in L_q(\textbf{A}_\textrm{guess})\setminus L. \end{aligned}$$
(9)

This requires some explanations. Going back to (8), we have that

$$\begin{aligned} {{\,\textrm{dist}\,}}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \textbf{s}_\textrm{guess},L) &={{\,\textrm{dist}\,}}(\textbf{e}+\textbf{A}_\textrm{dual}\cdot \textbf{s}_\textrm{dual},L) \\ &={{\,\textrm{dist}\,}}(\textbf{e},L) &\text { since }\textbf{A}_\textrm{dual}\cdot \textbf{s}_\textrm{dual}\in L\\ &=\left\| \textbf{e}\right\| &\text {if }\left\| \textbf{e}\right\| <\lambda _1(L)/2. \end{aligned}$$

On the other hand, if \(\tilde{\textbf{s}}_\textrm{guess}\ne \textbf{s}_\textrm{guess}\) then

$$\begin{aligned} {{\,\textrm{dist}\,}}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \tilde{\textbf{s}}_\textrm{guess},L) &={{\,\textrm{dist}\,}}(\textbf{e}+\textbf{A}_\textrm{dual}\cdot \textbf{s}_\textrm{dual}+\textbf{A}_\textrm{guess}(\textbf{s}_\textrm{guess}-\tilde{\textbf{s}}_\textrm{guess}),L)\\ &={{\,\textrm{dist}\,}}(\textbf{e}+\textbf{A}_\textrm{guess}(\textbf{s}_\textrm{guess}-\tilde{\textbf{s}}_\textrm{guess}),L) \quad \text { since }\textbf{A}_\textrm{dual}\cdot \textbf{s}_\textrm{dual}\in L\\ &={{\,\textrm{dist}\,}}(\textbf{e}+\textbf{x},L) \end{aligned}$$

where

$$ \textbf{x}=\textbf{A}_\textrm{guess}(\textbf{s}_\textrm{guess}-\tilde{\textbf{s}}_\textrm{guess}) \in L_q(\textbf{A}_\textrm{guess}). $$

Assume for now that \(\textbf{x}\in L_q(\textbf{A}_\textrm{guess})\setminus L\) which we will see below is not always true but holds with probability exponentially close to 1 over the choice of \(\textbf{A}\). Then

$$\begin{aligned} {{\,\textrm{dist}\,}}(\textbf{b}-\textbf{A}_\textrm{guess}\cdot \tilde{\textbf{s}}_\textrm{guess},L) &={{\,\textrm{dist}\,}}(\textbf{e}+\textbf{x},L) =\min \!\!\left\{ \left\| \textbf{e}+\textbf{x}+\textbf{z}\right\| :\textbf{z}\in L\right\} \\ &\geqslant \min \!\!\left\{ \left\| \textbf{e}+\textbf{y}+\textbf{z}\right\| : \textbf{z}\in L,\textbf{y}\in L_q(\textbf{A}_\textrm{guess})\setminus L\right\} \\ &\geqslant \min \!\!\left\{ \left\| \textbf{y}+\textbf{z}\right\| : \textbf{z}\in L,\textbf{y}\in L_q(\textbf{A}_\textrm{guess})\setminus L\right\} - \left\| \textbf{e}\right\| \\ &\geqslant \lambda _1(L+L_q(\textbf{A}_\textrm{guess})) - \left\| \textbf{e}\right\| . \end{aligned}$$

The last step holds because \(\textbf{y}+\textbf{z}\ne \textbf{0}\) for all \(\textbf{z}\in L\) and \(\textbf{y}\in L_q(\textbf{A}_\textrm{guess})\setminus L\). This is where our assumption that \(\textbf{x}\in L_q(\textbf{A}_\textrm{guess})\setminus L\) is crucial. The condition in (8) now becomes

$$ \left\| \textbf{e}\right\| < \lambda _1(L+L_q(\textbf{A}_\textrm{guess})) - \left\| \textbf{e}\right\| $$

and this gives us (9) because \(L+L_q(\textbf{A}_\textrm{guess})=L_q(\textbf{A}_\textrm{dual})+L_q(\textbf{A}_\textrm{guess})=L_q(\textbf{A})\).

Now that we have (9), the second step is to apply it to \(\textbf{A}\). Recall that we made a crucial assumption above: it only applies to \(\textbf{e}+\textbf{x}\) for \(\textbf{x}\in L_q(\textbf{A}_\textrm{guess})\setminus L\) where \(\textbf{x}=\textbf{A}_\textrm{guess}(\textbf{s}_\textrm{guess}-\tilde{\textbf{s}}_\textrm{guess})\) and \(\textbf{s}_\textrm{guess}\ne \tilde{\textbf{s}}_\textrm{guess}\). This condition is equivalent to \(\textbf{x}\notin \textbf{A}_\textrm{dual}\mathbb {Z}_q^{n_\textrm{dual}}+q\mathbb {Z}^m\) since \(L=L_q(\textbf{A}_\textrm{dual})\). A sufficient condition for this to hold is that \(\textbf{A}\) has full rank over \(\mathbb {Z}_q\) which happens with probability exponentially close to 1 over the choice of \(\textbf{A}\). This allows us to conclude (Theorem 5) that Algorithm 2, which essentially performs the steps highlighted above, works for almost all \(\textbf{A}\) and \(\textbf{e}\) that satisfy roughly \(2\left\| \textbf{e}\right\| <\lambda _1(L_q(\textbf{A}))\). At this point, one can make two interesting observations:

  • It tells us that if \(2\left\| \textbf{e}\right\| < \lambda _1(L_q(\textbf{A}))\) then we can distinguish \(\textbf{e}\) from any \(\textbf{e}+\textbf{x}\) by using \(f_{L,1/s}\). This makes intuitive sense since this condition guarantees that \(\textbf{e}\) is the closest vector to \(\textbf{0}\) in \(L_q(\textbf{A})\), which is a necessary condition for the algorithm to work unconditionally.

  • Even though we take short vectors in the dual lattice \(L_q^\perp (\textbf{A}_\textrm{dual})\), it looks like only the length of the shortest vectors of \(L_q(\textbf{A})\) matters for the analysis! This is just a result of the simplifications that we have made above to give the intuition. The length of the dual vectors does play a role in Lemma 8 and the subsequent lemmas.

4.2 Formal Analysis

This section gives a formal analysis of the intuitions from the previous section. We will reuse the notation defined there. Our first lemma formalizes that \(g_W\), defined in (6) and used in the algorithm to compute the “score” of a guess, is very close to the periodic Gaussian function \(f_{L_q(\textbf{A}_\textrm{dual})}\).

Lemma 6

Let \(\textbf{B}\in \mathbb {Z}_q^{m\times n}\), \(s,\delta >0\) and \(N\in \mathbb {N}\). With probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\) over the choice of \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\) from \(D_{L_q^\perp (\textbf{B}),qs}^N\), we have \(|g_W(\textbf{x})-f_{L_q(\textbf{B}),1/s}(\textbf{x})|\leqslant \delta \) for all \(\textbf{x}\in \mathbb {Z}^m\), where \(g_W\) is defined in (6) and \(f_{L_q(\textbf{B})}\) is defined in Sect. 2.2.

Proof

Let \(L=L_q(\textbf{B})\) and for any j, let \(\textbf{w}_j'=\tfrac{1}{q}\textbf{w}_j\) and \(W'=(\textbf{w}_j')_{j}\). Since \(\widehat{L}=\tfrac{1}{q}L_q^\perp (\textbf{B})\), i.e. \(L_q^\perp (\textbf{B})=q\widehat{L}\), sampling \(\textbf{w}_j\) from \(D_{q\widehat{L},qs}\) and rescaling gives \(\textbf{w}_j'\sim D_{\widehat{L},s}\); hence \(W'\) is sampled from \(D_{\widehat{L},s}^N\), which is a probability distribution over \(\widehat{L}\). Let \(h=f_{L,1/s}\), which is L-periodic; then \(\widehat{h}=D_{\widehat{L},s}\) by Lemma 3. For any \(\textbf{x}\in \mathbb {R}^m\), \(g_W(\textbf{x})=h_{W'}(\textbf{x})\) where \(h_{W}\) is defined in Lemma 5. Apply Lemma 5 to h with \(X=\!\!\left\{ 0,\ldots ,q-1\right\} ^m\) to get that with probability at least \(1-|X|2^{-\varOmega (N\delta ^2)}\) over the choice of \(W'\), we have \(|h(\textbf{x})-h_{W'}(\textbf{x})|\leqslant \delta \) for all \(\textbf{x}\in L+X\). But \(L=L_q(\textbf{B})\) is a q-ary lattice, i.e. \(q\mathbb {Z}^m\subset L\), so \(L+X\supset q\mathbb {Z}^m+\!\!\left\{ 0,\ldots ,q-1\right\} ^m=\mathbb {Z}^m\), which concludes the proof.    \(\square \)

The next lemma formalizes the idea that the periodic Gaussian function \(f_L\) estimates the distance of its argument and the lattice L.

Lemma 7

Let \(L\subset \mathbb {R}^m\) and \(s>0\), then for any \(\textbf{x}\in \mathbb {R}^m\):

  • \(f_{L,1/s}(\textbf{x})\geqslant \rho _{1/s}({{\,\textrm{dist}\,}}(\textbf{x},L))\),

  • if \({{\,\textrm{dist}\,}}(\textbf{x},L)\geqslant \tau {:}{=}\tfrac{1}{s}\sqrt{m/2\pi }\) then \(f_{L,1/s}(\textbf{x})\leqslant \rho _{1/s}({{\,\textrm{dist}\,}}(\textbf{x},L)-\tau )\).

Proof

The first fact is a direct consequence of Lemma 1. Indeed, write \(\textbf{x}=\textbf{z}+\textbf{t}\) where \(\textbf{z}\in L\) and \(\textbf{t}\in \mathbb {R}^m\) are such that \({{\,\textrm{dist}\,}}(\textbf{x},L)=\left\| \textbf{t}\right\| \). Since \(f_{L,1/s}\) is L-periodic and \(\textbf{z}\in L\), \( f_{L,1/s}(\textbf{x}) =f_{L,1/s}(\textbf{x}-\textbf{z}) =f_{L,1/s}(\textbf{t}) \geqslant \rho _{1/s}(\textbf{t}) =\rho _{1/s}(\left\| \textbf{t}\right\| ). \) For the second fact, let \(\ell ={{\,\textrm{dist}\,}}(\textbf{x},L)\) and observe that by definition \((L-\textbf{x})\setminus B_m(\ell )=L-\textbf{x}\). By assumption, \(\ell \geqslant \tau {:}{=}\tfrac{1}{s}\sqrt{m/2\pi }\), so we can apply Corollary 1 to get that \( \rho _{1/s}((L-\textbf{x})\setminus B_m(\ell )) \leqslant \rho _{1/s}(\ell -\tau )\rho _{1/s}(L) \) and therefore

$$ f_{L,1/s}(\textbf{x}) =\frac{\rho _{1/s}(L-\textbf{x})}{\rho _{1/s}(L)} =\frac{\rho _{1/s}((L-\textbf{x})\setminus B_m(\ell ))}{\rho _{1/s}(L)} \leqslant \rho _{1/s}(\ell -\tau ). $$

   \(\square \)

Lemma 8

Let \(\textbf{B}\in \mathbb {Z}_q^{m\times n}\), \(L\subset \mathbb {Z}^m\) a lattice, \(\textbf{e}\in \mathbb {Z}^m\), \(s, \delta > 0\) and \(N\in \mathbb {N}\). Let \(\tau =\tfrac{1}{s}\sqrt{m/2\pi }\) and \(\eta \geqslant 0\) and assume that \(\lambda _1(L+L_q(\textbf{B}))\geqslant \tau +\left\| \textbf{e}\right\| \) and

$$ \rho _{1/s}(\textbf{e}) -\rho _{1/s}(\lambda _1(L+L_q(\textbf{B}))-\left\| \textbf{e}\right\| -\tau ) >2\delta +\eta . $$

Then, with probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\) over the choice of \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\) from \(D_{L_q^\perp (\textbf{B}),qs}^N\), we have

$$ g_W(\textbf{e}) \geqslant \rho _{1/s}(\textbf{e})-\delta > \rho _{1/s}(\lambda _1(L+L_q(\textbf{B}))-\left\| \textbf{e}\right\| -\tau )+\delta +\eta \geqslant g_W(\textbf{e}+\textbf{x})+\eta $$

for all \(\textbf{x}\in L\setminus L_q(\textbf{B})\), where \(g_W\) is defined in (6).

Proof

Apply Lemma 6 to get that with probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\) over the choice of \(\textbf{w}_1,\ldots ,\textbf{w}_N\) i.i.d. from \(D_{L_q^\perp (\textbf{B}),qs}\), we have \(|g_W(\textbf{y})-f_{L_q(\textbf{B}),1/s}(\textbf{y})|\leqslant \delta \) for all \(\textbf{y}\in \mathbb {Z}^m\). By Lemma 7, we have \( g_W(\textbf{e}) \geqslant f_{L_q(\textbf{B}),1/s}(\textbf{e})-\delta \geqslant \rho _{1/s}(\textbf{e})-\delta . \)

Let \(\textbf{x}\in L\setminus L_q(\textbf{B})\), then \(\textbf{z}-\textbf{x}\in L+L_q(\textbf{B})\) and \(\textbf{z}-\textbf{x}\ne \textbf{0}\) for any \(\textbf{z}\in L_q(\textbf{B})\). As a result, \(L_q(\textbf{B})-\textbf{x}\subseteq (L+L_q(\textbf{B}))\setminus \!\!\left\{ \textbf{0}\right\} \). Hence,

$$\begin{aligned} {{\,\textrm{dist}\,}}(\textbf{x},L_q(\textbf{B})) =\min _{\textbf{z}\in L_q(\textbf{B})}\left\| \textbf{x}+\textbf{z}\right\| \geqslant \min _{\textbf{y}\in (L+L_q(\textbf{B}))\setminus \!\!\left\{ \textbf{0}\right\} }\left\| \textbf{y}\right\| =\lambda _1(L+L_q(\textbf{B})) \geqslant \tau +\left\| \textbf{e}\right\| . \end{aligned}$$
(10)

But then

$$\begin{aligned} {{\,\textrm{dist}\,}}(\textbf{e}+\textbf{x},L_q(\textbf{B})) \geqslant {{\,\textrm{dist}\,}}(\textbf{x},L_q(\textbf{B}))-\left\| \textbf{e}\right\| \geqslant \tau . \end{aligned}$$
(11)

We can therefore apply Lemma 7 to get that for any \(\textbf{x}\in L\setminus L_q(\textbf{B})\),

$$ g_W(\textbf{e}+\textbf{x}) \leqslant f_{L_q(\textbf{B}),1/s}(\textbf{e}+\textbf{x})+\delta \leqslant \rho _{1/s}({{\,\textrm{dist}\,}}(\textbf{e}+\textbf{x},L_q(\textbf{B}))-\tau )+\delta . $$

Since \(\rho _{1/s}:[0,\infty )\rightarrow \mathbb {R}\) is decreasing, and reusing (10) and (11) we further have

$$\begin{aligned} \rho _{1/s}({{\,\textrm{dist}\,}}(\textbf{e}+\textbf{x},L_q(\textbf{B}))-\tau ) &\leqslant \rho _{1/s}({{\,\textrm{dist}\,}}(\textbf{x},L_q(\textbf{B}))-\left\| \textbf{e}\right\| -\tau )\\ &\leqslant \rho _{1/s}(\lambda _1(L+L_q(\textbf{B}))-\left\| \textbf{e}\right\| -\tau ). \end{aligned}$$

Putting everything together, and using our assumption, we have

$$ g_W(\textbf{e})-g_W(\textbf{e}+\textbf{x}) \geqslant \rho _{1/s}(\textbf{e})-\rho _{1/s}(\lambda _1(L+L_q(\textbf{B}))-\left\| \textbf{e}\right\| -\tau ) -2\delta >\eta \quad $$

   \(\square \)

We can now state our main result by putting everything together. It will be useful to note that \(L_q(\textbf{A}_\textrm{guess})+L_q(\textbf{A}_\textrm{dual})=L_q(\textbf{A})\) which is readily verified.

Algorithm 2. Modern dual attack: on input \((m,n_\textrm{guess},n_\textrm{dual},q,N,(\textbf{A},\textbf{b}),W)\), compute for every guess \(\mathbf {\tilde{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) the values \(y_j=\textbf{w}_j^T(\textbf{b}-\textbf{A}_\textrm{guess}\mathbf {\tilde{s}}_\textrm{guess})\bmod q\) (Line 3) and the score \(S(\mathbf {\tilde{s}}_\textrm{guess})=\sum _{j=1}^N\cos (2\pi y_j/q)\) (Line 4), and return the guess with the highest score.
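As with Algorithm 1, we give a direct Python sketch of Algorithm 2, with W stored as an \(N\times m\) integer matrix; the two commented lines correspond to Lines 3 and 4 referenced in the proof below. This naive version enumerates all \(q^{n_\textrm{guess}}\) guesses; the DFT trick from the proof of Theorem 5 is sketched after that proof.

```python
import itertools, math
import numpy as np

def modern_dual_attack(A, b, W, q, n_guess):
    """Algorithm 2 (sketch). The rows of W are the dual vectors w_j, assumed
    to be sampled from D_{L_q^perp(A_dual), qs}."""
    A_guess = A[:, :n_guess]
    best, best_score = None, -math.inf
    for s_tilde in itertools.product(range(q), repeat=n_guess):
        s_t = np.array(s_tilde)
        y = (W @ ((b - A_guess @ s_t) % q)) % q  # Line 3: y_j = w_j^T (b - A_guess s~) mod q
        S = np.cos(2 * np.pi * y / q).sum()      # Line 4: S(s~) = sum_j cos(2 pi y_j / q)
        if S > best_score:
            best, best_score = s_tilde, float(S)
    return np.array(best)
```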

Theorem 5

Let \(\textbf{A}\in \mathbb {Z}_q^{m\times n}\), \(\textbf{e}\in \mathbb {Z}^m\), \(\textbf{s}\in \mathbb {Z}_q^n\), \(s,\delta >0\) and \(N\in \mathbb {N}\). Let \(\tau =\tfrac{1}{s}\sqrt{m/2\pi }\). Assume that \(m\geqslant n\), \(\textbf{A}\) has full rank, \(\lambda _1(L_q(\textbf{A}))\geqslant \tau +\left\| \textbf{e}\right\| \), and

$$ \rho _{1/s}(\textbf{e}) -\rho _{1/s}(\lambda _1(L_q(\textbf{A}))-\left\| \textbf{e}\right\| -\tau ) >2\delta . $$

Let \(\textbf{b}=\textbf{A}\textbf{s}+\textbf{e}\bmod q\). Let \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\) be samples from \(D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}^N\), then Algorithm 2 on \((m,n_\textrm{guess},n_\textrm{dual},q,N,(\textbf{A},\textbf{b}),W)\) runs in time \(\textsf{poly}\!\!\left( m,n\right) \cdot (N+q^{n_\textrm{guess}})\) and returns \(\textbf{s}_\textrm{guess}\) with probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\) over the choice of W.

Proof

Let \(\textbf{B}=\textbf{A}_\textrm{dual}\) and \(L=L_q(\textbf{A}_\textrm{guess})\). Then \(L+L_q(\textbf{B})=L_q(\textbf{A})\). Our assumptions are therefore exactly that of Lemma 8 for \(\eta =0\) which we can apply to get that with probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\) over the choice of \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\) from \(D_{L_q^\perp (\textbf{B}),qs}^N=D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}^N\), we have

$$\begin{aligned} g_W(\textbf{e}) > g_W(\textbf{e}+\textbf{x}) \end{aligned}$$
(12)

for all \(\textbf{x}\in L\setminus L_q(\textbf{A}_\textrm{dual})\), where \(g_W\) is defined in (6). Furthermore, \(\textbf{A}\) has full rank and \(m\geqslant n\) so its columns are linearly independent over \(\mathbb {Z}_q\) and

$$\begin{aligned} L\setminus L_q(\textbf{A}_\textrm{dual}) =L_q(\textbf{A}_\textrm{guess})\setminus L_q(\textbf{A}_\textrm{dual}) =L_q(\textbf{A}_\textrm{guess})\setminus q\mathbb {Z}^m. \end{aligned}$$
(13)

Assume that we are in the case where W satisfies the above inequalities and consider the run of Algorithm 2 on \((m,n_\textrm{guess},n_\textrm{dual},q,N,(\textbf{A},\textbf{b}),W)\). The algorithm tests all possible values of \(\mathbf {\tilde{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) and returns the one that maximizes S. Let \(\mathbf {\tilde{s}}_\textrm{guess}\in \mathbb {Z}_q^{n_\textrm{guess}}\) and \(\varDelta \mathbf {\tilde{s}}_\textrm{guess}=\textbf{s}_\textrm{guess}-\mathbf {\tilde{s}}_\textrm{guess}\). First note that

$$\begin{aligned} \textbf{b}-\textbf{A}_\textrm{guess}\tilde{\textbf{s}}_\textrm{guess}&=(\textbf{A}\textbf{s}+\textbf{e}\bmod q)-\textbf{A}_\textrm{guess}\tilde{\textbf{s}}_\textrm{guess}\\ &=\textbf{A}_\textrm{dual}\textbf{s}_\textrm{dual}+\textbf{A}_\textrm{guess}\varDelta \mathbf {\tilde{s}}_\textrm{guess}+\textbf{e}\bmod q. \end{aligned}$$

For any j, let \(y_j(\mathbf {\tilde{s}}_\textrm{guess})\) be the value computed at Line 3. Note that

$$\begin{aligned} y_j(\mathbf {\tilde{s}}_\textrm{guess}) =\textbf{w}_j^T(\textbf{b}-\textbf{A}_\textrm{guess}\mathbf {\tilde{s}}_\textrm{guess}) =\textbf{w}_j^T(\textbf{A}_\textrm{guess}\varDelta \mathbf {\tilde{s}}_\textrm{guess}+\textbf{e})\bmod q \end{aligned}$$

since \(\textbf{w}_j^T\textbf{A}_\textrm{dual}=\textbf{0}\bmod q\) for every \(\textbf{w}_j\in L_q^\perp (\textbf{A}_\textrm{dual})\).

Let \(S(\mathbf {\tilde{s}}_\textrm{guess})\) be the value computed at Line 4 and check that

$$\begin{aligned} S(\mathbf {\tilde{s}}_\textrm{guess}) &=\sum \nolimits _{j=1}^{N}\cos (2\pi y_j(\mathbf {\tilde{s}}_\textrm{guess})/q)\\ &=\sum \nolimits _{j=1}^{N}\cos (2\pi \textbf{w}_j^T(\textbf{A}_\textrm{guess}\varDelta \mathbf {\tilde{s}}_\textrm{guess}+\textbf{e})/q) \qquad \text {by periodicity of }\cos \\ &=Ng_W(\textbf{A}_\textrm{guess}\varDelta \mathbf {\tilde{s}}_\textrm{guess}+\textbf{e}). \end{aligned}$$

There are two cases to distinguish:

  • If \(\mathbf {\tilde{s}}_\textrm{guess}=\textbf{s}_\textrm{guess}\) then \(S(\mathbf {\tilde{s}}_\textrm{guess})=Ng_W(\textbf{e})\).

  • If \(\mathbf {\tilde{s}}_\textrm{guess}\ne \textbf{s}_\textrm{guess}\) then \(S(\mathbf {\tilde{s}}_\textrm{guess})=Ng_W(\textbf{e}+\textbf{x})\) where \(\textbf{x}=\textbf{A}_\textrm{guess}\varDelta \mathbf {\tilde{s}}_\textrm{guess}\in L_q(\textbf{A}_\textrm{guess})=L\). But \(\textbf{A}\) (and hence \(\textbf{A}_\textrm{guess}\)) has full rank by assumption and \(\varDelta \mathbf {\tilde{s}}_\textrm{guess}\ne \textbf{0}\) so \(\textbf{x}\ne \textbf{0}\bmod q\). It follows by (13) that \(\textbf{x}\in L_q(\textbf{A}_\textrm{guess})\setminus q\mathbb {Z}^m=L\setminus L_q(\textbf{A}_\textrm{dual})\). Hence, by (12), \(S(\mathbf {\tilde{s}}_\textrm{guess})<Ng_W(\textbf{e})=S(\textbf{s}_\textrm{guess})\).

This shows that \(S(\textbf{s}_\textrm{guess})>S(\mathbf {\tilde{s}}_\textrm{guess})\) for all \(\mathbf {\tilde{s}}_\textrm{guess}\ne \textbf{s}_\textrm{guess}\). Therefore, Algorithm 2 correctly returns \(\textbf{s}_\textrm{guess}\). Note that the entire argument was under the assumption that (12) holds for W, which we already argued holds with probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\).

The naive analysis of the complexity is straightforward and gives \( q^{n_\textrm{guess}}\cdot \textsf{poly}\!\!\left( m,n\right) \cdot N. \) By using the DFT trick as we did in the proof of Theorem 4, we can improve the running time to \( \textsf{poly}\!\!\left( m,n\right) \cdot (N+q^{n_\textrm{guess}}). \)    \(\square \)
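The DFT trick used in this complexity bound can be made concrete: writing \(S(\mathbf {\tilde{s}}_\textrm{guess})=\mathrm{Re}\sum _j e^{2i\pi \textbf{w}_j^T\textbf{b}/q}\, e^{-2i\pi \left\langle \textbf{A}_\textrm{guess}^T\textbf{w}_j,\mathbf {\tilde{s}}_\textrm{guess}\right\rangle /q}\), all \(q^{n_\textrm{guess}}\) scores are obtained from one \(n_\textrm{guess}\)-dimensional DFT of an array indexed by \(\textbf{A}_\textrm{guess}^T\textbf{w}_j \bmod q\). A numpy sketch (practical only for small \(n_\textrm{guess}\), since the array has \(q^{n_\textrm{guess}}\) entries):

```python
import numpy as np

def scores_via_dft(A_guess, b, W, q):
    """All scores S(s~) at once: accumulate the phases e^{2 i pi <w_j, b>/q}
    into a q x ... x q array indexed by u_j = A_guess^T w_j mod q, then take a
    multidimensional FFT; fftn evaluates sum_u T[u] e^{-2 i pi <u, s~>/q}."""
    n_guess = A_guess.shape[1]
    T = np.zeros((q,) * n_guess, dtype=complex)
    idx = (W @ A_guess) % q
    np.add.at(T, tuple(idx.T), np.exp(2j * np.pi * (W @ b) / q))
    return np.fft.fftn(T).real   # entry indexed by s~ equals S(s~)

# usage: s_best = np.unravel_index(np.argmax(S_all), S_all.shape)
```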

4.3 Informal Application

Choosing the parameters in order to apply Theorem 5 is not immediately obvious. In this section, we explain how to do so in a concrete case of interest. In order to simplify things, we will neglect some factors and point out the various lemmas that can be used to make this reasoning completely formal.

Fix n, m and let q be a prime power. Let \(\textbf{s}\in \mathbb {Z}_q^n\) be a secret and \(\sigma _e>0\). Let \((\textbf{A},\textbf{b})\) be sampled from \({{\,\textrm{LWE}\,}}(m,\textbf{s},D_{\mathbb {Z}_q,\sigma _e})\), and let \(\textbf{e}\) be such that \(\textbf{b}=\textbf{A}\textbf{s}+\textbf{e}\). By Corollary 1, we have

$$\begin{aligned} \left\| \textbf{e}\right\| \lessapprox \sigma _e\sqrt{m/2\pi } \end{aligned}$$
(14)

with high probability. Let \(s>0\) to be defined later. We choose \(\delta \) to be significantly smaller than the smallest typical value of \(\rho _{1/s}(\left\| \textbf{e}\right\| )\), for example

$$\begin{aligned} \delta =\tfrac{1}{100}\rho _{1/s}(\sigma _e\sqrt{m/2\pi }) =\tfrac{1}{100}e^{-ms^2\sigma _e^2/2}. \end{aligned}$$
(15)

We choose N accordingly so that the success probability is very high, i.e.

$$\begin{aligned} N=\frac{\textsf{poly}\!\!\left( m\right) +n\log _2(q)}{\delta ^2}. \end{aligned}$$
(16)

\(\textbf{A}\) has full rank with high probability and therefore \(\det (L_q(\textbf{A}))=q^{m-n}\). By Theorem 3, and the informal Corollary 2, we have

$$ \lambda _1(L_q(\textbf{A}))\gtrapprox {{\,\textrm{GH}\,}}(L_q(\textbf{A})) ={{\,\textrm{vol}\,}}(B_m)^{-1/m}q^{1-n/m} \approx \sqrt{\frac{m}{2\pi e}}q^{1-n/m}. $$

Let \(\tau =\tfrac{1}{s}\sqrt{m/2\pi }\). In order to apply Theorem 5, we need to satisfy the conditions

$$ \lambda _1(L_q(\textbf{A}))\geqslant \tau +\left\| \textbf{e}\right\| \quad \text {and}\quad \rho _{1/s}(\textbf{e}) -\rho _{1/s}(\lambda _1(L_q(\textbf{A}))-\left\| \textbf{e}\right\| -\tau ) >2\delta . $$

Since we have chosen \(\delta \) to be very small compared to \(\rho _{1/s}(\textbf{e})\), those inequalities can be shown (see [39, Appendix D]) to be essentially equivalent to

$$ \lambda _1(L_q(\textbf{A}))\geqslant \tau +2\left\| \textbf{e}\right\| . $$

This condition will be satisfied when \( \sqrt{\frac{m}{2\pi e}}q^{1-n/m}\geqslant \tfrac{1}{s}\sqrt{m/2\pi }+2\sigma _e\sqrt{m/2\pi } \) that is

$$\begin{aligned} q^{1-n/m}\geqslant (\tfrac{1}{s}+2\sigma _e)\sqrt{e}. \end{aligned}$$
(17)

In other words, we have a lower bound on s. We observe that there is a trade-off between the cost of sampling from \(D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}\) and the cost of running Algorithm 2 since a large value of s:

  • makes it easy to sample from \(D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}\),

  • but makes \(\delta =\frac{1}{100}\rho _{1/s}(\sigma _e\sqrt{m/2\pi })\) tiny, and therefore makes \(N=\varOmega (\delta ^{-2})\), and with it the complexity, gigantic (made concrete in the toy computation below).
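The following toy computation (hypothetical numbers; it ignores the numerator \(\textsf{poly}(m)+n\log _2(q)\) of (16)) makes the second point concrete:

```python
import math

# Larger s makes delta = (1/100)*exp(-m s^2 sigma_e^2 / 2) tiny, and
# N = Omega(1/delta^2) explodes; the numbers below are hypothetical.
m, sigma_e = 1024, 3.0
for s in (0.05, 0.1, 0.2, 0.4):
    log2_delta = -m * s * s * sigma_e ** 2 / (2 * math.log(2)) - math.log2(100)
    print(f"s = {s:4}: log2(N) >= {-2 * log2_delta:.0f}")
```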

We note that the total complexity of the attack, including the cost of generating the small dual vectors, is a highly nontrivial function of the parameters. Consequently, it is not at all clear that the optimal choice of s is the lower bound identified above. We will analyze the complexity in greater detail in the next section.

4.4 Complexity Estimates

In this section, we describe how to concretely estimate the complexity of the attack described in Sect. 4 and provide numbers for Kyber. We continue with the setup of the previous section (Sect. 4.3), which we do not repeat. Recall that by Theorem 5, the complexity of the attack, once we add the cost \(T_{\textrm{sampling}}(N,qs)\) of sampling N independent Gaussian vectors according to \(D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}\), is

$$\begin{aligned} \textsf{poly}\!\!\left( m,n\right) \cdot (N+q^{n_\textrm{guess}})+T_{\textrm{sampling}}(N,qs) \end{aligned}$$
(18)

and it succeeds with very high probability given the choice of the parameters above. For the sampling of the dual vectors, we propose the following approach: given a block size \(2\leqslant \beta \leqslant m\),

  1. compute a basis of \(L_q^\perp (\textbf{A}_\textrm{dual})\),

  2. run BKZ with block size \(\beta \) on this basis to obtain a reduced basis \(\textbf{B}\),

  3. use the Markov chain Monte Carlo (MCMC) based Gaussian sampler from [46] (Theorem 1) with parameter qs and basis \(\textbf{B}\) to generate N independent samples.

The complexity of this procedure is

$$\begin{aligned} T_{\textrm{sampling}}(N,qs)=T_{\textrm{BKZ}}(m,\beta )+N\cdot T_{\textrm{MCMC}}(L_q^\perp (\textbf{A}_\textrm{dual}),qs) \end{aligned}$$
(19)

where \(T_{\textrm{BKZ}}(m,\beta )\) is the cost of BKZ and \(T_{\textrm{MCMC}}(L,s)\) is the cost of producing one sample from \(D_{L,s}\). We apply Theorem 1 to get that

$$\begin{aligned} T_{\textrm{MCMC}}(L_q^\perp (\textbf{A}_\textrm{dual}),qs)=\ln \left( \tfrac{1}{\varepsilon }\right) \cdot \tfrac{1}{\varDelta }\cdot \textsf{poly}\!\!\left( m\right) , \qquad \varDelta =\frac{\rho _{qs}(L_q^\perp (\textbf{A}_\textrm{dual}))}{\prod _{i=1}^m\rho _{qs/\left\| \widetilde{\textbf{b}}_i\right\| }(\mathbb {Z})} \end{aligned}$$
(20)

where \(\widetilde{\textbf{b}}_1,\ldots ,\widetilde{\textbf{b}}_m\) are the Gram-Schmidt vectors of the BKZ-\(\beta \)-reduced basis \(\textbf{B}\) of \(L_q^\perp (\textbf{A}_\textrm{dual})\) and \(\varepsilon >0\). Note that the output distribution of the algorithm is \(\varepsilon \)-close to the discrete Gaussian. Since we use N samples, their joint distribution is \(N\varepsilon \)-close to that of N independent Gaussian samples, so by the data processing inequality this adds at most \(N\varepsilon \) to the failure probability of the algorithm; we therefore choose \(\varepsilon \) quite small, e.g. \(\varepsilon \ll 1/N\). Putting (18) and (19) together, we get that the total complexity of the attack is

$$\begin{aligned} \textsf{poly}\!\!\left( m,n\right) \cdot (N+q^{n_\textrm{guess}})+T_{\textrm{BKZ}}(m,\beta ) +N\cdot T_{\textrm{MCMC}}(L_q^\perp (\textbf{A}_\textrm{dual}),qs) \end{aligned}$$
(21)

subject to the constraints (14), (15), (16), (17) which we summarize below:

$$\begin{aligned} \delta &=\tfrac{1}{100}e^{-ms^2\sigma _e^2/2}, & N &=\frac{\textsf{poly}\!\!\left( m\right) +n\log _2(q)}{\delta ^2}, \\ q^{1-n/m} &\geqslant (\tfrac{1}{s}+2\sigma _e)\sqrt{e}, & \left\| \textbf{e}\right\| &\lessapprox \sigma _e\sqrt{m/2\pi }, \\ \varepsilon &\ll 1/N. \end{aligned}$$
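Returning to the three-step sampling procedure, the following is a minimal sketch of steps 1 and 2, assuming the fpylll and sympy libraries are available (step 3, the MCMC sampler of [46], is not reproduced here); the basis construction is the standard one for q-ary kernel lattices, and all dimensions are toy values:

```python
import numpy as np
from sympy import Matrix
from fpylll import IntegerMatrix, BKZ

def dual_lattice_basis(A, q):
    """Row basis of L_q^perp(A) = {x in Z^m : x^T A = 0 mod q}, assuming the
    top n x n block of A is invertible mod q (true w.h.p.; else permute rows)."""
    m, n = A.shape
    A1, A2 = Matrix(A[:n, :].tolist()), Matrix(A[n:, :].tolist())
    C = (-A2 * A1.inv_mod(q)).applyfunc(lambda v: v % q)  # C*A1 + A2 = 0 mod q
    rows = [[q if j == i else 0 for j in range(m)] for i in range(n)]  # rows q*e_i
    rows += [[int(C[i, j]) for j in range(n)]
             + [1 if j == i else 0 for j in range(m - n)] for i in range(m - n)]
    return rows

# Toy instance; q, the dimensions and the block size beta are illustrative only.
q, m, n_dual, beta = 97, 20, 10, 10
A_dual = np.random.randint(0, q, size=(m, n_dual))
B = IntegerMatrix.from_matrix(dual_lattice_basis(A_dual, q))   # step 1
BKZ.reduction(B, BKZ.Param(block_size=beta))                   # step 2
# Step 3 would feed the reduced basis B to the Gaussian sampler of [46].
```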

In practice, computing \(\varDelta \) with (20) is nontrivial. One can show (see [39, Appendix E]) that

$$\begin{aligned} \frac{1}{\varDelta }\leqslant \prod _{i=1}^m\rho _{\left\| \widetilde{\textbf{b}}_i\right\| /qs}(\mathbb {Z}) \end{aligned}$$
(22)

which is easier to estimate but still requires estimating the \(\left\| \widetilde{\textbf{b}}_i\right\| \). For this, we can assume that the Geometric Series Assumption (GSA) [42] holds for BKZ-\(\beta \) reduced bases. The GSA is known to be reasonably accurate when \(50\ll \beta \ll m\), which is the case in our experiments, but it does not correctly model what happens in the last \(m-\beta \) coordinates [1]. For our purposes, we consider the GSA accurate enough to obtain credible estimates of the complexity.
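As an illustration, the sketch below evaluates the right-hand side of (22) under the GSA, in the variant where the Gram-Schmidt log-norms sum exactly to the log-volume \(q^{n_\textrm{dual}}\) of \(L_q^\perp (\textbf{A}_\textrm{dual})\); the root Hermite factor \(H_\beta \) and all other parameter values are inputs chosen by the caller:

```python
import math

def rho_Z(c):
    """Gaussian mass rho_c(Z) = sum_{k in Z} exp(-pi k^2 / c^2)."""
    if c < 1e-9:
        return 1.0                    # only k = 0 contributes
    if c > 1:                         # Poisson summation: rho_c(Z) = c * rho_{1/c}(Z)
        return c * rho_Z(1.0 / c)
    return 1.0 + 2.0 * sum(math.exp(-math.pi * k * k / (c * c)) for k in range(1, 12))

def log2_inv_Delta_bound(m, n_dual, q, s, H_beta):
    """Bound (22) on log2(1/Delta), with the ||b~_i|| modelled by the GSA."""
    vol_log = n_dual * math.log(q)                 # log det of L_q^perp(A_dual)
    b1_log = m * math.log(H_beta) + vol_log / m
    slope = 2 * m / (m - 1) * math.log(H_beta)     # log-norms sum to vol_log
    return sum(math.log2(rho_Z(math.exp(b1_log - i * slope) / (q * s)))
               for i in range(m))

print(log2_inv_Delta_bound(512, 256, 3329, 0.5, 1.005))   # toy numbers
```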

Independently of the GSA, however, formula (22) is expensive to compute due to the product of m terms. This matters because we need to compute this quantity many times in our optimizer to find a good set of parameters (see below). For the estimates below, we use (22) and the GSA to compute the final complexity estimate, but inside the parameter optimizer we use the approximate formula below, which is very cheap to compute:

$$\begin{aligned} \text {if}\quad \left\| \textbf{b}_1\right\| \leqslant 2qs \quad \text {then} \quad \frac{1}{\varDelta }\lessapprox \exp \left( \log \left( 1+2e^{-\pi \alpha }\right) +\frac{2}{\ln (H_\beta ^4)}E_1\left( \pi \alpha \right) \right) \end{aligned}$$
(23)

where \(\alpha =(qs)^2/\left\| \textbf{b}_1\right\| ^2\), \(E_1\) is the exponential integral and \(H_\beta \) is the Hermite factor of a BKZ-\(\beta \) reduced basis. See [39, Appendix E] for more details.

In order to find a good set of parameters, we wrote an optimizer that tries all reasonable values of m, \(\beta \) and \(n_\textrm{guess}\), and sets s to

$$ s=\max \left( \frac{\sqrt{e}}{q^{1-n/m} - 2\sqrt{e}\sigma _e}, \frac{\left\| \textbf{b}_1\right\| }{2q} \right) $$

so that we can use (23). We also limit the range of m to \([\tfrac{3}{2}n,2n]\) so that the ratio n/m stays bounded away from 0 and 1.
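A stripped-down version of such an optimizer might look as follows; this is a sketch and not our actual implementation: it uses the core-SVP exponent \(0.292\beta \) as a stand-in for \(T_{\textrm{BKZ}}\), replaces \(\textsf{poly}(m)\) by m in (16), drops all polynomial factors, approximates the logarithm of the sum (21) by the maximum of the logarithms, relies on scipy for \(E_1\), and runs on hypothetical toy parameters:

```python
import math
from scipy.special import exp1   # the exponential integral E_1

def root_hermite(beta):
    # Standard heuristic for the root Hermite factor of a BKZ-beta basis.
    return ((beta / (2 * math.pi * math.e)) * (math.pi * beta) ** (1 / beta)) \
        ** (1 / (2 * (beta - 1)))

def log2_cost(n, q, sigma_e, m, beta, n_guess):
    """Hedged log2 estimate of (21), dropping polynomial factors."""
    n_dual = n - n_guess
    H = root_hermite(beta)
    log_b1 = m * math.log(H) + (n_dual / m) * math.log(q)  # GSA first vector
    gap = q ** (1 - n / m) - 2 * math.sqrt(math.e) * sigma_e
    if gap <= 0:
        return None                                        # (17) cannot hold
    s = max(math.sqrt(math.e) / gap, math.exp(log_b1) / (2 * q))
    log2_delta = -m * s * s * sigma_e ** 2 / (2 * math.log(2)) - math.log2(100)
    log2_N = math.log2(m + n * math.log2(q)) - 2 * log2_delta   # (16)
    alpha = (q * s) ** 2 / math.exp(2 * log_b1)
    log_inv_Delta = math.log1p(2 * math.exp(-math.pi * alpha)) \
        + 2 / math.log(H ** 4) * exp1(math.pi * alpha)          # (23)
    return max(0.292 * beta,                          # T_BKZ (core-SVP model)
               log2_N + log_inv_Delta / math.log(2),  # N * T_MCMC
               log2_N,                                # DFT part of (18)
               n_guess * math.log2(q))                # guessing part

# Toy search over a hypothetical instance (not actual Kyber parameters).
n, q, sigma_e = 256, 3329, 2.0
best = min((log2_cost(n, q, sigma_e, m, beta, g), m, beta, g)
           for m in range(3 * n // 2, 2 * n + 1, 16)
           for beta in range(100, 501, 20)
           for g in range(0, 41, 5)
           if log2_cost(n, q, sigma_e, m, beta, g) is not None)
print("log2 cost, m, beta, n_guess =", best)
```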

In Table 1, we give the complexity estimates of our algorithm, computed by our optimizer. The first set of columns corresponds to the algorithm analyzed above, including the use of the GSA to estimate the complexity of the Gaussian sampling as described in [39, Appendix E]. To estimate the complexity of BKZ, we use the cost estimates in [8, 32] using [13] as the sieving oracle; specifically, we rely on the “lattice estimator” of [9].

Those costs are not competitive with the state of the art because our algorithm does not include modulus switching. Modulus switching is a critical component for reducing the complexity, but its formal analysis is nontrivial, and we therefore decided not to include it in this paper. In order to get an idea of what our algorithm extended with modulus switching would give, we include a second set of columns where we simply replace \(q^{n_\textrm{guess}}\) by \(2^{n_\textrm{guess}}\) in (21), which would correspond to switching the modulus to 2 in the guessing part. We emphasize that this is only a very rough estimate and not a formal analysis. The real complexity with modulus switching will most likely be higher than what we report. Furthermore, all our complexity estimates ignore the polynomial factors.

Table 1. Dual attack cost estimates and their parameters as described in Sect. 4.4. All costs are logarithms in base two. Note that the costs of attacks with modulus switching are estimates of what an algorithm with modulus switching could give if the algorithm of Sect. 4 were extended with modulus switching.

5 Quantum Dual Attack

In this section, we present a quantum version of Algorithm 2 and show that we can obtain a speed-up on the complexity. The technique is inspired by [10], which was never published and is a quantum variant of [32].

5.1 Algorithm and Analysis

We will need a quantum algorithm which estimates the mean value of \(\cos (2\pi \langle \textbf{w}_i,\textbf{b}\rangle /q)\) where the \(\textbf{w}_i\) are vectors accessible via a quantum oracle. This mean value can be used to compute the DFT sums in the algorithm much faster than with a classical computer. The idea is inspired by [2, Theorem 47] and can be seen as a special case of the quantum speedup of Monte Carlo methods [37]. For more background on quantum algorithms, we refer the reader to [10, Sections 2.4 and 4].

Theorem 6

([10, Theorem 5]). Let \(N\) be a positive integer and \(W\) be a list of \(N\) vectors in \(\mathbb {Z}^n\): \(\textbf{w}_0,\dots ,\textbf{w}_{N-1}\). Let \(f_{W}(\textbf{b})=\tfrac{1}{N}\sum _{i=0}^{N-1} \cos (2\pi \langle \textbf{w}_i,\textbf{b}\rangle /q)\), where \(\textbf{b}\in \mathbb {Z}_q^n\). Let \(\mathcal {O}_W\) be defined by \( \mathcal {O}_W: |{j}\rangle |{0}\rangle \mapsto |{j}\rangle |{\textbf{w}_j}\rangle . \) For any \(\epsilon ,\delta >0\), there exists a quantum algorithm \(\mathcal {A}\) that, given \(\textbf{b}\in \mathbb {Z}_q^n\) and oracle access to \(\mathcal {O}_W\), outputs \(\mathcal {A}^{\mathcal {O}_W}(\textbf{b})\) satisfying \(|\mathcal {A}^{\mathcal {O}_W}(\textbf{b})-f_{W}(\textbf{b})|\le \epsilon \) with probability \(1-\delta \). The algorithm makes \(\mathcal {O}(\epsilon ^{-1}\cdot \log \frac{1}{\delta })\) queries to \(\mathcal {O}_W\) and requires \(O(\log (\frac{1}{\epsilon })+\textsf{poly}\!\!\left( \log (n)\right) )\) qubits.

We will have to search for a minimum element in a collection, but the oracle that computes the value of each element is probabilistic and may return a wrong result with small probability. We say that a (probabilistic) real function f has bounded error if there exists \(x\in \mathbb {R}\) such that f returns x with probability at least 9/10. The problem of finding the minimum in a collection (without errors) has been studied in [21, Theorem 1]. On the other hand, the problem of searching for a marked element in a collection with a bounded-error oracle has been studied in [26]. This idea can easily be used to adapt the algorithm of [21] to bounded-error oracles. Indeed, the algorithm in [21] simply performs a constant number of Grover searches, marking the elements that are smaller than the current value. Therefore it suffices to replace this Grover search by the algorithm of [26].

Theorem 7

([26]\(\mathbf {+}\)[21]). Given n algorithms, quantum or classical, each computing some real value with bounded error probability, there is a quantum algorithm that makes an expected \(O(\sqrt{n})\) queries and with probability at least 9/10 returns the index of the minimum among the n values. This algorithm uses \(\textsf{poly}\!\!\left( \log (n)\right) \) qubits.

[Algorithm 3: quantum version of Algorithm 2; pseudocode figure omitted.]

Theorem 8

([39, Appendix F.1]). Let \(\textbf{A}\in \mathbb {Z}_q^{m\times n}\), \(\textbf{e}\in \mathbb {Z}^m\), \(\textbf{s}\in \mathbb {Z}_q^n\), \(s,\delta >0\) and \(N\in \mathbb {N}\). Let \(\tau =\tfrac{1}{s}\sqrt{m/2\pi }\) and \(\eta >0\). Assume that \(m\geqslant n\), \(\textbf{A}\) has full rank, \(\lambda _1(L_q(\textbf{A}))\geqslant \tau +\left\| \textbf{e}\right\| \), and

$$ \rho _{1/s}(\textbf{e}) -\rho _{1/s}(\lambda _1(L_q(\textbf{A}))-\left\| \textbf{e}\right\| -\tau ) >2\delta +\eta . $$

Let \(\textbf{b}=\textbf{A}\textbf{s}+\textbf{e}\bmod q\). Let \(W=(\textbf{w}_1,\ldots ,\textbf{w}_N)\) be samples from \(D_{L_q^\perp (\textbf{A}_\textrm{dual}),qs}^N\) and \(\mathcal {O}_W\) an oracle for W in the sense of Theorem 6. Then Algorithm 3 on \(m,n_\textrm{guess},n_\textrm{dual},q\), \(N,\eta /2,(\textbf{A},\textbf{b}),\mathcal {O}_W\) makes an expected \(O\left( \eta ^{-1}\cdot q^{n_\textrm{guess}/2}\right) \) calls to \(\mathcal {O}_W\) and returns \(\textbf{s}_\textrm{guess}\) with probability at least \(1-q^m\cdot 2^{-\varOmega (N\delta ^2)}\) over the choice of W. The algorithm uses \(O(\log (\eta ^{-1})+\textsf{poly}\!\!\left( \log (N)\right) )\) qubits.

In terms of proofs, the correctness of the quantum algorithm is very similar to the classical one. The main difference is that we use Theorem 6 to compute \(g_W\) which only returns an approximation. This adds an additional error term that we can take into account in Lemma 8 using \(\eta \).

5.2 Applications

In order to apply Theorem 8, one needs to provide an oracle \(\mathcal {O}_W\) to access the samples. The implementation of this oracle has a significant impact on the complexity since it is queried an exponential number of times by the algorithm. We outline two possible implementations. Before that, note that in practice we will usually choose \(\eta \) to be small compared to \(\delta \) in Theorem 8, say \(\eta =\delta /100\). This way, \(\eta \) has almost no influence on the maximum length of the errors \(\textbf{e}\) that we can handle.

BKZ Preprocessing with a Quantum Klein Sampler. For a value of s that is not too small, one can first compute a basis of \(L_q^\perp (\textbf{A}_\textrm{dual})\) and reduce it using BKZ with block size \(\beta \) to obtain a new basis \(\textbf{M}\). One then creates a quantum circuit that implements the Klein sampler [24] with \(\textbf{M}\) hard-coded in the circuit. This circuit will be the oracle \(\mathcal {O}_W\). In detail, the Klein sampler is a probabilistic algorithm, so we can view it as a deterministic algorithm that takes random coins (and \(\textbf{M}\)) as input. We can see the input j of the oracle as the value of the random coins, so that the outputs \(\textbf{w}_1,\ldots ,\textbf{w}_N\) that correspond to inputs \(1,\ldots ,N\) are distributed according to the Gaussian distribution. Since the Klein sampler runs in polynomial time, each call to \(\mathcal {O}_W\) takes polynomial time. The BKZ preprocessing is purely classical and done only once before the quantum algorithm runs. This means that, per Theorem 8, the total runtime will beFootnote 6

$$ T_{\text {BKZ}}(\beta )+\sqrt{N}\cdot q^{n_\textrm{guess}/2}\cdot \textsf{poly}\!\!\left( \log (m)\right) . $$

This is always better than the classical complexity since \(\sqrt{N\cdot q^{n_\textrm{guess}}}\leqslant N+q^{n_\textrm{guess}}\). Note that when using a Klein sampler, the value of s is a function of the quality of the basis \(\textbf{M}\) and therefore depends on \(\beta \). Furthermore, s cannot be made smaller than the smoothing parameter of the lattice this way. Alternatively, one could also use the MCMC sampler that we used in Sect. 4.4: although its running time is not polynomial, it only uses polynomial memory, so it would still only require a polynomial number of qubits, and it allows one to choose smaller values of s, which seems to be quite beneficial. Note that in both cases (Klein and MCMC), we get no quantum speed-up on the sampling.

Classical Sampler with a Quantum Memory. A feature of the Klein sampler is that it can output an arbitrary number of samples and its running time is proportional to the number of samples. This is not the case for all samplers. For example, [4] describes Gaussian samplers that work for values of s smaller than the smoothing parameter and produce \(2^{n/2}\) samples, but run in time \(2^{n}\) even if we only require one sample. [3] contains another such algorithm with a time-space trade-off. Using such samplers with our quantum algorithm is problematic because the samples are produced and stored in a classical memory, but the algorithm requires quantum oracle access to those samples. We have two options:

  • We can assume that we have access to a QRACM (classical memory with quantum random access) [29]. A QRACM of size N is a special quantum memory holding N classical values but providing \(O(\log (N))\)-time quantum access to those values. Such a QRACM directly implements the oracle \(\mathcal {O}_W\) so the total execution time becomes

    $$ T_{\text {sampler}}+\sqrt{N}\cdot q^{n_\textrm{guess}/2}\cdot \log (N)\cdot \textsf{poly}\!\!\left( \log (m)\right) . $$

    We note, however, that the practical realizability of QRACM is debated and is potentially a strong assumption. We refer the reader to [27] for more details.

  • We can replace \(\mathcal {A}\) in the algorithm by a very large circuit containing all N hard-coded samples that computes the sum \(g_W\) in a naive way (without Theorem 6). This circuit will take time \(N\textsf{poly}\!\!\left( \log (m)\right) \) to evaluate, therefore the total complexity will be

    $$ T_{\text {sampler}}+N\cdot q^{n_\textrm{guess}/2}\cdot \textsf{poly}\!\!\left( \log (m)\right) . $$

    Note that this might be worse than the classical algorithm if the value of N is larger than \(q^{n_\textrm{guess}/2}\).

Finally, we note that, at present, samplers such as [3] are still too expensive to be useful in dual attacks, but future samplers might become more efficient.

6 Comparison with [20]’s Contradictory Regime

In [20], the authors claim that [32] falls into what they call the “contradictory regime” and conclude that the result is most likely incorrect. They similarly conclude that the recent derivative works [10, 16], as well as [25], are flawed. They do so by reconstructing the key heuristic claim of [32] and showing, both by theoretical arguments and experiments, that this heuristic is incorrect. We copy this heuristic below, slightly adjusted to our notations. In the heuristic, the function \(f_{\mathcal {W}}\) is the same as \(h_{\mathcal {W}}\) in Lemma 5, which is the same as \(g_{\mathcal {W}}\) defined in (6) up to a factor 1/q in the cosine.

Heuristic 1

([20, Heuristic Claim 3]). Let \(\varLambda \subseteq \mathbb {R}^n\) be a random lattice of determinant 1 and \(\mathcal {W}\subseteq \widehat{\varLambda }\) be the set consisting of the \(N=(4/3)^{n/2}\) shortest vectors of \(\widehat{\varLambda }\). For some \(\sigma >0\) and \(T\geqslant 1\), consider \(\textbf{t}_{BDD}\sim \mathcal {N}(0,\sigma ^2)^n\) and i.i.d. \(\textbf{t}_{unif}^{(i)}\) uniform over \(\mathbb {R}^n/\varLambda \), where \(i\in \!\!\left\{ 1,\ldots ,T\right\} \). LetFootnote 7 \(\ell =\sqrt{4/3}\cdot {\text {GH}}(n)\), \(\varepsilon =\exp (-2\pi ^2\sigma ^2\ell ^2)\). If \(\ln T\leqslant N\varepsilon ^2\),

$$ {\text {Pr}}\!\!\left[ f_{\mathcal {W}}(\textbf{t}_{BDD}) >f_{\mathcal {W}}(\textbf{t}_{unif}^{(i)}) \text { for all }i\in \!\!\left\{ 1,\ldots ,T\right\} \right] \geqslant 1-O\left( \frac{1}{\sqrt{\ln T}}\right) $$

where \(\mathcal {N}(0,\sigma ^2)\) denotes the normal distribution.

There are several obvious (minor) problems about this heuristic since [32] works with integer lattices and discrete Gaussians. As a first step, we rewrite this heuristic in a way that is closer to [32] and we also change the notations to ours (see [39, Appendix G.1] for details about the rewrite).

Heuristic 2

([20, Heuristic Claim 3] adapted). Let \(\textbf{A}\in \mathbb {Z}_q^{m\times n}\) with i.i.d. uniform coefficients. Let \(L=L_q(\textbf{A})\subseteq \mathbb {Z}^m\) and \(W\subseteq L_q^\perp (\textbf{A})\) be the set consisting of the \(N=(4/3)^{m/2}\) shortest vectors of \(L_q^\perp (\textbf{A})\). For some \(\sigma _e>0\) and \(T\geqslant 1\), consider \(\textbf{e}\) with coordinates drawn i.i.d. from \(D_{\mathbb {Z}_q,\sigma _e}\) and i.i.d. \(\textbf{t}_{unif}^{(i)}\) uniform over \(\mathbb {Z}^m/L\), where \(i\in \!\!\left\{ 1,\ldots ,T\right\} \). Let \(\ell =\sqrt{4/3}\cdot {\text {GH}}(L)\), \(\varepsilon =\exp (-\pi \sigma _e^2\ell ^2)\). If \(\ln T\leqslant N\varepsilon ^2\),

$$ {\text {Pr}}\!\!\left[ g_W(\textbf{e}) >g_W(\textbf{t}_{unif}^{(i)}) \text { for all }i\in \!\!\left\{ 1,\ldots ,T\right\} \right] \geqslant 1-O\left( \frac{1}{\sqrt{\ln T}}\right) . $$

In [20, Sections 4.2 and 4.3], the authors argue by theoretical arguments that Heuristic 1 does not hold. Although [20] does not define what is meant by a “random lattice” in the heuristic, they in fact use random q-ary lattices in their experiments, and the theoretical properties of “random lattices” that they rely on also hold for q-ary lattices. Therefore, their analysis also applies to Heuristic 2.

Their reasoning is as follows: assume that we have a large number of random candidates (the \(\textbf{t}_{unif}^{(i)}\)) and one point close to the lattice L (the point \(\textbf{e}\)); then Heuristic 2 says that we can always distinguish \(\textbf{e}\) from the candidates (since it has the maximum value of \(g_W\)). The contradiction comes from the fact that in reality, for T large enough, many of the candidates will be closer to L than \(\textbf{e}\), and therefore no algorithm can distinguish them [18]. This gives rise to what [20] calls the “contradictory regime”, where an algorithm would somehow be able to distinguish indistinguishable distributions.
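As a sanity check, Heuristic 2 can be probed numerically on toy parameters. The sketch below reuses dual_lattice_basis from the sketch in Sect. 4.4 and takes W to be the rows of an LLL-reduced dual basis, a crude stand-in for the N shortest dual vectors, so it illustrates the shape of the experiment rather than reproducing [20]:

```python
import numpy as np
from fpylll import IntegerMatrix, LLL

# Toy probe (illustrative parameters; dual_lattice_basis is the helper from
# the sketch in Sect. 4.4): compare g_W on a BDD-like error vs uniform targets.
q, m, n, sigma_e, T = 97, 24, 12, 2.0, 1000
rng = np.random.default_rng(1)
A = rng.integers(0, q, size=(m, n))
B = IntegerMatrix.from_matrix(dual_lattice_basis(A, q))
LLL.reduction(B)
W = np.array([[B[i, j] for j in range(m)] for i in range(m)])

def g_W(t):
    # Score as in (6), up to normalization: mean of cos(2 pi <w, t> / q).
    return np.mean(np.cos(2 * np.pi * (W @ t) / q))

e = np.rint(rng.normal(0, sigma_e, size=m)).astype(int)   # BDD-like target
targets = rng.integers(0, q, size=(T, m))                  # uniform targets
print("g_W(e) =", g_W(e), "  max over uniform:", max(g_W(t) for t in targets))
```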

We first compare this regime to that of our algorithm and we then discuss the statistical model chosen by [20] in Heuristic 1.

6.1 Almost Complementary Regimes

In Sect. 4.3, we have applied our main theorem to a concrete instance and derived thatFootnote 8 for a typical LWE problem where the ratio n/m is fixed (and not too close to 0 or 1), q is large and the error follows a discrete Gaussian of parameter \(\sigma _e\), our algorithm works as soon as

$$\begin{aligned} q^{1-n/m}\geqslant (\tfrac{1}{s}+2\sigma _e)\sqrt{e} \end{aligned}$$
(24)

where

$$ N=\frac{\textsf{poly}\!\!\left( m\right) +n\log _2(q)}{\delta ^2}, \qquad \delta =\tfrac{1}{100}e^{-ms^2\sigma _e^2/2}. $$

In our attack, T is the number of guesses that the algorithm makes, that is \(T=q^{n_\textrm{guess}}\). In order to match [20, page 21], we will choose s so that \(\ln T=N\varepsilon ^2\):

$$\begin{aligned} \ln T=N\varepsilon ^2 \quad &\Leftrightarrow \quad n_\textrm{guess}\ln (q)=\frac{\textsf{poly}\!\!\left( m\right) +n\log _2(q)}{\delta ^2}\varepsilon ^2\\ \quad &\Leftrightarrow \quad n_\textrm{guess}\ln (q)=(\textsf{poly}\!\!\left( m\right) +n\log _2(q))\cdot 10^4\,e^{ms^2\sigma _e^2}e^{-2\pi \sigma _e^2\ell ^2}\\ \quad &\Leftrightarrow \quad \frac{n_\textrm{guess}\ln (q)}{10^4(\textsf{poly}\!\!\left( m\right) +n\log _2(q))}=e^{(ms^2-2\pi \ell ^2)\sigma _e^2}. \end{aligned}$$

Note that \(n_\textrm{guess}<n<m\), so for a large enough value of m, the left-hand side of this expression is smaller than 1 (recall that \(\textsf{poly}\!\!\left( m\right) \) comes from the choice of N, so we can always make it slightly bigger to artificially increase the denominator if we want). It follows that we can always choose s such that \(\ln T=N\varepsilon ^2\) in such a way that (24) holds (see [39, Appendix G.2]), and therefore Theorem 5 ensures that our algorithm works in this regime.

We will now compare this with [20]'s contradictory regime. This regime, defined in [20, page 21], is whenFootnote 9

$$\begin{aligned} r{{\,\textrm{GH}\,}}(L_q(\textbf{A}_\textrm{dual}))<\sqrt{\frac{m}{2\pi }}\sigma _e, \qquad \text {where }r=T^{-1/m}. \end{aligned}$$
(25)

Note here that the lattice is \(L_q(\textbf{A}_\textrm{dual})\) because [20] modularizes the algorithm by separating the lattice in which dual-distinguishing is done from the part of the lattice that is enumerated over (see Sect. 6.2). Indeed, this regime comes from Heuristic 1, and the lattice in question is the one where dual vectors are generated.

Recall that for the algorithm to work, \(\textbf{A}\) and therefore \(\textbf{A}_\textrm{dual}\) must have full rank, so \(\det (L_q(\textbf{A}_\textrm{dual}))=q^{m-n_\textrm{dual}}\). Now observe that

$$ \frac{r{{\,\textrm{GH}\,}}(L_q(\textbf{A}_\textrm{dual}))}{\sqrt{\frac{m}{2\pi }}\sigma _e} =\frac{T^{-1/m}\sqrt{\frac{m}{2\pi e}}q^{1-n_\textrm{dual}/m}}{\sqrt{\frac{m}{2\pi }}\sigma _e} =\frac{q^{-n_\textrm{guess}/m}q^{1-n_\textrm{dual}/m}}{\sqrt{e}\sigma _e}. $$

Recall that \(n=n_\textrm{dual}+n_\textrm{guess}\), so the contradictory regime corresponds to

$$\begin{aligned} q^{1-n/m}<\sigma _e\sqrt{e}. \end{aligned}$$
(26)

Comparing the working regime (24) with the contradictory one (26), and recalling that we can choose s as large as we want, we observe that they do not overlap and that the bounds only differ by a factor of two. This suggests that, for our algorithm, the “theoretically working” regime and the contradictory regime almost characterize whether the dual attack will work or not. However, as the next section explains, those regimes are based on different distributions of targets.
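For concreteness, one can check numerically on which side of the two bounds a given parameter set falls; in the toy computation below, s is taken large so that (24) degenerates to \(q^{1-n/m}\geqslant 2\sigma _e\sqrt{e}\):

```python
import math

# Toy check of the working bound (24) (s -> infinity) vs the contradictory
# bound (26); all parameter values are illustrative.
q, n, m, sigma_e = 3329, 512, 1024, 3.0
lhs = q ** (1 - n / m)
print("working      :", lhs >= 2 * sigma_e * math.sqrt(math.e))  # (24)
print("contradictory:", lhs < sigma_e * math.sqrt(math.e))       # (26)
```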

6.2 On the Distribution of Targets

The authors of [20] decided to modularize the algorithm by separating the lattice in which dual-distinguishing is done (\(L_q(\textbf{A}_\textrm{dual})\)) from the part of the lattice that is enumerated over (\(L_q(\textbf{A}_\textrm{guess})\)). In fact, Heuristic 1 only mentions the dual-distinguishing and not the enumeration. This, however, poses a difficulty because it is clear that the “targets” (\(\textbf{b}-\textbf{A}_\textrm{guess}\tilde{\textbf{s}}_\textrm{guess}\) in our terminology, \(\textbf{t}_{unif}^{(i)}\) in Heuristic 1) are not arbitrary but have some structure.

The authors of [20] decided to model the statistics of the targets in a way that is independent of the actual choice of \(\textbf{A}_\textrm{guess}\): they chose the uniform distribution over the fundamental domain of \(L_q(\textbf{A}_\textrm{dual})\). In the case of [32] and our algorithm, the algorithm works exclusively over the integers, which is why we propose Heuristic 2 as an integer version of Heuristic 1. This means that we now have two different settings:

  • In Heuristic 2, \(\textbf{t}_{unif}^{(i)}\) is sampled uniformly in \(\mathbb {Z}^m/L\).

  • In reality, \(\textbf{t}_{unif}^{(i)}=\textbf{e}+\textbf{x}^{(i)}\) where \(\textbf{x}^{(i)}\) can be any vector in \(L'\setminus q\mathbb {Z}^m\) where \(L'\) is another random q-ary lattice, chosen independently of L but fixed in the algorithm. In our algorithm, \(L=L_q(\textbf{A}_\textrm{dual})\) and \(L'=L_q(\textbf{A}_\textrm{guess})\).

Indeed, a key point in the proof of Theorem 5 is to show that points of the form \(\textbf{e}+\textbf{x}^{(i)}\) as described are always far away from L, a fact that does not hold for completely uniform targets. As a result, with high probability over the choice of \(\textbf{A}\), the targets (except for the correct guess) are all bounded away from 0 in the dual lattice. For uniform targets, the argument of [20] is statistical in nature: while very short targets are unlikely, if we try too many targets we will eventually find a short one and get a false positive. On the other hand, our algorithm and analysis are not statistical: for the vast majority of choices of \(\textbf{A}\), all targets satisfy the bound unconditionally, and we can safely look at all targets without the risk of any false positive.

In conclusion of this section, it seems that the contradictory regime of [20] nicely complements the working regime of our algorithm. On the other hand, the statistical model that underlies this contradictory regime and what happens in our algorithm are different. We leave it as an open question to explain exactly why the two regimes seem to align so well.

7 Open Questions

We have formally analysed a dual attack in the spirit of [32]. However, as noted in [20], the algorithm used by [32] produces many short dual vectors in a sublattice \(L''\) of \(L_q^\perp (\textbf{A}_\textrm{dual})\) (instead of the entire \(L_q^\perp (\textbf{A}_\textrm{dual})\)). In other words, W is roughly the set of vectors of \(L''\) in a ball, and therefore \(g_W\) does not exactly measure the distance to L but rather to a more complicated lattice. This fact makes the analysis of \(g_W\) considerably more challenging, and we believe that more research is needed to understand how this affects the choice of the parameters.

Another issue that we have avoided is that of modulus switching. Indeed, while [32] claims that this technique brings significant improvements in the complexity, [20] claims that geometric arguments contradict this statement. We leave as an open problem the study of a modification of our algorithm that would include modulus switching. We believe that a formal analysis would be the best way to resolve this issue. A priori, we do not see any major reason why this could not be analysed formally, but it may prove to be a nontrivial technical challenge due to the effects of rounding modulo p on the uniform distribution modulo q. We note in this direction that the approach of [16] of using lattice codes instead of modulus switching might be a better fit for a formal analysis.

Finally, we have analysed the case where the algorithm has access to m LWE samples in dimension n, and our algorithm typically requires \(m\approx 2n\) to have a good complexity. In practice, however, it is common to only have n samples, something that our algorithm cannot handle directly. While there is a standard technique to deal with this, namely sampling in the lattice

$$ \{(\textbf{x},\textbf{y})\in \mathbb {Z}^m\times \mathbb {Z}^{n_\textrm{dual}}:\textbf{x}^T\textbf{A}_\textrm{dual}=\textbf{y}\bmod q\}, $$

we leave it as future work to include this improvement to our analysis.
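For completeness, a basis of this lattice is easy to write down: the rows \((\textbf{e}_i,\textbf{e}_i^T\textbf{A}_\textrm{dual})\) together with \((\textbf{0},q\textbf{e}_j)\) generate it. A minimal sketch (the naming is ours):

```python
import numpy as np

def extended_dual_basis(A_dual, q):
    """Row basis of {(x, y) in Z^m x Z^{n_dual} : x^T A_dual = y mod q}:
    rows (e_i, e_i^T A_dual) together with (0, q e_j)."""
    m, n_dual = A_dual.shape
    top = np.hstack([np.eye(m, dtype=int), A_dual])
    bottom = np.hstack([np.zeros((n_dual, m), dtype=int),
                        q * np.eye(n_dual, dtype=int)])
    return np.vstack([top, bottom])
```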