
1 Introduction

Estimating the hardness of the Learning with Errors Problem (LWE) has been of great importance in cryptography since its introduction by Regev [1]. Nowadays, the standard way to assess the concrete hardness of an LWE instance is by comparison with tables in LWE cryptanalysis papers (see [2–4] for lattice-based attacks, [5–7] for combinatorial attacks of BKW-type, [8] for an algebraic attack). Also, [9] provides a publicly available LWE-estimator that collects all known attacks and predicts their running times on given LWE parameters. Due to the large memory- and sample-complexity of combinatorial algorithms, the lattice-based approach seems more practical. This belief was questioned by a recent result on BKW of Kirchner and Fouque [7], where an LWE instance of dimension 128 was solved in 13 hours. Currently, this is the record for combinatorial attacks on LWE. So it is reasonable to ask whether a similar result can be achieved by lattice-based attacks.

In this paper we present results on a parallel implementation of lattice-based attacks on LWE. We view the LWE problem as a BDD instance on a q-ary lattice. From here there are two possible approaches: one can solve a BDD instance either via Kannan’s embedding [10], or via reducing a lattice basis first and then solving a CVP problem on the reduced basis (reduce-then-decode). While Kannan’s embedding performs well for small dimensions [11], its complexity grows with the dimension since the algorithm calls an SVP solver as a subroutine.

We take the reduce-then-decode approach because the decoding part contains a tree-traversal algorithm that can be almost perfectly parallelized.

Our main contribution is a parallelization of BDD enumeration [3, 4]. From our experiments we conclude that:

  1. BDD enumeration can be almost perfectly parallelized, i.e. with n processors the achieved speed-up is roughly n.

  2. For standard LWE settings (e.g. uniform secret), instances of dimension around \(n=100\) can be broken in several hours (see Sect. 5).

  3. Lattice-based techniques are more efficient than current combinatorial algorithms even for binary secret.

  4. Small error rates in BDD (binary or ternary error-vectors) allow for a much more efficient decoding.

  5. A concrete instance of a space-efficient LWE variant of Galbraith [12] is weaker than previously thought (see Sect. 4).

To the best of our knowledge, our implementation provides the first results for lattice-based enumeration attacks on concrete LWE instances. Our attack is carried out in combination with the BKZ algorithm implemented in the NTL library [13]. Further improvements of lattice reduction (like in [14]) would, in combination with our parallel BDD implementation, certainly speed up the attacks even further. Our code will be made available online.

The remainder of this paper is organized as follows. Section 2 covers notation and background. In Sect. 3 we describe Babai’s enumeration algorithm and its generalization, followed by our main algorithm, the parallelized BDD enumeration. Section 4 discusses variants of LWE and differences to the standard BDD attack. Our implementation results are presented in Sect. 5.

2 Background

We use bold lower-case letters for vectors \(\mathbf{b}\) and we let \(\Vert \mathbf{b}\Vert \) denote their Euclidean norm. For vectors \((\mathbf{b}_1, \ldots , \mathbf{b}_k)\), we construct a basis matrix \(\mathbf B \) consisting of rows \(\mathbf{b}_i\). For linearly independent \((\mathbf{b}_1, \ldots , \mathbf{b}_k) \in {{\mathbb R}}^m\), the fundamental domain \({\mathcal {P}_{\text{1/2 }}}(\mathbf B )\) is \(\left\{ \sum _{i=1}^k c_i \mathbf{b}_i:c_i \in [-\frac{1}{2}, \frac{1}{2} ) \right\} \). The Gram-Schmidt orthogonalization \(\widetilde{\mathbf{B }} = (\widetilde{\mathbf{b}}_1, \ldots , \widetilde{\mathbf{b}}_k)\) is obtained iteratively by setting \({\widetilde{\mathbf{b}}_1 = \mathbf{b}_1}\) and \({\widetilde{\mathbf{b}}_i}\) as the orthogonal projection of \(\mathbf{b}_i\) on \({(\mathbf{b}_1, \ldots , \mathbf{b}_{i-1})}^{\perp }\) for \(i=2, \ldots , k\). This orthogonalization process can be described via matrix-decomposition \(\mathbf B = \mu \widetilde{\mathbf{B }}\), where \(\mu \) is a lower-triangular matrix with \(\mu _{i,j} = \langle \mathbf{b}_i, \widetilde{\mathbf{b}}_j \rangle / \Vert \widetilde{\mathbf{b}}_j \Vert ^2\) for \(i \ge j\).

We deal with a q-ary lattice with basis \(\mathbf B \):

$$\begin{aligned} \varLambda _q (\mathbf B ) = \Bigl \{\mathbf{y}\in {{\mathbb Z}}^m : \mathbf{y}= \sum _{i=1}^k z_i \cdot \mathbf{b}_i \bmod q,\ z_i \in {{\mathbb Z}}\Bigr \}. \end{aligned}$$

Vectors from this lattice are in \(\text {Im}(\mathbf B )\). The kernel of matrix \(\mathbf B \) forms another lattice \(\varLambda _q^{\perp }(\mathbf B ) = \{\mathbf{x} \in {{\mathbb Z}}^k : \mathbf{x} \mathbf B = 0 \bmod q \}\). For a lattice \(\varLambda (\mathbf B )\), the first successive minimum \(\lambda _1 (\varLambda (\mathbf B ))\) is the length of its shortest vector.

In this paper we describe an algorithm to solve the so-called Bounded Distance Decoding Problem (\(\textsf {BDD} \)) and the most cryptographically relevant instance of it, the Learning with Errors Problem (\(\textsf {LWE} \)). BDD asks to find a lattice point \(\mathbf{v}\) closest to a given point \(\mathbf{t}\in {{\mathbb R}}^m\) under the promise that \(\Vert \mathbf{v}-\mathbf{t}\Vert = \Vert \mathbf{e}\Vert \le R\), where \(R\) is usually much smaller than the lattice’s packing radius. In the LWE case, we know in addition that the error-vector \(\mathbf{e}\) is distributed as a discrete Gaussian, i.e. its probability distribution, denoted \(D_s\), is proportional to \(\exp (-\pi \Vert \mathbf{e}\Vert ^2 / s^2)\). In LWE it suffices to consider the integers \({{\mathbb Z}}\) as the support of the error distribution, so we use the Ziggurat Algorithm implemented in [15] for the sampling. A discrete Gaussian sampler over any lattice can be found in [16].

Apart from the scaled standard deviation s, the LWE problem is parametrized by a dimension \(n \ge 1\), an integer modulus \(q=\mathrm {poly}(n)\) and the number of LWE samples m. For a secret \(\mathbf{s}\in {{\mathbb Z}}_q^n\), each \(\textsf {LWE} \) sample is obtained by choosing a vector \(\mathbf{a}\in {{\mathbb Z}}_q^n\) uniformly at random and an error \(e \leftarrow D_s\), and outputting the pair \({(\mathbf{a}, t = \langle \mathbf{a}\mathbin {,}\mathbf{s}\rangle + e \bmod q) \in {{\mathbb Z}}_q^n \times {{\mathbb Z}}_q}\); in total we receive m such pairs. Typically a cryptosystem reveals \(m = \varTheta (n)\) samples (commonly as a public key), and for lattice-based attacks we consider \(m \le 2n\).

We write the obtained m pairs as \({(\mathbf A , \mathbf{t}= \mathbf{s}\mathbf A + \mathbf e \bmod q) \in {{\mathbb Z}}^{n \times m} \times {{\mathbb Z}}^m}\) for \(\mathbf{t}= (t_1, \ldots , t_m)\), \(\mathbf e =(e_1, \ldots , e_m)\) and the columns of matrix \(\mathbf A \) are composed of the \(\mathbf{a}_i\). From this it is easy to see that (the search version of) the \(\textsf {LWE} \) problem is an average-case hard Bounded Distance Decoding problem for the q-ary lattice \({\varLambda (\mathbf A ) = \{ \mathbf z \in {{\mathbb Z}}^m:\exists \mathbf{s}\in {{\mathbb Z}}_q^n\ \text {s.t.}\ \mathbf z = \mathbf{s}\mathbf A \bmod q \}}\), i.e. \(\mathbf{t}\) is close to a linear combination of rows of \(\mathbf A \). Assuming \(\mathbf A \) is full-rank (which is the case w.h.p.), its determinant is \(\det (\varLambda (\mathbf A )) = q^{m-n}\) and the rows of the matrix below form its basis over \({{\mathbb Z}}^m\)

$$\begin{aligned} \mathbf B = \begin{pmatrix} \mathbf A ' &amp; \mathbf I _{m-n} \\ q \mathbf I _{m-n} &amp; \mathbf 0 \end{pmatrix} \in {{\mathbb Z}}^{m \times m}, \end{aligned}$$
(1)

where \(\mathbf A = (\mathbf A ' | \mathbf I _{m-n})\) and \(\mathbf A ' \in {{\mathbb Z}}^{n \times n}\) is a row-reduced echelon form of \(\mathbf A \).

Reduce-then-decode is our approach to solve LWE in practice. For the reduction step, we \(\beta \)-BKZ reduce the basis defined in Eq. (1). The reduction’s running time is determined by m and the running time of an SVP-solver on a lattice of dimension \(\beta \). Our decoding step is described in the subsequent section.

3 Enumeration Tree

Let us describe our implementation of the tree-traversal algorithm for the BDD enumeration. Recall that a BDD instance is given by a (BKZ-reduced) basis \(\mathbf B \in {{\mathbb Z}}^{m \times m}\) and a target \(\mathbf{t}\in {{\mathbb Z}}^m\) that is close to a lattice point \(\mathbf{v}= \sum _{k=1}^m v_k \mathbf{b}_k\). Our goal is to find the coordinates \(v_k\). Knowing that \(\mathbf{t}-\mathbf{v}= \mathbf{e}\) is short, we enumerate over all coefficient vectors \((v_m, \ldots , v_1)\) that result in a vector close to \(\mathbf{t}\). A way to find the coordinates \(v_k\) via iterative projections is the Nearest Plane Algorithm of Babai [17]. In the k-th iteration (\(k=m, \ldots , 1\)), the target \(\mathbf{t}\) is projected onto \({{\mathrm{Span}}}{(\mathbf{b}_1, \ldots , \mathbf{b}_{k-1})}^{\perp }\), choosing the closest translate of the sub-lattice \(\varLambda (\mathbf{b}_1, \ldots , \mathbf{b}_{k-1})\) (line 4, Algorithm 1), and the projected vector becomes the new target (line 5). The procedure outputs a lattice vector \(\mathbf{v}\) s.t. \(\Vert \mathbf{e}\Vert \le \tfrac{1}{2} \sqrt{\sum _{k=1}^m \Vert \widetilde{\mathbf{b}}_k\Vert ^2}\). An iterative version of the Nearest Plane Algorithm is presented in Algorithm 1.

[Algorithm 1: Babai’s Nearest Plane Algorithm (figure)]

While the above Nearest Plane procedure is very efficient even for large m, the output \(\mathbf{t}^{(0)}\) is the correct one only if \(\mathbf{e}\in {\mathcal {P}_{\text{1/2 }}}(\mathbf B )\). As a given basis \(\mathbf B \) may be ‘far away’ from being orthogonal, the choice of the closest hyperplane (line 4, Algorithm 1) may not lead to the actual closest vector. On each iteration, the additive factor to the squared error-length can be as large as \(\tfrac{1}{4} \Vert \widetilde{\mathbf{b}}_k\Vert ^2\).

To mitigate the non-orthogonality of the input basis, Lindner and Peikert [3] proposed to project on several close hyperplanes, i.e. in Step 5 of Algorithm 1, \(c^{(k)}_i\), \(1 \le i \le d_k\) are chosen, resulting in \(d_k\) new targets \(t^{(k-1)}_i\). To guarantee a constant success probability, \(d_k\) must be chosen such that \(d_k \cdot \Vert \widetilde{\mathbf{b}}_k\Vert > 2e_k\), i.e. the error-vector \(\mathbf{e}\) must be contained in the stretched fundamental parallelepiped \({\mathcal {P}_{\text{1/2 }}}(\mathbf B \cdot \mathrm {diag}(d_1, \ldots d_m))\). For the LWE-case the sequence \({(d_i)}_{i=1,\ldots , m}\) can be computed given \({(\Vert \widetilde{\mathbf{b}}_i \Vert )}_{i=1,\ldots , m}\) and the parameter s.

Our algorithm is implemented as a depth-first tree traversal where each level-k node (\(k=m, \ldots , 1\)) represents a partial assignment \((c^{(m)}, \ldots , c^{(k)})\) with target \(\mathbf{t}^{(k)} = \mathbf{t}- \sum _{i=k}^{m} c^{(i)} \mathbf{b}_i\). The children of this node are generated by projecting \(\mathbf{t}^{(k)}\) onto the \(d_{k-1}\) closest hyperplanes \(U^{(k-1)}_i = c^{(k-1)}_i \widetilde{\mathbf{b}}_{k-1}+ {{\mathrm{Span}}}(\mathbf{b}_1, \ldots , \mathbf{b}_{k-2})\), \(i=1, \ldots , d_{k-1}\). Each leaf is a candidate solution \(\mathbf{v} = \sum _{i=1}^m c^{(i)} \mathbf{b}_i\), whose corresponding error is checked against the currently shortest. Figure 1a represents the case \(m=3\), \(d_1=3\), \(d_2=2\), \(d_3=1\).

[Fig. 1: Orders of tree-traversal (figure)]

[Algorithm 2: BDD Enumeration (figure)]

Note that the length of an error-vector is not explicitly bounded in the Lindner-Peikert enumeration tree. Instead, one imposes a restriction on its individual coordinates \(e_i\). In Liu and Nguyen’s Length Pruning Algorithm [4], the number of children of a node is determined only by the length of the error accumulated so far and hence, as opposed to the Lindner-Peikert strategy, may differ for two nodes on the same level. For Gaussian error, one expects that on level k the value \(e^{(k-1)}\) (line 6, Algorithm 1) satisfies \(e^{(k-1)} < R_k \approx s^2 (m-k+1)\), resulting in \(e^{(0)} = \Vert \mathbf{e}\Vert ^2 \approx s^2 m\). This strategy is called Linear Pruning and is used in our experiments. We do not consider the so-called Extreme Pruning strategy, where the bounds satisfy \(R_k \ll s^2 (m-k+1)\) (i.e. the success probability is very low, but is boosted by re-randomizing the basis and repeating). While Extreme Pruning proved to be more efficient in the SVP setting [18], in the BDD case re-randomizing an instance requires re-running the expensive BKZ reduction (as the re-randomization destroys the reducedness).

Both enumeration strategies, Lindner-Peikert and Length Pruning, can be generalized by considering a family of bounding functions \(B^{(k)}: \mathbb {Q}\rightarrow \mathbb {Q}\), \(1 \le k \le m\), that take a squared error-length as input and output the remaining allowed length depending on the chosen strategy. From the value \(B^{(k)}(e^{(k)})\), one can compute the number of children of a node on level k (line 6, Algorithm 2). The Lindner-Peikert bounding function ignores the error-length, setting \(B^{(k)} = {(d_k \Vert \widetilde{\mathbf{b}}_k \Vert )}^2\) so that every level-k node has \(d_k\) children. For the Length Pruning of [4], we set \(B^{(k)} = R_k - e^{(k)}\). Our BDD Enumeration in Algorithm 2 describes the depth-first tree-traversal under this generalization.

Algorithm 2 constructs an enumeration tree with each level-k node storing a target-vector \(\mathbf{t}^{(k-1)}\), a coefficient \(c^{(k)}\) of a candidate solution \(\sum _{k=1}^m c^{(k)} \mathbf{b}_k\) and the accumulated error-length \(e^{(k-1)}\) (lines 10–12). A path from the root \((k=m)\) to a leaf (\(k=1\)) gives one candidate solution \(\mathbf{v}= \sum _{k=1}^m c^{(k)} \mathbf{b}_k\) with error-length \(e^{(0)} = \Vert \mathbf{t}- \mathbf{v}\Vert ^2\). The path with the minimal error-value is the output of the algorithm.

Notice that different paths have different success probabilities: the path corresponding to Babai’s solution \(\sum _{k=1}^m c^{(k)} \mathbf{b}_k\) is the most promising one. So instead of choosing the left-most child and traversing its sub-tree, the implemented tree-traversal algorithm chooses Babai’s path first, i.e. a ‘middle’ child of a node, and then examines all nearby paths. This strategy of ordering the paths by decreasing success probability is called Length best first search (see Fig. 1b).

3.1 Parallel Implementation

In Algorithm 2, sub-tree traversals for two different nodes on the same level are independent, so we can parallelize the BDD Enumeration. Let \(\mathrm {\#NThreads}\) be the number of threads (processors) available. Our goal is to determine the upper-most level k having at least as many nodes \(\mathrm {\#N}(k)\) as \(\mathrm {\#NThreads}\). Then we can traverse the \(\mathrm {\#N}(k)\) sub-trees in parallel by calling Algorithm 2 on each thread.

We start traversing the enumeration tree in a breadth-first manner using a queue. In a breadth-first traversal, once all the nodes of level k are visited, the queue contains all their children (i.e. all the nodes of level \(k+1\)), thus their number \(\mathrm {\#N}(k+1)\) can be computed. Once a level k with \(\mathrm {\#N}(k) \ge c \cdot \mathrm {\#NThreads}\) for some small constant \(c \ge 1\) is found, we stop the breadth-first traversal and start Algorithm 2 for each of the \(\mathrm {\#N}(k)\) sub-trees in its own thread. The benefit of having \(c>1\) is that whenever one of the threads finishes quickly, it can be assigned to traverse another sub-tree. This strategy compensates for imbalanced sizes of sub-trees.

This breadth-first traversal is described in Algorithm 3. At the root we have \(\mathrm {\#N}(m)=1\). The data associated with each node are the target \(\mathbf{t}^{(m-1)}\), the error-length \(e^{(m-1)}\) and the partial solution \(\mathbf{s}^{(m-1)}\). We store them in queues \(Q_t, Q_e, Q_s\). Traversing the tree downwards is realized by dequeuing the first element from a queue (line 9) and enqueuing its children. When Algorithm 3 terminates, we spawn a thread that receives as input a target \(\mathbf{t}^{(k)}\) from \(Q_t\), the error-length \(e^{(k)}\) accumulated so far from \(Q_e\), a partial solution \(\mathbf{s}^{(k-1)}\) from \(Q_s\), the GSO-lengths \((\Vert \widetilde{\mathbf{b}}_{k-1} \Vert , \ldots , \Vert \widetilde{\mathbf{b}}_{1} \Vert )\) and the bounding functions \(B^{(i)}\), \(1 \le i \le k-1\). Since the number of possible threads is usually a small constant, there is no blow-up in memory usage during the breadth-first traversal.

Note that for a family of bounding functions \(B^{(k)}\) that allows computing the number of children per node without actually traversing the tree, e.g. the Lindner-Peikert bounding strategy, it is easy to find the level where parallelization should start. In the Lindner-Peikert case, \(\mathrm {\#N}(k) = \prod _{i=m}^{m-k} d_i\), and hence we simply compute the largest level k with \(\mathrm {\#N}(k) \ge c \cdot \mathrm {\#NThreads}\).

[Algorithm 3: Breadth-first traversal to the parallelization level (figure)]

4 Variants of \(\textsf {LWE} \)

Binary Secret LWE. Recent results on the BKW algorithm for LWE [6, 7] show that BKW ’s running time can be significantly sped up for small LWE secret vectors \(\mathbf{s}\). For a binary secret, the complexity drops from fully exponential to \(2^{\mathcal {O}(n / \log \log n)}\), and Kirchner and Fouque [7] report on a successful secret-recovery for \(n = 128\) within 13 hours using \(2^{28}\) LWE samples.

Lattice-based techniques in turn can also profit from the fact that the secret is small (smaller than the error). As described by Bai and Galbraith [2], one transforms a BDD instance \((\varLambda (\mathbf A ), \mathbf{b}= \mathbf{s}\mathbf A + \mathbf{e})\) with error \(\mathbf{e}\) into a BDD instance

$$\begin{aligned} \left( {\varLambda _q^{\perp }}\begin{pmatrix} \mathbf I _m \\ \mathbf A \end{pmatrix}, (\mathbf{b}, \mathbf 0 ^n)\right) \end{aligned}$$
(2)

with error \((\mathbf{e}, \mathbf{s})\). The instance is correctly defined since

$$\begin{aligned} ((\mathbf{e}, \mathbf{s}) - (\mathbf{b}, \mathbf 0 ^n)) \begin{pmatrix} \mathbf I _m \\ \mathbf A \end{pmatrix} = 0 \mod q. \end{aligned}$$

The lattice \({\varLambda _q^{\perp }}\begin{pmatrix} \mathbf I _m \\ \mathbf A \end{pmatrix} \subseteq {{\mathbb Z}}^{n+m}\) is generated by the rows of \(\mathbf A ^{\perp }\), where

$$\begin{aligned} \mathbf A ^{\perp } = \begin{pmatrix} -\mathbf A | \mathbf I _n \\ q\mathbf I _{n+m} \end{pmatrix}. \end{aligned}$$

We run the BDD Enumeration of Algorithm 2 on instances defined by Eq. (2) (see Sect. 5, Table 1).

Binary Matrix. To implement an LWE-based encryption on lightweight devices, Galbraith [12] proposed not to store the whole random matrix \(\mathbf A \in {{\mathbb Z}}_q^{n \times m}\), but to generate the entries of a binary \(\mathbf A \in {{\mathbb Z}}_2^{n \times m}\) via some PRNG. Galbraith’s ciphertexts are of the form \((C_1, C_2) = (\mathbf A \mathbf u , \langle \mathbf u , \mathbf{b}\rangle + m \lceil q/2 \rceil \mod q)\) for a message \(m \in \{0,1\}\), some random \(\mathbf u \in {\{0,1\}}^m\) and a modulus \(q \in {{\mathbb Z}}\). The task is to recover \(\mathbf u \) given \((\mathbf A , \mathbf A \mathbf u )\).

Let us describe a simple lattice attack on the instance \((\mathbf A , \mathbf A \mathbf u )\). Notice that \(C_1 = \mathbf A \mathbf u \) holds over \({{\mathbb Z}}\) and, hence, over \({{\mathbb Z}}_q\) for a large enough modulus q, since each entry of \(\mathbf A \mathbf u \) is expected to be around m/4. First, we find any solution \(\mathbf{w}\) of \(\mathbf A \mathbf{w}= C_1 \bmod q\). Note that

$$\begin{aligned} (\mathbf{w}- \mathbf u ) \in \ker (\mathbf A ). \end{aligned}$$

So we have a BDD instance \((\varLambda _q^{\perp }(\mathbf A ), \mathbf{w})\), with \(\mathbf u \) as the error-vector of length \(\approx \sqrt{m/2}\), on a lattice with \(\det (\varLambda _q^{\perp }(\mathbf A ))= q^n\). Since we can freely choose q to be as large as we want, we can guarantee that \(\lambda _1 (\varLambda _q^{\perp }(\mathbf A )) \gg \sqrt{m/2}\). Such an instance can be solved by first running \(\beta \)-BKZ for some small constant \(\beta \) and then Babai’s \(\textsf {CVP} \) algorithm.

As a challenge, Galbraith proposes a parameter-set \((n=256, m=400)\) and estimates that computing \(\mathbf u \) from \(\mathbf A \mathbf u \) should take around one day. We solve this instance using NTL’s BKZ implementation with \(\beta = 4\) and \(q = 500009\) in 4.5 hours (see Table 1).

5 Implementation Results

We implemented our BDD enumeration step with Lindner-Peikert’s Nearest Planes and Liu-Nguyen’s Linear Length Pruning. All programs are written in C++ and we used C++11 STL for implementing the threading. Our tests were performed on the Ruhr-University’s “Crypto Crunching Cluster” (C3), which consists of one master node to schedule jobs and four computing nodes. Each computing node has four AMD Bulldozer Opteron 6276 CPUs, and thus 64 cores, running at 2.3 GHz and 256 GByte of RAM. The results of our experiments are presented in Table 1.

Table 1. Running-times of the BDD-decoding attack on LWE. The superscript B indicates that Babai’s Nearest Plane Algorithm already solved the instance. Uniform binary and ternary error distributions are denoted by \(s = \left\{ 0, 1\right\} \) and \(s = \left\{ -1, 0, 1\right\} \).

Our experiments are run on

  1. standard LWE parameters (top part of Table 1),

  2. LWE with binary and ternary error (middle part),

  3. binary secret LWE,

  4. the space-efficient proposal of Galbraith (bottom).

Let us describe the results of our experiments in more detail.

  1. For the standard LWE case with Gaussian error, the dimensions we successfully attacked within several hours lie in the interval \(n \in [70, 100]\). We achieve an almost perfect speed-up: the gained factor in the running times is roughly equal to the number of processors (\(\mathrm {\#NThreads}\)). This shows that our distribution of work balances the load across processors. The largest successfully decoded parameters are \((n=100, s=4)\). For comparison, the instance \((n=192, s=9)\) achieves a \(2^{87}\)-security level as estimated in [3].

  2. Not surprisingly, once the error is changed from Gaussian to binary or ternary, the decoding attack performs better, but balancing the BKZ-reduction and BDD steps becomes more subtle, since a smaller error is more favourable for the decoding. Hence, such an instance can be attacked with a less reduced basis than a similar LWE instance with Gaussian noise. To balance the reduction and enumeration steps, we first choose a smaller block-size \(\beta \) for the reduction and, second, choose fewer than 2n samples. Our choice of m additionally lowers the running time of the BKZ-reduction, while still guaranteeing successful decoding. The maximal dimension achieved in this regime is \(n=130\). Binary and ternary errors are especially interesting for the cryptanalysis of NTRU [21] and for special variants of LWE considered by Micciancio and Peikert [19] and Buchmann et al. [20].

  3. For binary secret we are able to attack dimensions \(n \in [100, 140]\). In contrast to the BKW attack of Kirchner and Fouque [7], we choose as few samples as possible to aid the reduction step (while keeping a unique solution). More concretely, for \(n=130\) we used only \(m=150\) samples, as opposed to the \(m=2^{28}\) samples required in the BKW attack. Our attack takes only 7.6 h, which is faster than the 13 h reported in [7]. Moreover, we are able to attack dimension \(n=140\), for which we again benefit from parallelization.

  4. For the space-efficient binary-matrix case of [12], we choose \(q=500009\) and solve the instance \((n=256, m=400)\) in 4.5 h with \(\beta =4\) and Babai’s CVP algorithm.

All our experiments confirm that Linear Length Pruning works much more efficiently than Lindner-Peikert decoding for most of the considered variants of LWE. Another observation is that lowering the number of samples significantly speeds up the reduction in practice, while slowing down the decoding step. Since the latter can be parallelized, a proper choice of the number of samples leads to a better trade-off between reduction and enumeration.