Abstract
We present an algorithm for the approximate k-List problem for the Euclidean distance that improves upon the Bai-Laarhoven-Stehlé (BLS) algorithm from ANTS’16. The improvement stems from the observation that almost all the solutions to the approximate k-List problem form a particular configuration in n-dimensional space. Due to special properties of configurations, it is much easier to verify whether a k-tuple forms a configuration rather than checking whether it gives a solution to the k-List problem. Thus, phrasing the k-List problem as a problem of finding such configurations immediately gives a better algorithm. Furthermore, the search for configurations can be sped up using techniques from Locality-Sensitive Hashing (LSH). Stated in terms of configuration-search, our LSH-like algorithm offers a broader picture on previous LSH algorithms.
For the Shortest Vector Problem, our configuration-search algorithm results in an exponential improvement for memory-efficient sieving algorithms. For \(k=3\), it allows us to bring down the complexity of the BLS sieve algorithm on an n-dimensional lattice from \(2^{0.4812n+o(n)}\) to \(2^{0.3962n + o(n)}\) with the same space requirement \(2^{0.1887n + o(n)}\). Note that our algorithm beats the Gauss Sieve algorithm with time resp. space of \(2^{0.415n+o(n)}\) resp. \(2^{0.208n + o(n)}\), while being easy to implement. Using LSH techniques, we can further reduce the time complexity down to \(2^{0.3717n + o(n)}\) while retaining a memory complexity of \(2^{0.1887n+o(n)}\).
1 Introduction
The k-List problem is defined as follows: given k lists \(L_1, \ldots , L_k\) of elements from a set X, find k-tuples \((x_1, \ldots , x_k) \in L_1 \times \ldots \times L_k\) that satisfy some condition C. For example, Wagner [19] considers \(X \subset \{0,1\}^n\), and a tuple \((x_1, \ldots , x_k)\) is a solution if \(x_1 \oplus \ldots \oplus x_k = 0^n\). In this form, the problem has found numerous applications in cryptography [14] and learning theory [6].
For \(\ell _2\)-norm conditions with \(X \subset {{\mathbb R}}^n\) and \(k=2\), the task of finding pairs \(({{\varvec{x}}}_1, {{\varvec{x}}}_2) \in L_1 \times L_2\), s.t. \(\Vert {{\varvec{x}}}_1 + {{\varvec{x}}}_2\Vert < \min \{ \Vert {{\varvec{x}}}_1\Vert , \Vert {{\varvec{x}}}_2\Vert \}\), is at the heart of certain algorithms for the Shortest Vector Problem (SVP). Such algorithms, called sieving algorithms [1, 17], are asymptotically the fastest SVP solvers known so far.
Sieving algorithms look at pairs of lattice vectors that sum up to a short(er) vector. Once enough such sums are found, the search is repeated by combining these shorter vectors into even shorter ones, and so on. It is not difficult to see that in order to find even one pair where the sum is shorter than both summands, we need an exponential number of lattice vectors, so the memory requirement is exponential. In practice, due to this large memory requirement, sieving algorithms are outperformed by the asymptotically slower Kannan enumeration [10].
Naturally, the question arises whether one can reduce the constant in the exponent of the memory complexity of sieving algorithms at the expense of running time. An affirmative answer is obtained in the recently proposed k-list sieving by Bai, Laarhoven, and Stehlé [4] (BLS, for short). For constant k, they present an algorithm that, given input lists \(L_1, \ldots , L_k\) of elements from the n-sphere \(\mathsf {S}^{n}\) with radius 1, outputs k-tuples with the property \(\Vert {{\varvec{x}}}_1 + \ldots + {{\varvec{x}}}_k\Vert < 1\). They provide the running time and memory complexities for \(k=3,4\).
We improve and generalize upon the BLS k-list algorithm. Our results are as follows:
1. We present an algorithm that on input \(L_1, \ldots , L_k \subset \mathsf {S}^{n}\), outputs k-tuples \(({{\varvec{x}}}_1, \ldots , {{\varvec{x}}}_k) \in L_1 \times \ldots \times L_k\), s.t. all pairs \(({{\varvec{x}}}_i, {{\varvec{x}}}_j)\) in a tuple satisfy certain inner product constraints. We call this problem the Configuration problem (Definition 3).
2. We give a concentration result on the distribution of scalar products of \({{\varvec{x}}}_1, \ldots , {{\varvec{x}}}_k \in \mathsf {S}^{n}\) (Theorems 1 and 2), which implies that finding vectors that sum to a shorter vector can be reduced to the above Configuration problem.
3. By working out the properties of the aforementioned distribution, we prove the conjectured formula (Eq. (3.2) from [4]) on the input list-sizes (Theorem 3), s.t. we can expect a constant success probability for sieving. We provide closed formulas for the running times of both algorithms: BLS and our Algorithm 1 (Theorem 4). Algorithm 1 achieves an exponential speed-up compared to the BLS algorithm.
4. To further reduce the running time of our algorithm, we introduce the so-called Configuration Extension Algorithm (Algorithm 2). It has an effect similar to Locality-Sensitive Hashing as it shrinks the lists in a helpful way. This is a natural generalization of LSH to our framework of configurations. We briefly explain how to combine Algorithm 1 and the Configuration Extension in Sect. 7. A complete description can be found in the full version.
Roadmap. Section 2 gives basic notations and states the problem we consider in this work. Section 3 introduces configurations – a novel tool that aids the analysis in succeeding Sects. 4 and 5 where we present our algorithm for the k-List problem and prove its running time. Our generalization of Locality Sensitive Hashing – Configuration Extension – is described in Sect. 6 and its application to the k-list problem in Sect. 7. We conclude with experimental results confirming our analysis in Sect. 8. We defer some of the proofs and details on the Configuration Extension Algorithm to the appendices as these are not necessary to understand the main part.
2 Preliminaries
Notations. We denote by \(\mathsf {S}^{n}\subset {{\mathbb R}}^{n+1}\) the n-dimensional unit sphere. We use soft-\(\mathcal {O}\) notation to denote running times: \(T=\widetilde{\mathcal {O}}(2^{cn})\) means that we suppress subexponential factors. We use sub-indices \(\mathcal {O}_k(.)\) in the \(\mathcal {O}\)-notation to stress that the asymptotic result holds for k fixed. For any set \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) of vectors in some \({{\mathbb R}}^{n}\), the Gram matrix \(C\in {{\mathbb R}}^{k\times k}\) is given by the set of pairwise scalar products. It is a complete invariant of the \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) up to simultaneous rotation and reflection of all \({{\varvec{x}}}_i\)’s. For such matrices \(C\in {{\mathbb R}}^{k\times k}\) and \(I\subset \{1,\ldots ,k\}\), we write \(C[I]\) for the appropriate \(\left| I\right| \times \left| I\right| \)-submatrix with rows and columns from I.
As we consider distances wrt. the \(\ell _2\)-norm, the approximate k-List problem studied in this work is the following computational problem:
Definition 1
(Approximate k-List problem). Let \(0<t<\sqrt{k}\). Assume we are given k lists \(L_1,\ldots ,L_k\) of equal exponential size, whose entries are iid. uniformly chosen vectors from the n-sphere \(\mathsf {S}^{n}\). The task is to output a \((1-o(1))\)-fraction of all solutions, where solutions are k-tuples \({{\varvec{x}}}_1\in L_1,\ldots , {{\varvec{x}}}_k\in L_k\) satisfying \(\left\| {{\varvec{x}}}_1+\dots +{{\varvec{x}}}_k\right\| ^2\le t^2\).
We consider the case where t, k are constant and the input lists are of size \(\mathsf {c}^n\) for some constant \(\mathsf {c}>1\). We are interested in the asymptotic complexity for \(n\rightarrow \infty \). To simplify the exposition, we pretend that we can compute with real numbers; all our algorithms work with sufficiently precise approximations (possibly losing an o(1)-fraction of solutions due to rounding). This does not affect the asymptotics. Note that the problem becomes trivial for \(t>\sqrt{k}\), since all but an \(o(1)\)-fraction of k-tuples from \(L_1\times \dots \times L_k\) satisfy \(\left\| {{\varvec{x}}}_1+\ldots +{{\varvec{x}}}_k\right\| ^2\approx k\) (random \({{\varvec{x}}}_i\in \mathsf {S}^{n}\) are almost orthogonal with high probability, cf. Theorem 1). In the case \(t>\sqrt{k}\), we need to ask that \(\left\| {{\varvec{x}}}_1+\ldots +{{\varvec{x}}}_k\right\| ^2\ge t^2\) to get a meaningful problem. Then all our results apply to the case \(t>\sqrt{k}\) as well.
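For concreteness, the trivial solution to Definition 1 enumerates all \(\left| L\right| ^k\) tuples; the following Python sketch (purely illustrative, with our own naming and a toy instance) makes the problem statement explicit and serves as a correctness baseline for the algorithms below.

```python
import itertools
import numpy as np

def random_sphere_points(num, n, rng):
    """Sample `num` points iid uniformly from the n-sphere S^n (embedded in R^{n+1})."""
    x = rng.standard_normal((num, n + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def naive_k_list(lists, t):
    """Brute force: return all k-tuples (x_1, ..., x_k) in L_1 x ... x L_k
    with ||x_1 + ... + x_k||^2 <= t^2.  Time |L|^k."""
    return [tup for tup in itertools.product(*lists)
            if np.linalg.norm(np.sum(tup, axis=0)) ** 2 <= t ** 2]

rng = np.random.default_rng(1)
lists = [random_sphere_points(30, 40, rng) for _ in range(3)]  # k = 3, n = 40, |L| = 30
print(len(naive_k_list(lists, t=1.0)))   # typically very few (or no) solutions at this size
```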
In our definition, we allow dropping an \(o(1)\)-fraction of solutions, which is fine for the sieving applications. In fact, we will propose an algorithm that drops an exponentially small fraction of solutions and our asymptotic improvement compared to BLS crucially relies on dropping more solutions than BLS. For this reason, we are only interested in the case where the expected number of solutions is exponential.
Relation to the Approximate Shortest Vector Problem. The main incentive to look at the approximate k-List problem (as in Definition 1) is its straightforward application to the so-called sieving algorithms for the shortest vector problem (SVP) on an n-dimensional lattice (see Sect. 7.2 for a more comprehensive discussion). The complexity of these sieving algorithms is completely determined by the complexity of an approximate k-List solver called as the main subroutine. So one can instantiate a lattice sieving algorithm using an approximate k-List solver (the ability to choose k allows memory-efficient instantiations of such a solver). This is observed and fully explained in [4]. For \(k=3\), the running time for the SVP algorithm presented in [4] is \(2^{0.4812n+o(n)}\) requiring \(2^{0.1887n+o(n)}\) memory. Running our Algorithm 1 instead as a k-List solver within the SVP sieving, one obtains a running time of \(2^{0.3962n + o(n)}\) with the same memory complexity \(2^{0.1887n+o(n)}\). As explained in Sect. 7.2, we can reduce the running time even further down to \(2^{0.3717n + o(n)}\) with no asymptotic increase in memory by using a combination of Algorithm 1 and the LSH-like Configuration Extension Algorithm. This combined algorithm is fully described in the full version of the paper.
In the applications to sieving, we have \(t=1\) and actually look for solutions \(\Vert \pm {{\varvec{x}}}_1\pm \dots \pm {{\varvec{x}}}_k\Vert \le 1\) with arbitrary signs. This is clearly equivalent by considering the above problem separately for each of the \(2^k=\mathcal {O}(1)\) choices of signs. Further, the lists \(L_1,\ldots ,L_k\) can actually be equal. Our algorithm works for this case as well. In these settings, some obvious optimizations are possible, but they do not affect the asymptotics.
Our methods are also applicable to lists of different sizes, but we stick to the case of equal list sizes to simplify the formulas for the running times.
3 Configurations
Whether a given k-tuple \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) is a solution to the approximate k-List problem is invariant under simultaneous rotations/reflections of all \({{\varvec{x}}}_i\), so we want to look at k-tuples up to such symmetry; this is captured by what we call configurations of points. As we are concerned with the \(\ell _2\)-norm, a complete invariant of k-tuples up to symmetry is given by the set of pairwise scalar products, and we define configurations for this norm:
Definition 2 (Configuration)
The configuration \(C={\mathrm{Conf}}\,({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k)\) of k points \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\in \mathsf {S}^{n}\) is defined as the Gram matrix \( C_{i,j}=\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle \).
Clearly, the configuration of the k-tuple \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) determines the length of the sum \(\left\| \sum _i{{\varvec{x}}}_i\right\| \):
\(\bigl \Vert \textstyle \sum _i{{\varvec{x}}}_i\bigr \Vert ^2=\sum _{i,j}\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle =\sum _{i,j}C_{i,j}. \qquad (1)\)
We denote by
\(\mathscr {C}=\bigl \{C\in {{\mathbb R}}^{k\times k}\;\big |\;C \text { symmetric positive semi-definite},\;C_{i,i}=1\bigr \}\quad \text {and}\quad \mathscr {C}_{\le t}=\bigl \{C\in \mathscr {C}\;\big |\;\textstyle \sum _{i,j}C_{i,j}\le t^2\bigr \}\)
the spaces of all possible configurations resp. those which give a length of at most t. The spaces \(\mathscr {C}\) and \(\mathscr {C}_{\le t}\) are compact and convex. For fixed k, it is helpful from an algorithmic point of view to think of \(\mathscr {C}\) as a finite set: for any \(\varepsilon >0\), we can cover \(\mathscr {C}\) by finitely many \(\varepsilon \)-balls, so we can efficiently enumerate \(\mathscr {C}\).
In the context of the approximate k-List problem with target length t, a k-tuple \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) is a solution iff \({\mathrm{Conf}}\,({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k)\in \mathscr {C}_{\le t}\). For that reason, we call a configuration in \(\mathscr {C}_{\le t}\) good. An obvious way to solve the approximate k-List problem is to enumerate over all good configurations and solve the following k-List configuration problem:
Definition 3 (Configuration problem)
On input k exponentially-sized lists \(L_1,\ldots ,L_k\) of vectors from \(\mathsf {S}^{n}\), a target configuration \(C\in \mathscr {C}\) and some \(\varepsilon >0\), the task is to output all k-tuples \({{\varvec{x}}}_1\in L_1,\ldots ,{{\varvec{x}}}_k\in L_k\), such that \(\vert \langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle -C_{ij}\vert \le \varepsilon \) for all i, j. Such k-tuples are called solutions to the problem.
Remark 1
Due to \(\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle \) taking real values, it does not make sense to ask for exact equality to \(C\); rather, we introduce some \(\varepsilon >0\). We write \(C\approx _\varepsilon C'\) as shorthand for \(\vert C_{i,j}-C'_{i,j}\vert \le \varepsilon \). Formally, our analysis will show that for fixed \(\varepsilon >0\), we obtain running times and list sizes of the form \(\widetilde{\mathcal {O}}_\varepsilon (2^{(c+f(\varepsilon ))n})\) for some unspecified continuous f with \(\smash {\lim \limits _{\varepsilon \rightarrow 0}}f(\varepsilon ) = 0\). Letting \(\varepsilon \rightarrow 0\) sufficiently slowly, we absorb \(f(\varepsilon )\) into the \(\widetilde{\mathcal {O}}(.)\)-notation and omit it.
As opposed to the approximate k-List problem, being a solution to the k-List configuration problem is a locally checkable property [12]: it is a conjunction of conditions involving only pairs \({{\varvec{x}}}_i,{{\varvec{x}}}_j\). It is this and the following observation that we leverage to improve on the results of [4].
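To illustrate the local checkability (our own illustration, not pseudocode from the paper), checking a k-tuple against a target configuration only requires the pairwise inner products, whereas goodness can be read off the Gram matrix via Eq. (1):

```python
import numpy as np

def configuration(points):
    """Gram matrix C with C[i, j] = <x_i, x_j> for a tuple of points on the sphere."""
    X = np.vstack(points)
    return X @ X.T

def is_good(C, t):
    """Good configuration: sum_{i,j} C_{i,j} = ||x_1 + ... + x_k||^2 <= t^2 (Eq. (1))."""
    return C.sum() <= t ** 2

def matches(C, target, eps):
    """The locally checkable condition of Definition 3: |C_{i,j} - target_{i,j}| <= eps for all i, j."""
    return bool(np.all(np.abs(C - target) <= eps))
```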
It turns out that the configurations attained by the solutions to the approximate k-List problem are concentrated around a single good configuration, which is the good configuration with the highest amount of symmetry. So in fact, we only need to solve the configuration problem for this particular good configuration. The following theorem describes the distribution of configurations:
Theorem 1
Let \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\in \mathsf {S}^{n}\) be independent, uniformly distributed on the n-sphere, \(n>k\). Then the configuration \(C=C({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k)\) follows a distribution \(\mu _\mathscr {C}\) on \(\mathscr {C}\) with density given by
\(\mu _\mathscr {C}= W_{n,k}\cdot \det (C)^{\frac{n-k}{2}}\,\mathrm {d}\mathscr {C},\)
where \(W_{n,k} = \pi ^{-\frac{k(k-1)}{4}}\prod _{i=0}^{k-1} \frac{\varGamma (\frac{n+1}{2})}{\varGamma (\frac{n+1-i}{2})} = \mathcal {O}_k\bigl (n^{\frac{k(k-1)}{4}}\bigr )\) is a normalization constant that only depends on n and k. Here, the reference measure \(\mathrm {d}\mathscr {C}\) is given by \(\mathrm {d}\mathscr {C}=\mathrm {d}C_{1,2}\cdots \mathrm {d}C_{(k-1),k}\) (i.e. the Lebesgue measure in a natural parametrization).
Proof
We derive this by an approximate normalization of the so-called Wishart distribution [20]. Observe that we can sample \(C\leftarrow \mu _\mathscr {C}\) in the following way:
We sample \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\in {{\mathbb R}}^{n+1}\) iid from spherical \(n+1\)-dimensional Gaussians, such that the direction of each \({{\varvec{x}}}_i\) is uniform over \(\mathsf {S}^{n}\). Note that the lengths of the \({{\varvec{x}}}_i\) are not normalized to 1. Then we set \(A_{i,j}:=\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle \). Finally, normalize to \(C_{i,j}:=\frac{A_{i,j}}{\sqrt{A_{i,i}A_{j,j}}}\).
The joint distribution of the \(A_{i,j}\) is (by definition) given by the so-called Wishart distribution [20]. Its density for \(n+1>k-1\) is known to be
\(\rho _{{\text {Wishart}}}=\frac{1}{2^{\frac{(n+1)k}{2}}\,\varGamma _k\bigl (\frac{n+1}{2}\bigr )}\cdot \det (A)^{\frac{n-k}{2}}\cdot e^{-\frac{1}{2}{\mathrm{Tr}}\,(A)}\,\mathrm {d}A,\)
where \(\varGamma _k(x)=\pi ^{\frac{k(k-1)}{4}}\prod _{i=0}^{k-1}\varGamma \bigl (x-\frac{i}{2}\bigr )\) denotes the multivariate gamma function and the reference density \(\mathrm {d}A\) is given by \(\mathrm {d}A = \prod _{i\le j} \mathrm {d}A_{i,j}\). We refer to [8] for a relatively simple computation of that density. Consider the change of variables on \({{\mathbb R}}^{k(k+1)/2}\) given by
\(\varPhi :\;(A_{i,j})_{i\le j}\;\longmapsto \;\Bigl (A_{1,1},\ldots ,A_{k,k},\;\frac{A_{1,2}}{\sqrt{A_{1,1}A_{2,2}}},\ldots ,\frac{A_{k-1,k}}{\sqrt{A_{k-1,k-1}A_{k,k}}}\Bigr ),\)
i.e. we map the \(A_{i,j}\)’s to \(C_{i,j}\)’s while keeping the \(A_{i,i}\)’s to make the transformation bijective almost everywhere. The Jacobian \(D\varPhi \) of \(\varPhi \) is a triangular matrix and its determinant is easily seen to be
\(\det (D\varPhi )=\prod _{i<j}\bigl (A_{i,i}A_{j,j}\bigr )^{-\frac{1}{2}}=\prod _{i=1}^{k}A_{i,i}^{-\frac{k-1}{2}}.\)
Further, note that \(A=T C T\), where T is a diagonal matrix with diagonal \(\sqrt{A_{1,1}},\ldots ,\sqrt{A_{k,k}}\). In particular, \(\det (A) = \det (C)\cdot \prod _i A_{i,i}\). Consequently, we can transform the Wishart density into \(\bigl (A_{1,1},\ldots ,A_{k,k},C_{1,2},\ldots ,C_{k-1,k}\bigr )\)-coordinates as
\(\rho _{{\text {Wishart}}}=\frac{1}{2^{\frac{(n+1)k}{2}}\,\varGamma _k\bigl (\frac{n+1}{2}\bigr )}\cdot \det (C)^{\frac{n-k}{2}}\cdot \prod _{i=1}^{k}A_{i,i}^{\frac{n-1}{2}}e^{-\frac{1}{2}A_{i,i}}\;\mathrm {d}A_{1,1}\cdots \mathrm {d}A_{k,k}\,\mathrm {d}\mathscr {C}.\)
The desired \(\mu _\mathscr {C}\) is obtained from \(\rho _{{\text {Wishart}}}\) by integrating out \(\mathrm {d}A_{1,1}\mathrm {d}A_{2,2}\cdots \mathrm {d}A_{k,k}\). We can immediately see that \(\mu _\mathscr {C}\) takes the form \(\mu _\mathscr {C}= W_{n,k}\det (C)^{\frac{n-k}{2}}\mathrm {d}\mathscr {C}\) for some constants \(W_{n,k}\). We compute \(W_{n,k}\) as
\(W_{n,k}=\frac{\prod _{i=1}^{k}\int _0^\infty A_{i,i}^{\frac{n-1}{2}}e^{-\frac{1}{2}A_{i,i}}\,\mathrm {d}A_{i,i}}{2^{\frac{(n+1)k}{2}}\,\varGamma _k\bigl (\frac{n+1}{2}\bigr )}=\frac{\bigl (2^{\frac{n+1}{2}}\varGamma \bigl (\frac{n+1}{2}\bigr )\bigr )^{k}}{2^{\frac{(n+1)k}{2}}\,\varGamma _k\bigl (\frac{n+1}{2}\bigr )}=\pi ^{-\frac{k(k-1)}{4}}\prod _{i=0}^{k-1} \frac{\varGamma (\frac{n+1}{2})}{\varGamma (\frac{n+1-i}{2})}.\)
Finally, note that as a consequence of Stirling’s formula, we have \(\frac{\varGamma (n+z)}{\varGamma (n)} = \mathcal {O}_z(n^z)\) for any fixed z and \(n\rightarrow \infty \). From this, we get \(\frac{\varGamma (\frac{n+1}{2})}{\varGamma (\frac{n+1-i}{2})}=\mathcal {O}_i\bigl (n^{\frac{i}{2}}\bigr )\) and hence \(W_{n,k} = \mathcal {O}_k\bigl (n^{\frac{k(k-1)}{4}}\bigr )\).
The configurations C that we care about the most have the highest amount of symmetry. We call a configuration C balanced if \(C_{i,j}=C_{i',j'}\) for all \(i\ne j\), \(i'\ne j'\). To compute the determinant \(\det (C)\) for such balanced configurations, we have the following lemma:
Lemma 1
Let \(C\in {{\mathbb R}}^{k\times k}\) be a balanced configuration with \(C_{i,i}=1\) and \(C_{i,j}=a\) for all \(i\ne j\). Then \(\det (C) = (1-a)^{k-1}(1+(k-1)a)\).
Proof
We have \(C=(1-a)\cdot \mathbbm {1}_k + a \cdot \varvec{1}\cdot {\varvec{1}^{\mathsf {t}}}\), where \(\varvec{1}\in {{\mathbb R}}^{k\times 1}\) is an all-ones vector. Sylvester’s Determinant Theorem [2] gives
\(\det (C)=(1-a)^k\det \Bigl (\mathbbm {1}_k+\frac{a}{1-a}\varvec{1}{\varvec{1}^{\mathsf {t}}}\Bigr )=(1-a)^k\Bigl (1+\frac{ka}{1-a}\Bigr )=(1-a)^{k-1}(1+(k-1)a).\)
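As a quick numerical sanity check (ours, not part of the paper), the closed form of Lemma 1 agrees with a direct determinant computation:

```python
import numpy as np

def balanced_det(k, a):
    """det of the k x k matrix with 1 on the diagonal and a everywhere off the diagonal (Lemma 1)."""
    return (1 - a) ** (k - 1) * (1 + (k - 1) * a)

for k in (3, 4, 5):
    for a in (-1.0 / k, 0.1, 0.4):
        C = (1 - a) * np.eye(k) + a * np.ones((k, k))
        assert np.isclose(np.linalg.det(C), balanced_det(k, a))
# e.g. a = -1/k gives (1 + 1/k)^(k-1) * (1/k) = (k+1)^(k-1) / k^k, the value appearing below for t = 1
```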
For fixed k and C, the probability density \(\widetilde{\mathcal {O}}\bigl (\det (C)^{\frac{n}{2}}\bigr )\) of \(\mu _\mathscr {C}\) is exponential in n. Since \(C\in \mathscr {C}\) can only vary in a compact space, taking integrals will asymptotically pick the maximum value: in particular, we have for the probability that a uniformly random k-tuple \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) is good:
\(\Pr \bigl [{\mathrm{Conf}}\,({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k)\in \mathscr {C}_{\le t}\bigr ]=\widetilde{\mathcal {O}}\Bigl (\max _{C\in \mathscr {C}_{\le t}}\det (C)^{\frac{n}{2}}\Bigr ).\)
We now compute this maximum.
Theorem 2
Let \(0<t<\sqrt{k}\) be some target length and consider the subset \(\mathscr {C}_{\le t}\subset \mathscr {C}\) of good configurations for target length at most t. Then \(\det (C)\) attains its unique maximum over \(\mathscr {C}_{\le t}\) at the balanced configuration \(C_{{\text {Bal}},t}\), defined by \(C_{i,j} = \frac{t^2-k}{k^2-k}\) for all \(i\ne j\), with maximal value
\(\det (C)_{\max }=\Bigl (\frac{k^2-t^2}{k^2-k}\Bigr )^{k-1}\cdot \frac{t^2}{k}.\)
In particular, for \(t=1\), this gives \(C_{i,j} = -\frac{1}{k}\) and \(\det (C)_{\max }=\frac{(k+1)^{k-1}}{k^k}\).
Consequently, for any fixed k and any fixed \(\varepsilon >0\), the probability that a randomly chosen solution to the approximate k-List problem is \(\varepsilon \)-close to \(C_{{\text {Bal}},t}\) converges exponentially fast to 1 as \(n\rightarrow \infty \).
Proof
It suffices to show that \(C\) is balanced at the maximum, i.e. that all \(C_{i,j}\) with \(i\ne j\) are equal. Then computing the actual values is straightforward from (1) and Lemma 1. Assume \(k\ge 3\), as there is nothing to show otherwise.
For the proof, it is convenient to replace the conditions \(C_{i,i}=1\) for all i by the (weaker) condition \({\mathrm{Tr}}\,(C) = k\). Let \(\mathscr {C}'_{\le t}\) denote the set of all symmetric, positive semi-definite \(C\in {{\mathbb R}}^{k\times k}\) with \({\mathrm{Tr}}\,(C)=k\) and \(\sum _{i,j}C_{i,j}\le t^2\). We maximize \(\det (C)\) over \(\mathscr {C}'_{\le t}\) and our proof will show that \(C_{i,i}=1\) is satisfied at the maximum.
Let \(C\in \mathscr {C}'_{\le t}\). Since C is symmetric, positive semi-definite, there exists an orthonormal basis \({{\varvec{v}}}_1,\ldots ,{{\varvec{v}}}_k\) of eigenvectors with eigenvalues \(0\le \lambda _1\le \ldots \le \lambda _k\).
Clearly, \(\sum _i \lambda _i = {\mathrm{Tr}}\,(C) = k\) and our objective \(\det (C)\) is given by \(\det (C)=\prod _i \lambda _i\). We can write \(\sum _{i,j}C_{i,j}\) as \({\varvec{1}^{\mathsf {t}}} C\varvec{1}\) for an all-ones vector \(\varvec{1}\). We will show that if \(\det (C)\) is maximal, then \(\varvec{1}\) is an eigenvector of C. Since
\(\lambda _1\le \frac{{\varvec{1}^{\mathsf {t}}}C\varvec{1}}{{\varvec{1}^{\mathsf {t}}}\varvec{1}}=\frac{1}{k}\sum _{i,j}C_{i,j}\le \frac{t^2}{k} \qquad (4)\)
for the smallest eigenvalue \(\lambda _1\) of C, we have \(\lambda _1 \le \frac{t^2}{k}<1\). For fixed \(\lambda _1\), maximizing \(\det (C) = \lambda _1 \cdot \prod _{i=2}^k \lambda _i\) under \(\sum _{i=2}^k\lambda _i = k-\lambda _1\) gives (via the Arithmetic Mean-Geometric Mean Inequality)
\(\det (C)=\lambda _1\cdot \prod _{i=2}^k\lambda _i\le \lambda _1\cdot \Bigl (\frac{\sum _{i=2}^k\lambda _i}{k-1}\Bigr )^{k-1}=\lambda _1\cdot \Bigl (\frac{k-\lambda _1}{k-1}\Bigr )^{k-1}.\)
The derivative of the right-hand side wrt. \(\lambda _1\) is \(\frac{k(1-\lambda _1)}{k-1}\bigl (\frac{k-\lambda _1}{k-1}\bigr )^{k-2}>0\), so we can bound it by plugging in the maximal \(\lambda _1=\frac{t^2}{k}\):
\(\det (C)\le \lambda _1\cdot \Bigl (\frac{k-\lambda _1}{k-1}\Bigr )^{k-1}\le \frac{t^2}{k}\cdot \Bigl (\frac{k^2-t^2}{k(k-1)}\Bigr )^{k-1}. \qquad (5)\)
The inequalities (5) are satisfied with equality iff \(\lambda _2=\ldots =\lambda _k\) and \(\lambda _1=\frac{t^2}{k}\). In this case, we can compute the value of \(\lambda _2\) as \(\lambda _2=\frac{k^2-t^2}{k(k-1)}\) from \({\mathrm{Tr}}\,(C)=k\). The condition \(\lambda _1=\frac{t^2}{k}\) means that (4) is satisfied with equality, which implies that \(\varvec{1}\) is an eigenvector with eigenvalue \(\lambda _1\). So wlog. \({{\varvec{v}}}_1 = \frac{1}{\sqrt{k}}\varvec{1}\). Since the \({{\varvec{v}}}_i\)’s are orthonormal, we have \(\mathbbm {1}_k=\sum _i{{\varvec{v}}}_i{{\varvec{v}}}_i^{\mathsf {t}}\), where \(\mathbbm {1}_k\) is the \(k\times k\) identity matrix. Since we can write C as \(C=\sum _i\lambda _i{{\varvec{v}}}_i{{\varvec{v}}}_i^{\mathsf {t}}\), we obtain
\(C=\lambda _1{{\varvec{v}}}_1{{\varvec{v}}}_1^{\mathsf {t}}+\lambda _2\bigl (\mathbbm {1}_k-{{\varvec{v}}}_1{{\varvec{v}}}_1^{\mathsf {t}}\bigr )=\frac{\lambda _1-\lambda _2}{k}\varvec{1}{\varvec{1}^{\mathsf {t}}}+\lambda _2 \cdot \mathbbm {1}_k\)
for \(\det (C)\) maximal. From \(C=\frac{\lambda _1-\lambda _2}{k}\varvec{1}{\varvec{1}^{\mathsf {t}}}+\lambda _2 \cdot \mathbbm {1}_k\), we see that all diagonal entries of \(C\) are equal to \(\lambda _2+\frac{\lambda _1-\lambda _2}{k}\) and the off-diagonal entries are all equal to \(\frac{\lambda _1-\lambda _2}{k}\). So all \(C_{i,i}\) are equal with \(C_{i,i}=1\), because \({\mathrm{Tr}}\,(C)=k\), and \(C\) is balanced.
For the case \(t>\sqrt{k}\), and \(\mathscr {C}_{\le t}\) replaced by \(\mathscr {C}_{\ge t}\), the statement can be proven analogously. Note that we need to consider the largest eigenvalue rather than the smallest in the proof. We remark that for \(t=1\), the condition \(\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle =C_{i,j}=-\frac{1}{k}\) for all \(i\ne j\) is equivalent to saying that \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_k\) are k points of a regular \((k+1)\)-simplex whose center is the origin. The missing \((k+1)^{\mathrm {th}}\) point of the simplex is \(-\sum _i {{\varvec{x}}}_i\), i.e. the negative of the sum (see Fig. 1).
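The simplex picture can be verified numerically; the following small check (our own) constructs k points of a regular \((k+1)\)-simplex centered at the origin and confirms the pairwise inner products \(-1/k\) and that the missing vertex is \(-\sum _i {{\varvec{x}}}_i\).

```python
import numpy as np

k = 4
E = np.eye(k + 1)                                 # k+1 vertices of a regular simplex (before centering)
V = E - E.mean(axis=0)                            # center the simplex at the origin
V /= np.linalg.norm(V, axis=1, keepdims=True)     # put the vertices on the unit sphere
print(np.round(V @ V.T, 3))                       # 1 on the diagonal, -1/k off the diagonal
print(np.allclose(V[:k].sum(axis=0), -V[k]))      # the (k+1)-th vertex is minus the sum of the others
```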
A corollary of our concentration result is the following formula for the expected size of the output lists in the approximate k-List problem.
Corollary 1
Let k, t be fixed. Then the expected number of solutions to the approximate k-List problem with input lists of length \(\left| L\right| \) is
\(\mathbb {E}\bigl [\left| L_{{\text {out}}}\right| \bigr ]=\left| L\right| ^k\cdot \widetilde{\mathcal {O}}\bigl (\det (C_{{\text {Bal}},t})^{\frac{n}{2}}\bigr ). \qquad (6)\)
Proof
By Theorems 1 and 2, the probability that any k-tuple is a solution is given by \(\widetilde{\mathcal {O}}(\det (C_{{\text {Bal}},t})^{\frac{n}{2}})\). The claim follows immediately.
In particular, this allows us to prove the following conjecture of [4]:
Theorem 3
Let k be fixed and \(t=1\). If in the approximate k-List problem, the length \(\left| L\right| \) of each input list is equal to the expected length of the output list, then \(\left| L\right| =\widetilde{\mathcal {O}}\Bigl (\Bigl ( \frac{ k^{\frac{k}{k-1}} }{k+1} \Bigr )^{\frac{n}{2}} \Bigr ).\)
Proof
This follows from simple algebraic manipulation of (6): setting \(\mathbb {E}[\left| L_{{\text {out}}}\right| ]=\left| L\right| \) with \(t=1\) and \(\det (C_{{\text {Bal}},1})=\frac{(k+1)^{k-1}}{k^k}\) (Theorem 2), and solving for \(\left| L\right| \) yields the claim.
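Evaluating the formula of Theorem 3 numerically (a small check of our own) reproduces the list-size exponents used throughout, e.g. \(0.2075\) for \(k=2\) and \(0.1887\) for \(k=3\):

```python
import math

def list_size_exponent(k):
    """log2(|L|) / n for t = 1 from Theorem 3: |L| = (k^(k/(k-1)) / (k+1))^(n/2)."""
    return 0.5 * math.log2(k ** (k / (k - 1)) / (k + 1))

for k in range(2, 6):
    print(k, round(list_size_exponent(k), 4))   # 2: 0.2075, 3: 0.1887, then decreasing with k
```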
Our concentration result shows that it is enough to solve the configuration problem for \(C_{{\text {Bal}},t}\).
Corollary 2
Let k, t be fixed. Then the approximate k-List problem with target length t can be solved in essentially the same time as the k-List configuration problem with target configuration \(C_{{\text {Bal}},t}\) for any fixed \(\varepsilon >0\).
Proof
On input \(L_1,\ldots ,L_k\), solve the k-List configuration problem with target configuration \(C_{{\text {Bal}},t}\). Restrict to those solutions whose sum has length at most t. By Theorem 2, this will find all but an exponentially small fraction of solutions to the approximate k-List problem. Since we only need to output a \(1-o(1)\)-fraction of the solutions, this solves the problem.
4 Algorithm
In this section we present our algorithm for the Configuration problem (Definition 3). On input, it receives k lists \(L_1, \ldots , L_k\), a target configuration \(C\) in the form of a Gram matrix \(C_{i,j}=\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle \in {{\mathbb R}}^{k \times k}\) and a small \(\varepsilon >0\). The algorithm proceeds as follows: it picks an \({{\varvec{x}}}_1 \in L_1\) and filters all the remaining lists with respect to the values \(\langle {{\varvec{x}}}_1,{{\varvec{x}}}_i\rangle \) for all \(2 \le i \le k\). More precisely, \({{\varvec{x}}}_i \in L_i\) ‘survives’ the filter if \(\left| \langle {{\varvec{x}}}_1,{{\varvec{x}}}_i\rangle - C_{1,i}\right| \le \varepsilon \). We put such an \({{\varvec{x}}}_i\) into \(L_i^{(1)}\) (the superscript indicates how many filters were applied to the original list \(L_i\)). At this step, all the k-tuples of the form \(({{\varvec{x}}}_1, {{\varvec{x}}}_2, \ldots , {{\varvec{x}}}_k) \in \{{{\varvec{x}}}_1\} \times L_2^{(1)} \times \ldots \times L_k^{(1)}\) with a fixed first component \({{\varvec{x}}}_1\) partially match the target configuration: all scalar products involving \({{\varvec{x}}}_1\) are as desired. In addition, the lists \(L_i^{(1)}\) become much shorter than the original ones.
Next, we choose an \({{\varvec{x}}}_2 \in L_2^{(1)}\) and create smaller lists \(L_i^{(2)}\) from \(L_i^{(1)}\) by filtering out all the \({{\varvec{x}}}_i \in L_i^{(1)}\) that do not satisfy \(\left| \langle {{\varvec{x}}}_2,{{\varvec{x}}}_i\rangle - C_{2,i}\right| \le \varepsilon \) for all \(3 \le i \le k\). A tuple of the form \(({{\varvec{x}}}_1, {{\varvec{x}}}_2, {{\varvec{x}}}_3, \ldots , {{\varvec{x}}}_k) \in \{{{\varvec{x}}}_1\} \times \{{{\varvec{x}}}_2\} \times L_3^{(2)} \times \ldots \times L_k^{(2)}\) satisfies the target configuration \(C_{i,j}\) for \(i=1,2\). We proceed with this list-filtering strategy until we have fixed all \({{\varvec{x}}}_i\) for \(1\le i \le k\). We output all such k-tuples. Note that our algorithm becomes the trivial brute-force algorithm once we are down to 2 lists to be processed. As soon as we have fixed \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_{k-2}\) and created \(L_{k-1}^{(k-2)},L_{k}^{(k-2)}\), our algorithm iterates over \(L_{k-1}^{(k-2)}\) and checks the scalar product with every element from \(L_{k}^{(k-2)}\).
Our algorithm is detailed in Algorithm 1 and illustrated in Fig. 2a.
Fig. 2. k-List algorithms for the configuration problem. Left: our Algorithm 1. Right: the k-tuple sieve algorithm of [4].
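Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch (our own rendering of the description above; the target configuration \(C\) is a \(k\times k\) numpy array) shows the recursive filtering; the last two lists are processed by brute force, as described.

```python
import numpy as np

def filter_list(L, x, c, eps):
    """Keep the vectors y in L with |<x, y> - c| <= eps (the Filter subroutine)."""
    return [y for y in L if abs(np.dot(x, y) - c) <= eps]

def configuration_search(lists, C, eps, prefix=()):
    """Sketch of Algorithm 1: return all k-tuples matching the target configuration C.
    `prefix` holds the already fixed vectors x_1, ..., x_i; `lists` are the current filtered lists."""
    i = len(prefix)                # number of filtering steps applied so far
    k = i + len(lists)
    out = []
    if len(lists) == 2:            # down to two lists: brute force over all remaining pairs
        for x in lists[0]:
            for y in lists[1]:
                if abs(np.dot(x, y) - C[k - 2, k - 1]) <= eps:
                    out.append(prefix + (x, y))
        return out
    for x in lists[0]:             # fix x_{i+1} and filter the remaining lists against it
        filtered = [filter_list(L, x, C[i, i + 1 + j], eps) for j, L in enumerate(lists[1:])]
        out.extend(configuration_search(filtered, C, eps, prefix + (x,)))
    return out
```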
5 Analysis
In this section we analyze the complexity of Algorithm 1 for the Configuration problem. First, we should mention that the memory complexity is completely determined by the input list-sizes \(\left| L_i\right| \) (remember that we restrict to constant k) and it does not change the asymptotics when we apply k filters. In practice, all intermediate lists \(L_i^{(j)}\) can be implemented by storing pointers to the elements of the original lists.
In the following, we compute the expected sizes of filtered lists \(L_i^{(j)}\) and establish the expected running time of Algorithm 1. Since our algorithm has an exponential running time of \(2^{cn}\) for some \(c = \varTheta (1)\), we are interested in determining c (which depends on k) and we ignore polynomial factors, e.g. we do not take into account time spent for computing inner products.
Theorem 4
Let k be fixed. Algorithm 1, given as input k lists \(L_1, \ldots , L_k \subset \mathsf {S}^{n}\) of the same size \(\left| L\right| \), a target balanced configuration \(C_{{\text {Bal}},t}\in {{\mathbb R}}^{k \times k}\), a target length \(0< t < \sqrt{k}\), and \(\varepsilon > 0\), outputs the list \(L_{{\text {out}}}\) of solutions to the Configuration problem. The expected running time of Algorithm 1 is
\(T=\widetilde{\mathcal {O}}\Bigl (\max _{1\le i\le k-1}\;\left| L\right| ^{i+1}\cdot \Bigl ((1-a)^{i}\cdot \frac{(1+(i-1)a)^2}{1+(i-2)a}\Bigr )^{\frac{n}{2}}\Bigr ),\quad \text {where } a=\frac{t^2-k}{k^2-k}. \qquad (7)\)
In particular, for \(t=1\) and \(\left| L_{{\text {out}}}\right| = \left| L\right| \) it holds that
\(T=\widetilde{\mathcal {O}}\Bigl (\max _{1\le i\le k-1}\Bigl (k^{\frac{i+1}{k-1}}\cdot \frac{(k-i+1)^2}{(k+1)(k-i+2)}\Bigr )^{\frac{n}{2}}\Bigr ). \qquad (8)\)
Remark 2
In the proof below we also show that, for \(t=1\) and \(\left| L_{{\text {out}}}\right| =\left| L\right| \), the expected running time of the k-List algorithm presented in [4] is (see also Fig. 3 for a comparison)
\(T_{{\text {BLS}}}=\widetilde{\mathcal {O}}\Bigl (\max _{1\le i\le k-1}\Bigl (k^{\frac{k+i}{k-1}}\cdot \frac{k-i+1}{(k+1)^2}\Bigr )^{\frac{n}{2}}\Bigr ).\)
Corollary 3
For \(k=3\), \(t=1\), and \(\left| L\right| = \left| L_{{\text {out}}}\right| \) (the most interesting setting for SVP), Algorithm 1 has running time
\(T = 2^{0.3962n + o(n)},\)
requiring \(\left| L\right| = 2^{0.1887n+o(n)}\) memory.
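The exponents of Corollary 3 and Remark 2 can be recovered numerically from the expected filtered list sizes (Eq. (12) together with Lemma 1); the short computation below is our own cross-check and prints approximately 0.3962 (Algorithm 1), 0.4812 (BLS) and 0.1887 (memory) for \(k=3\), \(t=1\).

```python
import math

def log2_det_balanced(i, a):
    """log2 det(C[1..i]) for the balanced configuration with off-diagonal entry a (Lemma 1)."""
    return (i - 1) * math.log2(1 - a) + math.log2(1 + (i - 1) * a)

def exponents(k, t=1.0):
    """Per-n exponents (Algorithm 1 time, BLS time, list size) for |L_out| = |L|."""
    a = (t * t - k) / (k * k - k)
    L = -log2_det_balanced(k, a) / (2 * (k - 1))       # log2|L| / n forced by |L_out| = |L|, cf. Eq. (6)
    lvl = [L + 0.5 * (log2_det_balanced(j + 1, a) - log2_det_balanced(j, a))
           for j in range(k)]                          # lvl[j] = log2|L^{(j)}| / n, Eq. (12)
    ours = max(L + sum(lvl[1:i]) + lvl[i - 1] for i in range(1, k))   # Eq. (14)
    bls = max(2 * L + sum(lvl[1:i]) for i in range(1, k))             # BLS filters the full input lists
    return ours, bls, L

print([round(x, 4) for x in exponents(3)])   # approx. [0.3962, 0.4812, 0.1887]
```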
Fig. 3. Running exponents scaled by \(1{\slash }n\) for the target length \(t=1\). For \(k=2\), both algorithms are the Nguyen-Vidick sieve [18] with \(\log (T)/n = 0.415\) (naive brute-force over two lists). For \(k=3\), Algorithm 1 achieves \(\log (T)/n = 0.3962\).
Proof
(Proof of Theorem 4). The correctness of the algorithm is straightforward: let us associate the lists \(L^{(i)}\) with a level i where i indicates the number of filtering steps applied to L (we identify the input lists with the \(0^{{\mathrm {th}}}\) level: \(L_i=L^{(0)}_i\)). So for executing the filtering for the \(i^{{\mathrm {th}}}\) time, we choose an \({{\varvec{x}}}_{i} \in L_{i}^{(i-1)}\) that satisfies the condition \(\left| \langle {{\varvec{x}}}_i,{{\varvec{x}}}_{i-1}\rangle - C_{i,i-1}\right| \le \varepsilon \) (for a fixed \({{\varvec{x}}}_{i-1}\)) and append it to the previously obtained \((i-1)\)-tuple \(({{\varvec{x}}}_1, \ldots , {{\varvec{x}}}_{i-1})\). Thus on the last level, we put into \(L_{{\text {out}}}\) a k-tuple \(({{\varvec{x}}}_1, \ldots , {{\varvec{x}}}_{k})\) that is a solution to the Configuration problem.
Let us first estimate the size of the list \(L_i^{(i-1)}\) output by the filtering process applied to the list \(L_i^{(i-2)}\) for \(i>1\) (i.e. the left-most lists on Fig. 2a). Recall that all elements \({{\varvec{x}}}_i \in L_i^{(i-1)}\) satisfy \(\left| \langle {{\varvec{x}}}_i,{{\varvec{x}}}_{j}\rangle - C_{i,j}\right| \le \varepsilon , \; 1 \le j \le i-1 \). Then the total number of i-tuples \(({{\varvec{x}}}_1, {{\varvec{x}}}_2, \ldots , {{\varvec{x}}}_i) \in L_1 \times L_2^{(1)} \times \ldots \times L_i^{(i-1)}\) considered by the algorithm is determined by the probability that in a random i-tuple, all pairs \(({{\varvec{x}}}_j, {{\varvec{x}}}_{j'}), 1 \le j,j' \le i\) satisfy the inner product constraints given by \(C_{j,{j'}}\). This probability is given by Theorem 1 and since the input lists are of the same size \(\left| L\right| \), we have
\(\mathbb {E}\bigl [\vert L_1\vert \cdot \vert L_2^{(1)}\vert \cdots \vert L_i^{(i-1)}\vert \bigr ]=\left| L\right| ^i\cdot \widetilde{\mathcal {O}}\bigl (\det (C[1 \ldots i])^{\frac{n}{2}}\bigr ), \qquad (11)\)
where \(\det (C[1 \ldots i])\) denotes the i-th principal minor of \(C\). Using (11) for two consecutive values of i and dividing, we obtain
\(\mathbb {E}\bigl [\vert L_i^{(i-1)}\vert \bigr ]=\left| L\right| \cdot \widetilde{\mathcal {O}}\Bigl (\Bigl (\frac{\det (C[1\ldots i])}{\det (C[1\ldots i-1])}\Bigr )^{\frac{n}{2}}\Bigr ). \qquad (12)\)
Note that these expected list sizes can be smaller than 1. This should be thought of as the inverse probability that the list is not empty. Since we target a balanced configuration \(C_{{\text {Bal}},t}\), the entries of the input Gram matrix are specified by Theorem 2 and, hence, we compute the determinants in the above quotient by applying Lemma 1 for \(a = \frac{t^2-k}{k^2-k}\). Again, from the shape of the Gram matrix \(C_{{\text {Bal}},t}\) and the equal-sized input lists, it follows that the filtered lists on each level are of the same size: \(\vert L_{i+1}^{(i)}\vert = \vert L_{i+2}^{(i)}\vert = \ldots = \vert L_k^{(i)}\vert \). Therefore, for all filtering levels \(0 \le j \le k-1\) and for all \(j+1 \le i \le k\),
\(\mathbb {E}\bigl [\vert L_i^{(j)}\vert \bigr ]=\left| L\right| \cdot \widetilde{\mathcal {O}}\Bigl (\Bigl (\frac{\det (C_{{\text {Bal}},t}[1\ldots j+1])}{\det (C_{{\text {Bal}},t}[1\ldots j])}\Bigr )^{\frac{n}{2}}\Bigr ).\)
Now let us discuss the running time. Clearly, the running time of Algorithm 1 is (up to subexponential factors in n)
\(T=\vert L_1\vert \cdot \Bigl (\sum _{j=2}^{k}\vert L_j\vert \;+\;\vert L_2^{(1)}\vert \cdot \Bigl (\sum _{j=3}^{k}\vert L_j^{(1)}\vert \;+\;\cdots \;+\;\vert L_{k-1}^{(k-2)}\vert \cdot \bigl (\vert L_k^{(k-2)}\vert +\vert L_k^{(k-1)}\vert \bigr )\cdots \Bigr )\Bigr ).\)
Multiplying out and observing that \(\vert L_k^{(k-2)}\vert >\vert L_k^{(k-1)}\vert \), so we may ignore the very last term, we deduce that the total running time is (up to subexponential factors) given by
\(T=\widetilde{\mathcal {O}}\Bigl (\max _{1\le i\le k-1}\;\vert L\vert \cdot \vert L^{(i-1)}\vert \cdot \prod _{j=1}^{i-1}\vert L^{(j)}\vert \Bigr ), \qquad (14)\)
where \(\vert L^{(j)}\vert \) is the size of any filtered list on level j (so we omit the subscripts). Consider the value \(i_{{\text {max}}}\) of i where the maximum is attained in the above formula. The meaning of \(i_{{\text {max}}}\) is that the total cost over all loops to create the lists \(L_j^{(i_{{\text {max}}})}\) is dominating the running time. At this level, the lists \(L_j^{(i_{{\text {max}}})}\) become small enough such that iterating over them (i.e. creation of \(L_j^{(i_{{\text {max}}}+1)}\)) does not contribute asymptotically. Plugging in Eqs. (11) and (12) into (14), we obtain
\(T=\widetilde{\mathcal {O}}\Bigl (\max _{1\le i\le k-1}\;\vert L\vert ^{i+1}\cdot \Bigl (\frac{\det (C_{{\text {Bal}},t}[1\ldots i])^2}{\det (C_{{\text {Bal}},t}[1\ldots i-1])}\Bigr )^{\frac{n}{2}}\Bigr ). \qquad (15)\)
Using Lemma 1, we obtain the desired expression for the running time.
For the case \(t=1\) and \(\vert L_{{\text {out}}}\vert = \vert L\vert \), the result of Theorem 3 on the size of the input lists \(\left| L\right| \) yields a compact formula for the filtered lists:
\(\vert L^{(j)}\vert =\widetilde{\mathcal {O}}\Bigl (\Bigl (k^{\frac{1}{k-1}}\cdot \frac{k-j}{k-j+1}\Bigr )^{\frac{n}{2}}\Bigr ). \qquad (16)\)
Plugging this into either (14) or (15), the running time stated in (8) easily follows.
It remains to show the complexity of the BLS algorithm [4], claimed in Remark 2. We do not give a complete description of the algorithm but illustrate it in Fig. 2b. We change the presentation of the algorithm to our configuration setting: in the original description, a vector \({{\varvec{x}}}_i\) survives the filter if it satisfies \(\left| \langle {{\varvec{x}}}_i,{{\varvec{x}}}_1 + \ldots + {{\varvec{x}}}_{i-1}\rangle \right| \ge c_i\) for a predefined \(c_i\) (a sequence \((c_1, \ldots , c_{k-1}) \in {{\mathbb R}}^{k-1}\) is given as input to the BLS algorithm). Our concentration result (Theorem 1) also applies here and the condition \(\left| \langle {{\varvec{x}}}_i,{{\varvec{x}}}_1 + \ldots + {{\varvec{x}}}_{i-1}\rangle \right| \ge c_i\) is equivalent to a pairwise constraint on the \(\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle \) up to losing an exponentially small fraction of solutions. The optimal sequence of \(c_i\)’s corresponds to the balanced configuration \(C_{{\text {Bal}},t}\) derived in Theorem 2. Indeed, Table 1 in [4] corresponds exactly to \(C_{{\text {Bal}},t}\) for \(t=1\). So we may rephrase their filtering where instead of shrinking the list \(L_i\) by taking inner products with the sum \({{\varvec{x}}}_1+ \ldots + {{\varvec{x}}}_{i-1}\), we filter \(L_i\) gradually by considering \(\langle {{\varvec{x}}}_i,{{\varvec{x}}}_j\rangle \) for \(1 \le j \le i-1\).
It follows that the filtered lists \(L^{(i)}\) on level i are of the same size (in leading order) for both our and BLS algorithms. In particular, Eq. (12) holds for the expected list-sizes of the BLS algorithm. The crucial difference lies in the construction of these lists. To construct the list \(L_i^{(i-1)}\) in BLS, the filtering procedure is applied not to \(L_i^{(i-2)}\), but to a (larger) input-list \(L_i\). Hence, the running time is (cf. (14)), ignoring subexponential factors,
\(T_{{\text {BLS}}}=\max _{1\le i\le k-1}\;\vert L\vert \cdot \vert L\vert \cdot \prod _{j=1}^{i-1}\vert L^{(j)}\vert .\)
The result follows after substituting (16) into the above product.
6 Configuration Extension
For \(k=2\), the asymptotically best algorithm with running time \(T=\bigl (\frac{3}{2}\bigr )^{\frac{n}{2}}\) for \(t=1\) is due to [5], using techniques from Locality-Sensitive Hashing. We generalize this to what we call Configuration Extension. To explain the LSH technique, consider the (equivalent) approximate 2-List problem with \(t=1\), where we want to bound the norm of the difference \(\Vert {{\varvec{x}}}_1-{{\varvec{x}}}_2\Vert ^2\le 1\) rather than the sum, i.e. we want to find points that are close. The basic idea is to choose a family of hash functions \(\mathscr {H}\), such that for \(h\in \mathscr {H}\), the probability that \(h({{\varvec{x}}}_1)=h({{\varvec{x}}}_2)\) is large if \({{\varvec{x}}}_1\) and \({{\varvec{x}}}_2\) are close, and small if they are far apart. Using such an \(h\in \mathscr {H}\), we can bucket our lists according to h and then only look for pairs \({{\varvec{x}}}_1,{{\varvec{x}}}_2\) that collide under h. Repeat with several \(h\in \mathscr {H}\) as appropriate to find all/most solutions. We may view such an \(h\in \mathscr {H}\) as a collection of preimages \(D_{h,z}=h^{-1}(z)\) and the algorithm first determines which elements \({{\varvec{x}}}_1,{{\varvec{x}}}_2\) are in some given \(D_{h,z}\) (filtering the list using \(D_{h,z}\)) and then searches for solutions only among those. Note that, conceptually, we only really need the \(D_{h,z}\) and not the functions h. Indeed, there is actually no need for the \(D_{h,z}\) to be a partition of \(\mathsf {S}^{n}\) for given h, and h need not even exist. Rather, we may have an arbitrary collection of sets \(D^{(r)}\), with r belonging to some index set. The existence of functions h would help in efficiency when filtering. However, [5] (and also [16], stated for the \(\ell _1\)-norm) give a technique to efficiently construct and apply filters \(D^{(r)}\) without such an h in an amortized way.
The natural choice for \(D^{(r)}\) is to choose all points with distance at most d for some \(d>0\) from some reference point \({{\varvec{v}}}^{(r)}\) (that is typically not from any \(L_i\)). This way, a random pair \({{\varvec{x}}}_1,{{\varvec{x}}}_2\in D^{(r)}\) has a higher chance to be close to each other than uniformly random points \({{\varvec{x}}}_1,{{\varvec{x}}}_2\in \mathsf {S}^{n}\). Notationally, let us call (a description of) \(D^{(r)}\) together with the filtered lists an instance, where \(1\le r \le R\) and R is the number of instances.
In our situation, we look for small sums rather than small differences. The above translates to asking that \({{\varvec{x}}}_1\) is close to \({{\varvec{v}}}^{(r)}\) and that \({{\varvec{x}}}_2\) is far apart from \({{\varvec{v}}}^{(r)}\) (or, equivalently, that \({{\varvec{x}}}_2\) is close to \(-{{\varvec{v}}}^{(r)}\)). In general, one may (for \(k>2\)) consider not just a single \({{\varvec{v}}}^{(r)}\) but rather several related \({{\varvec{v}}}_1^{(r)},\ldots ,{{\varvec{v}}}^{(r)}_{m}\). So an instance consists of \(m\) points \({{\varvec{v}}}^{(r)}_1,\ldots ,{{\varvec{v}}}^{(r)}_m\) and shrunk lists \(L^{\prime (r)}_i\) where \(L^{\prime (r)}_i\subset L_i\) is obtained by taking those \(x_i\in L_i\) that have some prescribed distances \(d_{i,j}\) to \({{\varvec{v}}}_j^{(r)}\). Note that the \(d_{i,j}\) may depend on i and so need not treat the lists symmetrically. As a consequence, it no longer makes sense to think of this technique in terms of hash collisions in our setting.
We organize all the distances between \({{\varvec{v}}}\)’s and \({{\varvec{x}}}\)’s that occur into a single matrix \(C\) (i.e. a configuration): the \(\langle {{\varvec{v}}}_j,{{\varvec{v}}}_{j'}\rangle \)-entries of \(C\) describe the relation between the \({{\varvec{v}}}\)’s and the \(\langle {{\varvec{x}}}_i,{{\varvec{v}}}_j\rangle \)-entries of \(C\) describe the \(d_{i,j}\). The \(\langle {{\varvec{x}}}_i,{{\varvec{x}}}_{i'}\rangle \)-entries come from the approximate k-List problem we want to solve. While not relevant for constructing actual \({{\varvec{v}}}^{(r)}_j\)’s and \(L^{\prime (r)}_i\)’s, the \(\langle {{\varvec{x}}}_i,{{\varvec{x}}}_{i'}\rangle \)-entries are needed to choose the number R of instances.
For our applications to sieving, the elements from the input list \(L_i\) may not be uniformly distributed over all of \(\mathsf {S}^{n}\) due to previous processing of the lists. Rather, the elements \({{\varvec{x}}}_i\) from \(L_i\) have some prescribed distance \(d_{i,j}\) to (known) \({{\varvec{v}}}_j\)’s: e.g. in Algorithm 1, we fix \({{\varvec{x}}}_1\in L_1\) that we use to filter the remaining \(k-1\) lists; we model this by taking \({{\varvec{x}}}_1\) as one of the \({{\varvec{v}}}_j\)’s (and reducing k by 1). Another possibility is that we use configuration extension on lists that are the output of a previous application of configuration extension.
In general, we consider “old” points \({{\varvec{v}}}_j\) and wish to create “new” points \({{\varvec{v}}}_\ell \), so we have actually three different types of rows/columns in \(C\), corresponding to the list elements, old and new points.
Definition 4 (Configuration Extension)
Consider a configuration matrix \(C\). We consider \(C\) as being indexed by disjoint sets \(I_{{\text {lists}}},I_{{\text {old}}},I_{{\text {new}}}\). Here, \(\left| I_{{\text {lists}}}\right| =k\) corresponds to the input lists, \(\left| I_{{\text {old}}}\right| =m_{{\text {old}}}\) corresponds to the “old” points, \(\left| I_{{\text {new}}}\right| =m_{{\text {new}}}\) corresponds to the “new” points. We denote appropriate square submatrices by \(C[I_{{\text {lists}}}]\) etc. By configuration extension, we mean an algorithm \({\mathsf {ConfExt}}\) that takes as input k exponentially large lists \(L_i\subset \mathsf {S}^{n}\) for \(i\in I_{{\text {lists}}}\), \(m_{{\text {old}}}\) “old” points \({{\varvec{v}}}_j\in \mathsf {S}^{n}\), \(j\in I_{{\text {old}}}\) and the matrix \(C\). Assume that each input list separately satisfies the given configuration constraints wrt. the old points: \({\mathrm{Conf}}\,( {{\varvec{x}}}_i, ({{\varvec{v}}}_j)_{j\in I_{{\text {old}}}})\approx C[i,I_{{\text {old}}}]\) for \(i\in I_{{\text {lists}}}\), \({{\varvec{x}}}_i\in L_i\).
It outputs R instances, where each instance consists of \(m_{{\text {new}}}\) points \({{\varvec{v}}}_\ell \), \(\ell \in I_{{\text {new}}}\) and shrunk lists \(L_i'\subset L_i\), where \({\mathrm{Conf}}\,( ({{\varvec{v}}}_j)_{j\in I_{{\text {old}}}},({{\varvec{v}}}_\ell )_{\ell \in I_{{\text {new}}}})\approx C[I_{{\text {old}}},I_{{\text {new}}}]\) and each \({{\varvec{x}}}'_i\in L'_i\) satisfies \({\mathrm{Conf}}\,({{\varvec{x}}}'_i,({{\varvec{v}}}_j)_{j\in I_{{\text {old}}}},({{\varvec{v}}}_\ell )_{\ell \in I_{{\text {new}}}})\approx C[\{i\}\cup I_{{\text {old}}}\cup I_{{\text {new}}}]\).
The instances are output one-by-one in a streaming fashion. This is important, since the total size of the output usually exceeds the amount of available memory.
The naive way to implement configuration extension is as follows: independently for each instance, sample uniform \({{\varvec{v}}}_\ell \)’s conditioned on the given constraints and then make a single pass over each input list \(L_i\) to construct \(L'_i\). This would require \(\widetilde{\mathcal {O}}(\max _i \left| L_i\right| \cdot R)\) time. However, using the block coding/stripe techniques of [5, 16], one can do much better. The central observation is that if we subdivide the coordinates into blocks, then a configuration constraint on all coordinates is (up to losing a subexponential fraction of solutions) equivalent to independent configuration constraints on each block. The basic idea is then to construct the \({{\varvec{v}}}_\ell \)’s in a block-wise fashion such that an exponential number of instances have the same \({{\varvec{v}}}_\ell \)’s on a block of coordinates. We can then amortize the construction of the \(L'_i\)’s among such instances, since we can first construct some intermediate \(L_i''\subset L_i\) that is compatible with the \({{\varvec{v}}}_\ell \)’s on the shared block of coordinates. To actually construct \(L'_i\subset L''_i\), we only need to pass over \(L''_i\) rather than \(L_i\). Of course, this forgoes independence of the \({{\varvec{v}}}_\ell \)’s across different instances, but one can show that they are still independent enough to ensure that we will find most solutions if the number of instances is large enough.
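A literal (non-amortized) implementation of this naive variant could look as follows; this is only our own sketch for intuition, with \(m_{{\text {new}}}=1\), no old points and uniform sampling of the new point, and it exhibits exactly the \(\widetilde{\mathcal {O}}(\max _i \left| L_i\right| \cdot R)\) behaviour that the block-wise construction of [5, 16] improves upon.

```python
import numpy as np

def naive_conf_ext(lists, target, eps, num_instances, rng):
    """Naive configuration extension (one new point v per instance, no old points):
    sample v uniformly on S^n and keep x in L_i whenever |<x, v> - target[i]| <= eps.
    Instances are yielded one by one; each one costs a full pass over every input list."""
    dim = lists[0][0].shape[0]
    for _ in range(num_instances):
        v = rng.standard_normal(dim)
        v /= np.linalg.norm(v)
        shrunk = [[x for x in L if abs(np.dot(x, v) - target[i]) <= eps]
                  for i, L in enumerate(lists)]
        yield v, shrunk
```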
Adapting these techniques of [5, 16] to our framework is straightforward, but extremely technical. We work out the details in the full version of the paper.
A rough summary of the properties of our Configuration Extension Algorithm \({\mathsf {ConfExt}}\) (see the full version for a proof) is given by the following:
Theorem 5
Use notation as in Definition 4. Assume that \(C,k,m_{{\text {old}}},m_{{\text {new}}}\) do not depend on n. Then our algorithm \({\mathsf {ConfExt}}\), given as input \(C,k,m_{{\text {old}}},m_{{\text {new}}}\), old points \({{\varvec{v}}}_j\) and exponentially large lists \(L_1,\ldots ,L_k\) of points from \(\mathsf {S}^{n}\), outputs R instances, where each output instance consists of \(m_{{\text {new}}}\) points \({{\varvec{v}}}_\ell \) and sublists \(L'_i\subset L_i\). In each such output instance, the new points \(({{\varvec{v}}}_\ell )_{\ell \in I_{{\text {new}}}}\) are chosen uniformly conditioned on the constraints (but not independent across instances). Consider solution k-tuples, i.e. \({{\varvec{x}}}_i\in L_i\) with \({\mathrm{Conf}}\,( ({{\varvec{x}}}_i)_{i\in I_{{\text {lists}}}})\approx C[I_{{\text {lists}}}]\). With overwhelming probability, for every solution k-tuple \(({{\varvec{x}}}_i)_{i\in I_{{\text {lists}}}}\), there exists at least one instance such that all \({{\varvec{x}}}_i\in L'_i\) for this instance, so we retain all solutions. Assume further that the elements from the input lists \(L_i\), \(i\in I_{{\text {lists}}}\) are iid uniformly distributed conditioned on the configuration \({\mathrm{Conf}}\,({{\varvec{x}}}_i,({{\varvec{v}}}_j)_{j\in I_{{\text {old}}}})\) for \({{\varvec{x}}}_i\in L_i\), which is assumed to be compatible with \(C\). Then the expected size of the output lists per instance is determined (analogously to Eq. (12)) by the corresponding ratio of determinants of submatrices of \(C\).
Assume that all these expected output list sizes are exponentially increasing in n (rather than decreasing). Then the running time of the algorithm is given by \(\widetilde{\mathcal {O}}(R\cdot \max _i \mathbb {E}[\left| L'_i\right| ])\) (essentially the size of the output) and the memory complexity is given by \(\widetilde{\mathcal {O}}(\max _i \left| L_i\right| )\) (essentially the size of the input).
7 Improved k-List Algorithm with Configuration Extension
Now we explain how to use the Configuration Extension Algorithm within the k-List Algorithm 1 to speed-up the search for configurations. In fact, there is a whole family of algorithms obtained by combining \(\mathsf {Filter}\) from Algorithm 1 and the configuration extension algorithm \({\mathsf {ConfExt}}\). The combined algorithm is given in Algorithm 2.
Recall that Algorithm 1 takes as inputs k lists \(L_1,\ldots , L_k\) of equal size and processes the lists in several levels (cf. Fig. 2a). The lists \(L_j^{(i)}\) for \(j\ge i\) at the \(i^{{\mathrm {th}}}\) level (where the input lists correspond to the \(0^{{\mathrm {th}}}\) level) are obtained by brute-forcing over \({{\varvec{x}}}_i\in L_i^{(i-1)}\) and running \(\mathsf {Filter}\) on \(L_j^{(i-1)}\) and \({{\varvec{x}}}_i\).
We can use \({\mathsf {ConfExt}}\) in the following way: before using \(\mathsf {Filter}\) on \(L_j^{(i-1)}\), we run \({\mathsf {ConfExt}}\) to create R instances with smaller sublists \(L_j^{\prime (i-1)}\subset L_j^{(i-1)}\). We then apply \(\mathsf {Filter}\) to each of these \(L_j^{\prime (i-1)}\) rather than to \(L_j^{(i-1)}\). The advantage is that for a given instance, the \(L_j^{\prime (i-1)}\) are dependent (over the choice of j), so we expect a higher chance to find solutions.
In principle, one can use \({\mathsf {ConfExt}}\) on any level, i.e. we alternate between using \({\mathsf {ConfExt}}\) and \(\mathsf {Filter}\). Note that the \({{\varvec{x}}}_i\)’s that we brute-force over in order to apply \(\mathsf {Filter}\) become “old” \({{\varvec{v}}}_j\)’s in the context of the following applications of \({\mathsf {ConfExt}}\).
It turns out that among the variety of potential combinations of \(\mathsf {Filter}\) and \({\mathsf {ConfExt}}\), some are more promising than others. From the analysis of Algorithm 1, we know that the running time is dominated by the cost of filtering (appropriately multiplied by the number of times we need to filter) to create lists at some level \(i_{{\text {max}}}\). The value of \(i_{{\text {max}}}\) can be deduced from Eq. (14), where the individual contribution \(\vert L\vert \cdot \vert L^{(i-1)}\vert \cdot \prod _{j=1}^{i-1}\vert L^{(j)}\vert \) in that formula exactly corresponds to the total cost of creating all lists at the i-th level.
It makes sense to use \({\mathsf {ConfExt}}\) to reduce the cost of filtering at this critical level. This means that we use \({\mathsf {ConfExt}}\) on the lists \(L_j^{(i_{{\text {max}}}-1)}\), \(j\ge i_{{\text {max}}}-1\). Let us choose \(m_{{\text {new}}}=1\) new point \({{\varvec{v}}}_\ell \). The lists \(L_j^{(i_{{\text {max}}}-1)}\) are already reduced by enforcing configuration constraints with \({{\varvec{x}}}_1\in L_1,\ldots , {{\varvec{x}}}_{i_{{\text {max}}}-1}\in L_{i_{{\text {max}}}-1}\) from previous applications of \(\mathsf {Filter}\). This means that the \({{\varvec{x}}}_1,\ldots ,{{\varvec{x}}}_{i_{{\text {max}}}-1}\) take the role of “old” \({{\varvec{v}}}_j\)’s in \({\mathsf {ConfExt}}\). The configuration \(C^{{\tiny ext}}\in {{\mathbb R}}^{(k+1)\times (k+1)}\) for \({\mathsf {ConfExt}}\) is obtained as follows: The \(C^{{\tiny ext}}[I_{{\text {lists}}},I_{{\text {old}}}]\)-part is given by the target configuration. The rest (which means the last row/column corresponding to the single “new” point) can be chosen freely and is subject to optimization. Note that the optimization problem does not depend on n.
This approach is taken in Algorithm 2. Note that for levels below \(i_{{\text {max}}}\), it does not matter whether we continue to use our \(\mathsf {Filter}\) approach or just brute-force: if \(i_{{\text {max}}}=k\), there are no levels below. If \(i_{{\text {max}}}<k\), the lists are small from this level downward and brute-force becomes cheap enough not to affect the asymptotics.
Let us focus on the case where the input list sizes are the same as the output list sizes, which is the relevant case for applications to Shortest Vector sieving. It turns out (numerically) that in this case, the approach taken by Algorithm 2 is optimal for most values of k. The reason is as follows: Let T be the contribution to the running time of Algorithm 1 from level \(i_{{\text {max}}}\), which is asymptotically the same as the total running time. The second-largest contribution, denoted \(T'\) comes from level \(i_{{\text {max}}}-1\). The improvement in running time from using \({\mathsf {ConfExt}}\) to reduce T decreases with k and is typically not enough to push it below \(T'\). Consequently, using \({\mathsf {ConfExt}}\) between other levels will not help. We also observed that choosing \(m_{{\text {new}}}=1\) was usually optimal for k up to 10. Exceptions to these observations occur when T and \(T'\) are very close (this happens, e.g. for \(k=6\)) or when k is small and the benefit from using \({\mathsf {ConfExt}}\) is large (i.e. \(k=3\)).
Since the case \(k=3\) is particularly interesting for the Shortest Vector sieving (see Sect. 7.2), we present the 3-List algorithm separately in Sect. 7.1.

7.1 Improved 3-List Algorithm
The case \(k=3\) stands out from the above discussion as one can achieve a faster algorithm running the Configuration Extension Algorithm on two points \({{\varvec{v}}}_1, {{\varvec{v}}}_2\). This case is also interesting in applications to lattice sieving, so we give details below.
From Eq. (14) we have \(i_{{\text {max}}}=2\), or more precisely, the running time of the 3-List algorithm (without Configuration Extension) is \(T = |L_1| \cdot |L_2^{(1)}| \cdot |L_3^{(1)}|\). So we start shrinking the lists right from the beginning, which corresponds to \(m_{{\text {old}}}= 0\). For the balanced configuration as the target, we have \(C[I_{{\text {lists}}}] = -1/3\) on the off-diagonals. With the help of an optimization solver, we obtain the optimal values for \(\langle {{\varvec{x}}}_i,{{\varvec{v}}}_j\rangle \) for \(i\in \{1,2,3\}\) and \(j\in \{1,2\}\), and for \(\langle {{\varvec{v}}}_1,{{\varvec{v}}}_2\rangle \) (there are 7 values to optimize for), so the input to the Configuration Extension Algorithm is determined. The target configuration \(C^{{\tiny ext}}\in {{\mathbb R}}^{5\times 5}\) is assembled from these optimized values, and the number of instances is given by \(R = \widetilde{\mathcal {O}}(1.4038^n)\) according to (17). The algorithm runs in a streamed fashion: the lists \(L_1', L_2', L_3'\) in line 2 of Algorithm 3 are obtained instance by instance and, hence, lines 3 to 9 are repeated R times.

From Theorem 3, it follows that if the input lists satisfy \(|L| = 2^{0.1887n + o(n)}\), then we expect \(|L_{{\text {out}}}| = |L|\). Also from Eq. (8), it follows that the 3-List Algorithm 1 (i.e. without combining with the Configuration Extension Algorithm) has running time of \(2^{0.3962n + o(n)}\). The above Algorithm 3 brings it down to \(2^{0.3717n + o(n)}\).
7.2 Application to the Shortest Vector Problem
In this section we briefly discuss how certain shortest vector algorithms can benefit from our improvement for the approximate k-List problem. We start by stating the approximate shortest vector problem.
On input, we are given a full-rank lattice \(\mathcal {L}(B)\) described by a matrix \(B \in {{\mathbb R}}^{n \times n}\) (with polynomially-sized entries) whose columns correspond to basis vectors, and some constant \(\mathsf {c}\ge 1\). The task is to output a nonzero lattice vector \({{\varvec{x}}}\in \mathcal {L}(B)\), s.t. \( \Vert {{\varvec{x}}}\Vert \le \mathsf {c} \lambda _1 (B)\) where \(\lambda _1 (B)\) denotes the length of the shortest nonzero vector in \(\mathcal {L}(B)\). \({{\varvec{x}}}\) is a solution to the approximate shortest vector problem.
The AKS sieving algorithm (introduced by Ajtai, Kumar, and Sivakumar in [1]) is currently the best (heuristic) algorithm for the approximate shortest vector problem: for an n-dimensional lattice, the running time and memory are of order \(2^{\varTheta (n)}\). Sieving algorithms have two flavours: the Nguyen-Vidick sieve [18] and the Gauss sieve [17]. Both make a polynomial (in n) number of calls to the approximate 2-List solver. Without LSH-techniques, the running time of both the Nguyen-Vidick and the Gauss sieve is the running time of the approximate 2-List algorithm: \(2^{0.415 n + o(n)}\) with \(2^{0.208 n + o(n)}\) memory. Using our 3-List Algorithm 1 instead, the running time can be reduced to \(2^{0.3962n + o(n)}\) (with only \(2^{0.1887n + o(n)}\) memory) introducing essentially no polynomial overhead. Using Algorithm 3, we achieve even better asymptotics: \(2^{0.3717n + o(n)}\), but it might be too involved for practical speed-ups due to the very large polynomial overhead for too little exponential gain in realistic dimensions.
Now we describe the Nguyen-Vidick sieve that uses a k-List solver as a main subroutine (see [4] for a more formal description). We start by sampling lattice-vectors \({{\varvec{x}}}\in \mathcal {L}(B) \cap \mathsf {B}_{n}{(2^{O(n)} \cdot \lambda _1(B))}\), where \(\mathsf {B}_{n}{(R)}\) denotes an n-dimensional ball of radius R. This can be done using, for example, Klein’s nearest plane procedure [11]. In the k-List Nguyen-Vidick for \(k > 2\), we sample many such lattice-vectors, put them in a list L, and search for k-tuples \({{\varvec{x}}}_1, \ldots , {{\varvec{x}}}_k \in L \times \ldots \times L\) s.t. \(\Vert {{\varvec{x}}}_1 + \ldots + {{\varvec{x}}}_k \Vert \le \gamma \cdot \max _{1 \le i \le k}\Vert {{\varvec{x}}}_i\Vert \) for some \(\gamma < 1\). The sum \({{\varvec{x}}}_1 + \ldots + {{\varvec{x}}}_k\) is put into \(L_{{\text {out}}}\). The size of L is chosen in a way to guarantee that \(|L| \approx |L_{{\text {out}}}|\). The search for short k-tuples is repeated over the list \(L_{{\text {out}}}\). Note that since with each new iteration we obtain vectors that are shorter by a constant factor \(\gamma \), starting with a \(2^{\mathcal {O}(n)}\) approximation to the shortest vector (this property is guaranteed by Klein’s sampling algorithm applied to an LLL-reduced basis), we need only linearly many (in n) iterations to find the desired \({{\varvec{x}}}\in \mathcal {L}(B)\).
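Schematically, the k-List Nguyen-Vidick iteration just described can be organized as in the following sketch (our own; `sample_lattice_vectors` and `k_list_solver` are placeholders for Klein's sampler and an approximate k-List solver such as Algorithm 1):

```python
import numpy as np

def nguyen_vidick_k_sieve(B, k, gamma, iterations, list_size,
                          sample_lattice_vectors, k_list_solver):
    """One way to organize the k-List Nguyen-Vidick sieve: repeatedly replace the current
    list by the (shorter) k-sums found by an approximate k-List solver."""
    L = sample_lattice_vectors(B, list_size)          # e.g. Klein's sampler on an LLL-reduced basis
    for _ in range(iterations):                       # linearly many (in n) iterations suffice
        bound = gamma * max(np.linalg.norm(v) for v in L)
        tuples = k_list_solver([L] * k, bound)        # k-tuples with ||x_1 + ... + x_k|| <= bound
        L = [np.sum(tup, axis=0) for tup in tuples]   # the shorter sums form the next list
    return min(L, key=np.linalg.norm)
```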
Naturally, we would like to apply our approximate k-List algorithm to k copies of the list L to implement the search for short sums. Indeed, we can do so by making a commonly used assumption: we assume the lattice-vectors we put into the lists lie uniformly on a spherical shell (on a very thin shell, essentially a sphere). The heuristic here is that it does not affect the behaviour of the algorithm. Intuitively, the discreteness of a lattice should not be “visible” to the algorithm (at least not until we find the approximate shortest vector).
We conclude by noting that our improved k-List Algorithm can as well be used within the Gauss sieve, which is known to perform faster in practice than the Nguyen-Vidick sieve. An iteration of the original 2-Gauss sieve as described in [17], searches for pairs \(({{\varvec{p}}}, {{\varvec{v}}})\), s.t. \(\Vert {{\varvec{p}}}+ {{\varvec{v}}}\Vert < \max \{\Vert {{\varvec{p}}}\Vert , \Vert {{\varvec{v}}}\Vert \}\), where \({{\varvec{p}}}\in \mathcal {L}(B)\) is fixed, \({{\varvec{v}}}\in L \subset \mathcal {L}(B)\), and \({{\varvec{p}}}\ne {{\varvec{v}}}\). Once such a pair is found and \(\Vert {{\varvec{p}}}\Vert > \Vert {{\varvec{v}}}\Vert \), we set \({{\varvec{p}}}' \leftarrow {{\varvec{p}}}+ {{\varvec{v}}}\) and proceed with the search over \(({{\varvec{p}}}', {{\varvec{v}}})\), otherwise if \(\Vert {{\varvec{p}}}\Vert < \Vert {{\varvec{v}}}\Vert \), we delete \({{\varvec{v}}}\in L\) and store the sum \( {{\varvec{p}}}+ {{\varvec{v}}}\) as \({{\varvec{p}}}\)-input point for the next iteration. Once no pair is found, we add \({{\varvec{p}}}'\) to L. On the next iteration, the search is repeated with another \({{\varvec{p}}}\) which is obtained either by reducing some deleted \({{\varvec{v}}}\in L\) before, or by sampling from \(\mathcal {L}(B)\). The idea is to keep only those vectors in L that cannot form a pair with a shorter sum. Bai, Laarhoven, and Stehlé in [4], generalize it to k-Gauss sieve by keeping only those vectors in L that do not form a shorter k-sum. In the language of configuration search, we look for configurations \(({{\varvec{p}}}, {{\varvec{v}}}_1, \ldots , {{\varvec{v}}}_{k-1}) \in \{{{\varvec{p}}}\} \times L \times \ldots \times L\) where the first point is fixed, so we apply our Algorithm 1 on \(k-1\) (identical) lists.
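A rough sketch of the pair-reduction step of the 2-Gauss sieve described above (our own simplification, ignoring the \(\pm \) sign choices and the stack bookkeeping of a real implementation):

```python
import numpy as np

def gauss_reduce(p, L):
    """Reduce p against the list L as in [17]: whenever ||p + v|| < max(||p||, ||v||),
    either shorten p or remove v and re-queue the sum. Returns the reduced p and the
    removed sums (to be used as new p's in later iterations)."""
    requeued = []
    restart = True
    while restart:
        restart = False
        for i, v in enumerate(L):
            s = p + v
            if np.linalg.norm(s) < max(np.linalg.norm(p), np.linalg.norm(v)):
                if np.linalg.norm(p) > np.linalg.norm(v):
                    p = s                    # p became shorter; rescan the list
                else:
                    requeued.append(s)       # v is dominated; its reduction is processed later
                    del L[i]
                restart = True
                break
    return p, requeued                        # if no pair reduces, p is finally appended to L
```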
Unfortunately, applying LSH/Configuration Extension techniques to the Gauss sieve is much more involved than for the Nguyen-Vidick sieve. For \(k=2\), [13] applies LSH techniques, but this requires an exponential increase in memory (which runs counter to our goal). We do not know whether these techniques extend to our setting. At any rate, since the gain from LSH/Configuration Extension techniques decreases with k (with the biggest jump from \(k=2\) to \(k=3\)), while the overhead increases, obtaining a practical speed-up from LSH/Configuration Extension within the Gauss sieve for \(k\ge 3\) seems unrealistic.
Open Questions. We present all our algorithms for fixed k, and in the analysis we suppress all prefactors (in running time and list-sizes) for fixed k using the \(\mathcal {O}_k(.)\) notation. Taking a closer look at how these factors depend on k, we notice (see, for example, the expression for \(W_{n,k}\) in Theorem 1) that the exponents of the polynomial prefactors depend on k. This prevents us from discussing the case \(k \rightarrow \infty \), which is an interesting question especially in light of SVP. A related question is the optimal choice of \(\varepsilon \) and how it affects the prefactors.
8 Experimental Results
We implement the 3-Gauss sieve algorithm in collaboration with S. Bai [3]. The implementation is based on the program developed by Bai, Laarhoven, and Stehlé in [4], making the approaches comparable.
Lattice bases are generated by the SVP challenge generator [7]. It produces a lattice generated by the columns of the matrix
$$\begin{pmatrix} p & x_1 & x_2 & \cdots & x_{n-1} \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix},$$
where p is a large prime and \(x_i< p\) for all i. Lattices of this type are random in the sense of Goldstein and Mayer [9].
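A minimal sketch of how a basis of this shape could be produced follows; the actual challenge generator [7] fixes p and derives the randomness differently, so this is only meant to illustrate the structure of the matrix.

```python
import random

def goldstein_mayer_like_basis(n, p, seed=0):
    """Return an n x n matrix whose columns generate a lattice of the above form:
    column 0 is (p, 0, ..., 0), column i is (x_i, 0, ..., 1, ..., 0) with x_i < p."""
    rng = random.Random(seed)
    B = [[0] * n for _ in range(n)]
    B[0][0] = p
    for i in range(1, n):
        B[0][i] = rng.randrange(p)   # x_i chosen uniformly below p
        B[i][i] = 1
    return B
```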
For all dimensions except 80, the bases are preprocessed with BKZ reduction of block-size 20; for \(n=80\), the block-size is 30. For our input lattices, we do not know the first minimum \(\lambda _1\). Instead, the algorithm terminates when it has found many linearly dependent triples \(({{\varvec{v}}}_1, {{\varvec{v}}}_2, {{\varvec{v}}}_3)\): we keep a counter for such events and terminate the algorithm once this counter exceeds a pre-defined threshold. The intuition behind this idea is straightforward: at some point the list L will contain very short basis-vectors and the remaining list-vectors will be linear combinations of them, so trying to reduce the latter will ultimately produce the zero-vector. The same termination condition was already used in [15], where the authors experimentally determine a threshold for such “zero-sum” triples.
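The termination test itself can be as simple as a counter, e.g. as in the following sketch; the threshold value is illustrative and, as in [15], has to be tuned experimentally.

```python
ZERO_TRIPLE_THRESHOLD = 500   # illustrative value, to be tuned per dimension

def update_zero_counter(triple_sum, zero_count):
    """Count triples (v1, v2, v3) whose sum is the zero vector and decide
    whether to stop the sieve (schematic)."""
    if all(c == 0 for c in triple_sum):
        zero_count += 1
    return zero_count, zero_count > ZERO_TRIPLE_THRESHOLD
```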
Up to \(n=64\), the experiments are repeated 5 times (i.e. on 5 random lattices); for the dimensions above 64 and below 80, 3 times. The running times and list-sizes presented in the table below are averages over these runs. For \(n=80\), the experiment was performed once.
Our tests confirm a noticeable speed-up of the 3-Gauss sieve when our Configuration Search Algorithm 1 is used. Moreover, as the analysis suggests (see Fig. 3), our algorithm outperforms the naive 2-Gauss sieve while using much less memory. The results can be found in Table 1.
Another interesting aspect of the algorithm is the list-size compared with BLS. Although, asymptotically, the list-size \(\left| L\right| \) is the same for our algorithm and for BLS, in practice our algorithm requires a longer list (cf. the right numbers in each column). This is because we filter out a larger fraction of solutions. Also note that by increasing \(\varepsilon \) – the allowed deviation from the target configuration – we achieve an additional speed-up. This becomes apparent once we look at the \(\mathsf {Filter}\) procedure: allowing for a smaller inner-product discards fewer vectors, which in turn results in a shorter list L. For the range of dimensions we consider, we experimentally found \(\varepsilon =0.3\) to be a good choice.
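To illustrate the role of \(\varepsilon \), a simplified version of the pairwise test inside \(\mathsf {Filter}\) for \(k=3\) might look as follows. The target inner product \(-1/3\) corresponds to the balanced configuration; the exact form of the threshold used in the paper's \(\mathsf {Filter}\) (which operates on partial tuples) may differ, so this is a sketch only.

```python
import numpy as np

def filter_pair(x, y, c_target=-1.0 / 3, eps=0.3):
    """Simplified pairwise Filter test (schematic): keep a pair of unit-norm
    vectors if their inner product does not exceed the target configuration
    value by more than eps.  A larger eps discards fewer vectors."""
    return float(np.dot(x, y)) <= c_target + eps
```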
Notes
1. Throughout this proof, the equations that involve list-sizes \(\left| L\right| \) and running time T are assumed to have \(\widetilde{\mathcal {O}}(\cdot )\) on the right-hand side. We omit it for clarity.
References
Ajtai, M., Kumar, R., Sivakumar, D.: A sieve algorithm for the shortest lattice vector problem. In: Proceedings of STOC, pp. 601–610 (2001)
Akritas, A.G., Akritas, E.K., Malaschonok, G.I.: Various proofs of Sylvester’s (determinant) identity. Math. Comput. Simul. 42(4), 585–593 (1996)
Bai, S.: Personal Communication, August 2016
Bai, S., Laarhoven, T., Stehlé, D.: Tuple lattice sieving. LMS J. Comput. Math. 19A, 146–162 (2016). doi:10.1112/S1461157016000292. Algorithmic Number Theory Symposium (ANTS) XII
Becker, A., Ducas, L., Gama, N., Laarhoven, T.: New directions in nearest neighbor searching with applications to lattice sieving. In: Krauthgamer, R. (ed.) Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, pp. 10–24. SIAM, 10–12 January 2016
Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM 50, 506–519 (2003)
SVP Challenge: SVP challenge generator. http://latticechallenge.org/svp-challenge
Ghosh, M., Sinha, B.K.: A simple derivation of the Wishart distribution. Am. Stat. 56(2), 100–101 (2002)
Goldstein, D., Mayer, A.: On the equidistribution of Hecke points. Forum Mathematicum 15(3), 165–189 (2006)
Kannan, R.: Improved algorithms for integer programming and related lattice problems. In: Proceedings of STOC, pp. 193–206 (1983)
Klein, P.: Finding the closest lattice vector when it’s unusually close. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2000, pp. 937–941 (2000)
Kupferman, O., Lustig, Y., Vardi, M.Y.: On locally checkable properties. In: Hermann, M., Voronkov, A. (eds.) LPAR 2006. LNCS (LNAI), vol. 4246, pp. 302–316. Springer, Heidelberg (2006). doi:10.1007/11916277_21
Laarhoven, T.: Sieving for shortest vectors in lattices using angular locality-sensitive hashing. In: Gennaro, R., Robshaw, M. (eds.) CRYPTO 2015. LNCS, vol. 9215, pp. 3–22. Springer, Heidelberg (2015)
Lyubashevsky, V.: The parity problem in the presence of noise, decoding random linear codes, and the subset sum problem. In: Chekuri, C., Jansen, K., Rolim, J.D.P., Trevisan, L. (eds.) APPROX/RANDOM -2005. LNCS, vol. 3624, pp. 378–389. Springer, Heidelberg (2005). doi:10.1007/11538462_32
Mariano, A., Laarhoven, T., Bischof, C.: Parallel (probable) lock-free hash sieve: a practical sieving algorithm for the SVP. In: 44th International Conference on Parallel Processing (ICPP), pp. 590–599, September 2015
May, A., Ozerov, I.: On computing nearest neighbors with applications to decoding of binary linear codes. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 203–228. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46800-5_9
Micciancio, D., Voulgaris, P.: Faster exponential time algorithms for the shortest vector problem. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 1468–1480 (2010)
Nguyen, P.Q., Vidick, T.: Sieve algorithms for the shortest vector problem are practical. J. Math. Crypt. 2, 181–207 (2008)
Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 288–304. Springer, Heidelberg (2002). doi:10.1007/3-540-45708-9_19
Wishart, J.: The generalized product moment distribution in samples from a normal multivariate population. Biometrika 20A(1–2), 32–52 (1928)
Acknowledgments
We would like to thank the authors of [4], Shi Bai, Damien Stehlé, and Thijs Laarhoven for constructive discussions.
Elena Kirshanova was supported by UbiCrypt, the research training group 1817/1 funded by the DFG. Gottfried Herold was funded by ERC grant 307952 (acronym FSC).