1 Introduction

The rapid development of lattice-based cryptography in recent years has moved the topic from a theoretical corner of cryptography to a leading candidate for post-quantum cryptography (Footnote 1), while also providing advanced cryptographic functionalities like fully homomorphic encryption [Gen09]. Further appealing aspects of lattice-based cryptography are its innate parallelism and the fact that its two foundational hardness assumptions, Short Integer Solution (SIS) and Learning With Errors (LWE), are supported by worst-case to average-case reductions (e.g., [Ajt96, Reg05]).

A very important object in lattice cryptography, and in the computational and mathematical aspects of lattices more broadly, is the discrete Gaussian probability distribution, which (informally) is a Gaussian distribution restricted to a particular lattice (or coset thereof). For example, the strongest worst-case to average-case reductions [MR04, GPV08, Reg05] all rely centrally on discrete Gaussians and their nice properties. In addition, much of the development of lattice-based signature schemes, identity-based encryption, and other cryptosystems has centered around efficiently sampling from discrete Gaussians (see, e.g., [GPV08, Pei10, MP12, DDLL13, DLP14, MW17]), as well as the analysis of various kinds of combinations of discrete Gaussians [Pei10, BF11, MP13, AGHS13, AR16, BPMW16, GM18, CGM19, DGPY19].

By now, the literature contains a plethora of theorems about the behavior of discrete Gaussians in a variety of contexts, e.g., “convolution theorems” about sums of independent or dependent discrete Gaussians. Despite the close similarities between the proof approaches and techniques employed, these theorems are frequently incomparable and are almost always proved monolithically and nearly “from scratch.” This state of affairs makes it unnecessarily difficult to understand the existing proofs, and to devise and prove new theorems when the known ones are inadequate. Because of the structural similarities among so many of the existing theorems and their proofs, a natural question is whether there is some “master theorem” for which many others are corollaries. That is what we aim to provide in this work.

1.1 Our Contributions

We present a modular framework for analyzing linear operations on discrete Gaussians over lattices, and show several applications. Our main theorem, which is the heart of the framework, is a simple, general statement about linear transformations of discrete Gaussians. We establish several natural consequences of this theorem, e.g., for joint distributions of correlated discrete Gaussians. Then we show how to combine these tools in a modular way to obtain all previous discrete Gaussian convolution theorems (and some new ones) as corollaries. Notably—and in contrast to prior works—all the consequences of our main theorem follow mostly by elementary linear algebra, and do not use any additional properties (or even the definition) of the discrete Gaussian. In other words, our framework abstracts away the particulars of discrete Gaussians, and makes it easier to prove and verify many useful theorems about them.

As a novel application of our framework, we describe and tightly analyze an LWE self-reduction that, given a fixed number of LWE samples, directly generates (up to negligible statistical distance) an unlimited number of additional LWE samples with discrete Gaussian error (of a somewhat larger width than the original error). The ability to generate fresh, properly distributed LWE samples is often used in cryptosystems and security proofs (see [GPV08, ACPS09] for two early examples), so the tightness and simplicity of the procedure is important. The high-level idea behind prior LWE self-reductions, first outlined in [GPV08], is that a core procedure of [Reg05] can be used to generate fresh LWE samples with continuous Gaussian error. If desired, these samples can then be randomly rounded to have discrete Gaussian error [Pei10], but this increases the error width somewhat, and using continuous error to generate discrete samples seems unnecessarily cumbersome. We instead describe a fully discrete procedure, and use our framework to prove that it works for exactly the same parameters as the continuous one.

As a secondary contribution, motivated by the concrete security of “trapdoor” lattice cryptosystems, we analyze the singular values of the subgaussian matrices often used as such trapdoors [AP09, MP12]. Our analysis precisely tracks the exact constants in traditional concentration bounds for the singular values of a random matrix with independent, subgaussian rows [Ver12]. We also give a tighter heuristic bound on matrices chosen with independent subgaussian entries, supported by experimental evidence. Since the trapdoor’s maximum singular value directly influences the hardness of the underlying SIS/LWE problems in trapdoor cryptosystems, our heuristic yields up to 10 more bits of security in a common parameter regime, where the trapdoor’s entries are chosen independently from \(\{{0, \pm 1}\}\) (with one-half probability on 0, and one-quarter probability on each of \(\pm 1\)).Footnote 2

1.2 Technical Overview

Linear Transformations of Discrete Gaussians. It is well known that any linear transformation of a (continuous, multivariate) Gaussian is another Gaussian. The heart of our work is a similar theorem for discrete Gaussians (Theorem 1). Note that we cannot hope to say anything about this in full generality, because a linear transformation of a lattice \(\varLambda \) may not even be a lattice. However, the image is a lattice whenever the kernel K of the transformation is a \(\varLambda \)-subspace, i.e., the lattice \(\varLambda \cap K\) spans K (equivalently, K is spanned by vectors in \(\varLambda \)), so we restrict our attention to this case.

For a positive definite matrix \(\varSigma \) and a lattice coset \(\varLambda +\mathbf {c}\), the discrete Gaussian distribution \(\mathcal {D}_{\varLambda + \mathbf {c}, \sqrt{\varSigma }}\) assigns to each \(\mathbf {x}\) in its support \(\varLambda + \mathbf {c}\) a probability proportional to \(\exp (-\pi \cdot \mathbf {x}^t \varSigma ^{-1} \mathbf {x})\). We show that for an arbitrary linear transformation \(\mathbf {T}\), if the lattice \(\varLambda \cap \ker (\mathbf {T})\) spans \(\ker (\mathbf {T})\) and has smoothing parameter bounded by \(\sqrt{\varSigma }\), then \(\mathbf {T}\) applied to \(\mathcal {D}_{\varLambda + \mathbf {c}, \sqrt{\varSigma }}\) behaves essentially as one might expect from continuous Gaussians:

$$\mathbf {T} \mathcal {D}_{\varLambda + \mathbf {c}, \sqrt{\varSigma }} \approx \mathcal {D}_{\mathbf {T} (\varLambda +\mathbf {c}), \mathbf {T}\sqrt{\varSigma }} .$$

The key observation for the proof is that for any point in the support of these two distributions, its probabilities under \(\mathbf {T} \mathcal {D}_{\varLambda + \mathbf {c}, \sqrt{\varSigma }}\) and \(\mathcal {D}_{\mathbf {T}(\varLambda + \mathbf {c}), \mathbf {T} \sqrt{\varSigma }}\) differ only by a factor proportional to the Gaussian mass of some coset of \(\varLambda \cap K\). But because this sublattice is “smooth” by assumption, all such cosets have essentially the same mass.
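
To make this concrete, the following small Python script (our illustration, not part of the paper; the parameter s, the cutoff B, and the particular map are arbitrary choices) pushes the discrete Gaussian \(\mathcal {D}_{\mathbb {Z}^{2}, s}\) through the map \((x_{1},x_{2}) \mapsto x_{1}+x_{2}\) and compares the result to \(\mathcal {D}_{\mathbb {Z}, s\sqrt{2}}\), as the theorem predicts (here \(\mathbb {Z}^{2} \cap \ker (\mathbf {T}) = \mathbb {Z}(1,-1)\), which spans the kernel). The printed max-log distance should be very small.

```python
import itertools, math

s, B = 3.0, 40                       # illustrative parameter and truncation radius
rho = lambda sqnorm, s: math.exp(-math.pi * sqnorm / s**2)

# Pushforward of the (truncated) discrete Gaussian D_{Z^2, s} under T(x1, x2) = x1 + x2.
push = {}
for x1, x2 in itertools.product(range(-B, B + 1), repeat=2):
    push[x1 + x2] = push.get(x1 + x2, 0.0) + rho(x1**2 + x2**2, s)
Zp = sum(push.values())

# Target distribution D_{Z, s*sqrt(2)} on the same (truncated) support.
t = s * math.sqrt(2)
target = {z: rho(z**2, t) for z in range(-B, B + 1)}
Zt = sum(target.values())

err = max(abs(math.log((push[z] / Zp) / (target[z] / Zt))) for z in range(-B, B + 1))
print("max-log distance on [-B, B]:", err)
```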

Convolutions. It is well known that the sum of two independent continuous Gaussians having covariances \(\varSigma _{1}, \varSigma _{2}\) is another Gaussian with covariance \(\varSigma = \varSigma _{1} + \varSigma _{2}\). We use our above-described Theorem 1 to prove similar statements for convolutions of discrete Gaussians. A typical such convolution is the statistical experiment where one samples

$$\mathbf {x}_1 \leftarrow \mathcal {D}_{\varLambda _1 + \mathbf {c}_1, \sqrt{\varSigma _1}} \, , \, \mathbf {x}_2 \leftarrow \mathbf {x}_1 + \mathcal {D}_{\varLambda _2 + \mathbf {c}_{2} - \mathbf {x}_1, \sqrt{\varSigma _2}}. $$

Based on the behavior of continuous Gaussians, one might expect the distribution of \(\mathbf {x}_2\) to be close to \(\mathcal {D}_{\varLambda _2 + \mathbf {c}_2, \sqrt{\varSigma }}\), where \(\varSigma = \varSigma _1 + \varSigma _2\). This turns out to be the case, under certain smoothness conditions on the lattices \(\varLambda _{1}, \varLambda _{2}\) relative to the Gaussian parameters \(\sqrt{\varSigma _{1}}, \sqrt{\varSigma _{2}}\). This was previously shown in [Pei10, Theorem 3.1], using a specialized analysis of the particular experiment in question.

We show how to obtain the same theorem in a higher-level and modular way, via Theorem 1. First, we show that the joint distribution of \((\mathbf {x}_1, \mathbf {x}_2)\) is close to a discrete Gaussian over \((\varLambda _{1} + \mathbf {c}_{1}) \times (\varLambda _{2} + \mathbf {c}_{2})\), then we obtain the marginal distribution of \(\mathbf {x}_{2}\) by applying the linear transformation \((\mathbf {x}_{1}, \mathbf {x}_{2}) \mapsto \mathbf {x}_{2}\) and analyzing the intersection of \(\varLambda _{1} \times \varLambda _{2}\) with the kernel of the transformation. Interestingly, our analysis arrives at exactly the same hypotheses on the parameters as [Pei10, Theorem 3.1], so nothing is lost by proceeding via this generic route.

We further demonstrate the power of this approach—i.e., viewing convolutions as linear transformations of a joint distribution—by showing that it yields all prior discrete Gaussian convolution theorems from the literature. Indeed, we give a very general theorem on integer combinations of independent discrete Gaussians (Theorem 4), then show that several prior convolution theorems follow as immediate corollaries.

LWE Self-reduction. Recall the LWE distribution \((\mathbf {A}, \mathbf {b}^t = \mathbf {s}^t\mathbf {A} + \mathbf {e}^t \bmod q)\) where the secret \(\mathbf {s} \leftarrow \mathbb Z_q^n\) and \(\mathbf {A} \leftarrow \mathbb Z_q^{n \times m}\) are uniform and independent, and the entries of \(\mathbf {e}\) are chosen independently from some error distribution, usually a discrete one over \(\mathbb {Z}\). As described in [GPV08, ACPS09] (based on a core technique from [Reg05]), when \(m \approx n \log q\) or more we can generate unlimited additional LWE samples (up to small statistical distance) with the same secret \(\mathbf {s}\) and continuous Gaussian error, as

$$ (\mathbf {a} = \mathbf {A} \mathbf {x} \in \mathbb {Z}_{q}^{n} \, , \, b = \mathbf {b}^{t} \mathbf {x} + \tilde{e} = \mathbf {s}^{t} \mathbf {a} + (\mathbf {e}^{t} \mathbf {x} + \tilde{e}) \bmod q) $$

for discrete Gaussian \(\mathbf {x} \leftarrow \mathcal {D}_{\mathbb {Z}^m, r}\) and continuous Gaussian “smoothing error” \(\tilde{e} \leftarrow \mathcal {D}_{\tilde{r}}\), for suitable parameters \(r,\tilde{r}\). More specifically, the error term \(\mathbf {e}^{t} \mathbf {x} + \tilde{e}\) is close to a continuous Gaussian \(\mathcal {D}_{t}\), where \(t^{2} = (r \Vert {\mathbf {e}}\Vert )^{2} + \tilde{r}^{2}\).

We emphasize that the above procedure yields samples with continuous Gaussian error. If discrete error is desired, one can then “round off” b, either naïvely (yielding somewhat unnatural “rounded Gaussian” error), or using more sophisticated randomized rounding (yielding a true discrete Gaussian [Pei10]). However, this indirect route to discrete error via a continuous intermediate step seems cumbersome and also somewhat loose, due to the extra round-off error.

An obvious alternative approach is to directly generate samples with discrete error, by choosing the “smoothing” term \(\tilde{e} \leftarrow \mathcal {D}_{\mathbb {Z},\tilde{r}}\) from a discrete Gaussian. However, directly and tightly analyzing this alternative is surprisingly non-trivial, and to our knowledge it has never been proven that the resulting error is (close to) a discrete Gaussian, without incurring some loss relative to what is known for the continuous case.Footnote 3 Using the techniques developed in this paper, we give a modular proof that this alternative approach does indeed work, for the very same parameters as in the continuous case. As the reader may guess, we again express the overall error distribution as a linear transformation on some joint discrete Gaussian distribution. More specifically, the joint distribution is that of \((\mathbf {x},\tilde{e})\) where \(\mathbf {x}\) is conditioned on \(\mathbf {a} = \mathbf {A} \mathbf {x}\), and the linear transformation is given by \([\mathbf {e}^t \mid 1]\) (where \(\mathbf {e}^t\) is the original LWE error vector). The result then follows from our general theorem on linear transformations of discrete Gaussians (Theorem 1).
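
As an illustration of the fully discrete procedure (a minimal sketch with toy, insecure parameters; the dimensions, modulus, widths r and \(\tilde{r}\), and the truncated sampler are our own choices and are not taken from Sect. 5), the following Python code generates a fresh sample \((\mathbf {a} = \mathbf {A}\mathbf {x}, \, b = \mathbf {b}^{t}\mathbf {x} + \tilde{e})\) with \(\mathbf {x} \leftarrow \mathcal {D}_{\mathbb {Z}^{m},r}\) and discrete smoothing error \(\tilde{e} \leftarrow \mathcal {D}_{\mathbb {Z},\tilde{r}}\):

```python
import math, random

def sample_dg_Z(r, tail=10):
    # Sample D_{Z, r} exactly on the truncated support [-tail*r, tail*r]
    # (the mass outside is negligible); adequate for this illustration.
    B = int(math.ceil(tail * r))
    pts = list(range(-B, B + 1))
    w = [math.exp(-math.pi * (k / r) ** 2) for k in pts]
    return random.choices(pts, weights=w)[0]

def fresh_lwe_sample(A, b, q, r, r_tilde):
    # (a, b') = (A x mod q, <b, x> + e~ mod q) with x <- D_{Z^m, r}, e~ <- D_{Z, r~}.
    n, m = len(A), len(b)
    x = [sample_dg_Z(r) for _ in range(m)]
    a = [sum(A[i][j] * x[j] for j in range(m)) % q for i in range(n)]
    e_tilde = sample_dg_Z(r_tilde)
    b_new = (sum(b[j] * x[j] for j in range(m)) + e_tilde) % q
    return a, b_new

# Toy instance (hypothetical, insecure parameters).
n, m, q = 4, 32, 257
s = [random.randrange(q) for _ in range(n)]
A = [[random.randrange(q) for _ in range(m)] for _ in range(n)]
e = [sample_dg_Z(2.0) for _ in range(m)]
b = [(sum(s[i] * A[i][j] for i in range(n)) + e[j]) % q for j in range(m)]
a_new, b_new = fresh_lwe_sample(A, b, q, r=4.0, r_tilde=8.0)
print(a_new, b_new)
```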

Analysis of Subgaussian Matrices. A distribution over \(\mathbb {R}\) is subgaussian with parameter \(s>0\) if its tails are dominated by those of a Gaussian distribution of parameter s. More generally, a distribution \(\mathcal {X}\) over \(\mathbb {R}^{n}\) is subgaussian (with parameter s) if its marginals \(\langle {\mathcal {X}, \mathbf {u}}\rangle \) are subgaussian (with the same parameter s) for every unit vector \(\mathbf {u} \in \mathbb {R}^{n}\). We give precise concentration bounds on the singular values of random matrices whose columns, rows, or individual entries are independent subgaussians. We follow a standard proof strategy based on a union bound over an \(\varepsilon \)-net (see, e.g., [Ver12]), but we precisely track all the constant factors. For example, let \(\mathbf {R} \in \mathbb {R}^{m \times n}\) be a matrix with independent subgaussian rows. First, we reduce the analysis of \(\mathbf {R}\)’s singular values to measuring how close \(\mathbf {R}\) is to an isometry, specifically the norm \(\Vert \mathbf {R}^t\mathbf {R} - \mathbf {I}_n\Vert = \sup _{\mathbf {u}}\Vert (\mathbf {R}^t\mathbf {R} - \mathbf {I}_n)\mathbf {u}\Vert \) where the supremum is taken over all unit vectors \(\mathbf {u}\). Next, we approximate all unit vectors by an \(\varepsilon \)-net of the unit sphere and bound the probability that \(\Vert \mathbf {R}\mathbf {u}\Vert _2^2\) is too large by expressing \(\Vert \mathbf {R}\mathbf {u}\Vert _2^2\) as a sum of independent terms (namely, \(\Vert \mathbf {R}\mathbf {u}\Vert _2^2 = \sum _i \langle {\mathbf {r}_i, \mathbf {u}}\rangle ^{2}\) where \(\mathbf {r}_i\) is a row of \(\mathbf {R}\)). Finally, we take a union bound over the net to get a concentration bound. We also give a tighter heuristic for subgaussian matrices with independent entries drawn from distributions commonly used in lattice-based cryptography.
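
As a quick empirical companion to this analysis (our own experiment with arbitrary dimensions, not the paper's data), the following snippet measures the largest singular value of a random matrix with i.i.d. entries from \(\{0, \pm 1\}\) (probabilities 1/2, 1/4, 1/4), the trapdoor distribution mentioned in Sect. 1.1, and reports it relative to the usual \(\sqrt{m}+\sqrt{n}\) scale appearing in bounds such as [Ver12]:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, trials = 1024, 256, 20
s1 = []
for _ in range(trials):
    # i.i.d. entries: 0 w.p. 1/2, +1 and -1 w.p. 1/4 each
    R = rng.choice([0, 1, -1], size=(m, n), p=[0.5, 0.25, 0.25])
    s1.append(np.linalg.norm(R, 2))          # largest singular value s_1(R)
s1 = np.array(s1)
print("mean s_1(R):", s1.mean())
print("mean s_1(R) / (sqrt(m) + sqrt(n)):", (s1 / (np.sqrt(m) + np.sqrt(n))).mean())
```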

1.3 Organization

The rest of the paper is organized as follows. Section 2 reviews the relevant mathematical background. Section 3 gives our general theorem on linear transformations of discrete Gaussians. Section 4 is devoted to convolutions of discrete Gaussians: we first analyze joint distributions and linear transforms of such convolutions, then show how all prior convolution theorems follow as corollaries. Section 5 gives our improved, purely discrete LWE self-reduction. Finally, Sect. 6 gives our provable and heuristic subgaussian matrix analysis; the proof of the main subgaussianity theorem appears in the full version.

2 Preliminaries

In this section we review some basic notions and mathematical notation used throughout the paper. Column vectors are denoted by lower-case bold letters (\(\mathbf {a}, \mathbf {b}\), etc.) and matrices by upper-case bold letters (\(\mathbf {A}, \mathbf {B}\), etc.). In addition, positive semidefinite matrices are sometimes denoted by upper-case Greek letters like \(\varSigma \). The integers and reals are respectively denoted by \(\mathbb {Z}\) and \(\mathbb {R}\). All logarithms are base two unless specified otherwise.

Probability. We use calligraphic letters like \(\mathcal {X},\mathcal {Y}\) for probability distributions, and sometimes for random variables having such distributions. We make informal use of probability theory, without setting up formal probability spaces. We use set-like notation to describe probability distributions: for any distribution \(\mathcal {X}\) over a set X, predicate P on X, and function \(f:X\rightarrow Y\), we write \(\llbracket {f(x) \mid x \leftarrow \mathcal {X}, P(x)}\rrbracket \) for the probability distribution over Y obtained by sampling x according to \(\mathcal {X}\), conditioning on P(x) being satisfied, and outputting \(f(x) \in Y\). Similarly, we write \(\{{P(x) \mid x\leftarrow \mathcal {X}}\}\) to denote the event that P(x) is satisfied when x is selected according to \(\mathcal {X}\), and use \(\Pr \{{z \leftarrow \mathcal {X}}\}\) as an abbreviation for \(\mathcal {X}(z) = \Pr \{{x=z \mid x \leftarrow \mathcal {X}}\}\). We write \(f(\mathcal {X}) = \llbracket {f(x) \mid x\leftarrow \mathcal {X}}\rrbracket \) for the result of applying a function to a probability distribution. We let \(\mathcal {U}(X)\) denote the uniform distribution over a set X of finite measure.

The statistical distance between any two probability distributions \(\mathcal {X},\mathcal {Y}\) over the same set is \(\varDelta (\mathcal {X},\mathcal {Y}) := \sup _A |{\Pr \{{\mathcal {X}\in A}\} - \Pr \{{\mathcal {Y}\in A}\}}|\), where A ranges over all measurable sets. Similarly, for distributions \(\mathcal {X}, \mathcal {Y}\) with the same support, their max-log distance [MW18] is defined as

$$ \varDelta _{\textsc {ml}}(\mathcal {X},\mathcal {Y}) := \sup _A |{\log \Pr \{{\mathcal {X}\in A}\} - \log \Pr \{{\mathcal {Y}\in A}\}}|, $$

or, equivalently, \(\varDelta _{\textsc {ml}}(\mathcal {X},\mathcal {Y}) = \sup _a |{\log \Pr \{{\mathcal {X}= a}\} - \log \Pr \{{\mathcal {Y}= a}\}}|\).

Distance Notation. For any two real numbers \(x, y\) and \(\varepsilon \ge 0\), we say that x approximates y within relative error \(\varepsilon \) (written \(x \approx _{\varepsilon } y\)) if \(x \in [1-\varepsilon , 1+\varepsilon ] \cdot y\). We also write \(x \overset{\varepsilon }{\approx } y\) as an abbreviation for the symmetric relation \((x \approx _\varepsilon y) \wedge (y \approx _\varepsilon x)\), or, equivalently, \(|\log x - \log y| \le \log (1+\varepsilon )\le \varepsilon \).

For two probability distributions \(\mathcal {X},\mathcal {Y}\) over the same set, we write \(\mathcal {X}\approx _{\varepsilon } \mathcal {Y}\) if \(\mathcal {X}(z) \approx _{\varepsilon } \mathcal {Y}(z)\) for every z. Similarly, we write \(\mathcal {X}\overset{\varepsilon }{\approx }\mathcal {Y}\) if \(\mathcal {X}\approx _\varepsilon \mathcal {Y}\) and \(\mathcal {Y}\approx _\varepsilon \mathcal {X}\). The following facts are easily verified:

  1. If \(\mathcal {X}\approx _\varepsilon \mathcal {Y}\), then \(\mathcal {Y}\approx _{\bar{\varepsilon }} \mathcal {X}\) (and therefore, \(\mathcal {X}\overset{\bar{\varepsilon }}{\approx } \mathcal {Y}\)) for \(\bar{\varepsilon }= \varepsilon /(1-\varepsilon )\).
  2. If \(\mathcal {X}\approx _\varepsilon \mathcal {Y}\) and \(\mathcal {Y}\approx _\delta \mathcal {Z}\) then \(\mathcal {X}\approx _{\varepsilon +\delta + \varepsilon \delta } \mathcal {Z}\), and similarly for \(\overset{\varepsilon }{\approx }\).
  3. For any (possibly randomized) function f, \(\varDelta (f(\mathcal {X}),f(\mathcal {Y})) \le \varDelta (\mathcal {X},\mathcal {Y})\), and \(\mathcal {X}\approx _\varepsilon \mathcal {Y}\) implies \(f(\mathcal {X}) \approx _\varepsilon f(\mathcal {Y})\).
  4. If \(\mathcal {X}\approx _{\varepsilon } \mathcal {Y}\) then \(\varDelta (\mathcal {X},\mathcal {Y})\le \varepsilon /2\).
  5. \(\mathcal {X}\overset{\varepsilon }{\approx }\mathcal {Y}\) if and only if \(\varDelta _{\textsc {ml}}(\mathcal {X},\mathcal {Y}) \le \log (1+\varepsilon )\).

Linear Algebra. For any set of vectors \(S \subseteq \mathbb {R}^n\), we write \(\mathrm {span}(S)\) for the linear span of S, i.e., the smallest linear subspace of \(\mathbb {R}^n\) that contains S. For any matrix \(\mathbf {T} \in \mathbb {R}^{n\times k}\), we write \(\mathrm {span}(\mathbf {T})\) for the linear span of the columns of \(\mathbf {T}\), or, equivalently, the image of \(\mathbf {T}\) as a linear transformation. Moreover, we often identify \(\mathbf {T}\) with this linear transformation, treating them interchangeably. A matrix has full column rank if its columns are linearly independent.

We write \(\langle {\mathbf {x},\mathbf {y}}\rangle = \sum _i x_i\cdot y_i\) for the standard inner product of two vectors in \(\mathbb {R}^n\). For any vector \(\mathbf {x} \in \mathbb {R}^n\) and a (possibly empty) set \(S\subseteq \mathbb {R}^n\), we write \({\mathbf {x}}_{\perp {S}}\) for the component of \(\mathbf {x}\) orthogonal to S, i.e., the unique vector \({\mathbf {x}}_{\perp {S}} \in \mathbf {x} + \mathrm {span}(S)\) such that \(\langle {{\mathbf {x}}_{\perp {S}},\mathbf {s}}\rangle = 0\) for every \(\mathbf {s} \in S\).

The singular values of a matrix \(\mathbf {A} \in \mathbb {R}^{m \times n}\) are the square roots of the \(d = \min (m,n)\) largest eigenvalues of its Gram matrix \(\mathbf {A}^t\mathbf {A}\). We list singular values in non-increasing order, as \(s_1(\mathbf {A}) \ge s_2(\mathbf {A}) \ge \cdots \ge s_d(\mathbf {A})\ge 0\). The spectral norm is \(\Vert \mathbf {A}\Vert := \sup _{\mathbf {x} \ne \mathbf {0}} \Vert \mathbf {A} \mathbf {x}\Vert _2/\Vert \mathbf {x} \Vert _{2}\), which equals its largest singular value \(s_1(\mathbf {A})\).

The (Moore-Penrose) pseudoinverse of a matrix \(\mathbf {A} \in \mathbb {R}^{n\times k}\) of full column rankFootnote 4 is \(\mathbf {A}^+ = (\mathbf {A}^t\mathbf {A})^{-1}\mathbf {A}^t\), and it is the unique matrix \(\mathbf {A}^+ \in \mathbb {R}^{k\times n}\) such that \(\mathbf {A}^+\mathbf {A} = \mathbf {I}\) and \(\mathrm {span}((\mathbf {A}^+)^t) = \mathrm {span}(\mathbf {A})\). (If \(\mathbf {A}\) is square, its pseudoinverse is just its inverse \(\mathbf {A}^+ = \mathbf {A}^{-1}\).) For any \(\mathbf {v} \in \mathrm {span}(\mathbf {A})\) we have \(\mathbf {A} \mathbf {A}^{+} \mathbf {v} = \mathbf {v}\), because \(\mathbf {v} = \mathbf {A} \mathbf {c}\) for some vector \(\mathbf {c}\).

The tensor product (or Kronecker product) of any two matrices \(\mathbf {A} = (a_{i,j})\) and \(\mathbf {B}\) is the matrix obtained by replacing each entry \(a_{i,j}\) of \(\mathbf {A}\) with the block \(a_{i,j} \mathbf {B}\). It obeys the mixed-product property \((\mathbf {A} \otimes \mathbf {B}) (\mathbf {C} \otimes \mathbf {D}) = (\mathbf {A} \mathbf {C}) \otimes (\mathbf {B} \mathbf {D})\) for any matrices \(\mathbf {A}, \mathbf {B}, \mathbf {C}, \mathbf {D}\) with compatible dimensions.
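
The following short numpy check (illustrative only; the dimensions are arbitrary) verifies two of the identities above: the pseudoinverse identity \(\mathbf {A}^{+}\mathbf {A} = \mathbf {I}\) for a full-column-rank \(\mathbf {A}\), and the mixed-product property of the tensor product.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))              # full column rank with probability 1
A_plus = np.linalg.inv(A.T @ A) @ A.T        # A^+ = (A^t A)^{-1} A^t
print(np.allclose(A_plus @ A, np.eye(3)))    # A^+ A = I

B, C, D, E = (rng.standard_normal((2, 2)) for _ in range(4))
# Mixed-product property: (B (x) C)(D (x) E) = (B D) (x) (C E)
print(np.allclose(np.kron(B, C) @ np.kron(D, E), np.kron(B @ D, C @ E)))
```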

Positive (Semi)definite Matrices. A symmetric matrix \(\varSigma = \varSigma ^{t}\) is positive semidefinite, written \(\varSigma \succeq 0\), if \(\mathbf {x}^t \varSigma \mathbf {x} \ge 0\) for all vectors \(\mathbf {x}\). It is positive definite, written \(\varSigma \succ 0\), if \(\mathbf {x}^t \varSigma \mathbf {x} > 0\) for all nonzero \(\mathbf {x}\). Positive (semi)definiteness defines a partial ordering on symmetric matrices: we write \(\varSigma \succeq \varSigma '\) (and \(\varSigma ' \preceq \varSigma \)) if \(\varSigma - \varSigma ' \succeq 0\) is positive semidefinite, and similarly for \(\varSigma \succ \varSigma '\).Footnote 5 For any two (not necessarily positive semidefinite) matrices \(\mathbf {S},\mathbf {T} \in \mathbb {R}^{n\times k}\), we write \(\mathbf {S} \le \mathbf {T}\) if \(\mathbf {S}\mathbf {S}^t \preceq \mathbf {T}\mathbf {T}^t\).

For any matrix \(\mathbf {A}\), its Gram matrix \(\mathbf {A}^t \mathbf {A}\) is positive semidefinite. Conversely, a matrix \(\varSigma \) is positive semidefinite if and only if it can be written as \(\varSigma = \mathbf {S} \mathbf {S}^t\) for some matrix \(\mathbf {S}\); we write \(\mathbf {S} = \sqrt{\varSigma }\), and say that \(\mathbf {S}\) is a square root of \(\varSigma \). Note that such a square root is not unique, because, e.g., \(-\mathbf {S} = \sqrt{\varSigma }\) as well. We often just write \(\sqrt{\varSigma }\) to refer to some arbitrary but fixed square root of \(\varSigma \). For positive definite \(\varSigma \succ 0\), observe that \(\mathbf {S} = \sqrt{\varSigma }\) if and only if \(\varSigma ^{-1} = (\mathbf {S} \mathbf {S}^{t})^{-1} = \mathbf {S}^{-t} \mathbf {S}^{-1}\), so \(\mathbf {S}^{-t} = \sqrt{\varSigma ^{-1}}\), i.e., \(\sqrt{\varSigma }^{-t}\) is equivalent to \(\sqrt{\varSigma ^{-1}}\), and hence \(\sqrt{\varSigma }^{-1}\) is equivalent to \(\sqrt{\varSigma ^{-1}}^{t}\).
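
Similarly, one can numerically confirm the square-root relations just described (again purely illustrative; the matrix and the choice of a Cholesky square root are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + 4 * np.eye(4)              # a positive definite Sigma
S = np.linalg.cholesky(Sigma)                # one square root: S S^t = Sigma
S_inv_t = np.linalg.inv(S).T
print(np.allclose(S @ S.T, Sigma))                               # S = sqrt(Sigma)
print(np.allclose(S_inv_t @ S_inv_t.T, np.linalg.inv(Sigma)))    # S^{-t} = sqrt(Sigma^{-1})
```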

Lattices. An n-dimensional lattice \(\varLambda \) is a discrete subgroup of \(\mathbb {R}^n\), or, equivalently, the set \(\varLambda = \mathcal {L}(\mathbf {B}) = \{{\mathbf {B}\mathbf {x}:\mathbf {x} \in \mathbb {Z}^k}\}\) of all integer linear combinations of the columns of a full-column-rank basis matrix \(\mathbf {B} \in \mathbb {R}^{n\times k}\). The dimension k is the rank of \(\varLambda \), and the lattice is full rank if \(k=n\). The basis \(\mathbf {B}\) is not unique; any \(\mathbf {B}' = \mathbf {B}\mathbf {U}\) for \(\mathbf {U} \in \mathbb {Z}^{k \times k}\) with \(\det (\mathbf {U}) = \pm 1\) is also a basis of the same lattice.

A coset of a lattice \(\varLambda \subset \mathbb {R}^{n}\) is a set of the form \(A = \varLambda +\mathbf {a} = \{{\mathbf {v}+\mathbf {a} : \mathbf {v} \in \varLambda }\}\) for some \(\mathbf {a} \in \mathbb {R}^{n}\). The dual lattice of \(\varLambda \) is the lattice \(\varLambda ^\vee = \{{ \mathbf {x} \in \mathrm {span}(\varLambda ) : \langle {\mathbf {x},\varLambda }\rangle \subseteq \mathbb {Z}}\}\). If \(\mathbf {B}\) is a basis for \(\varLambda \), then \(\mathbf {B}^{+t}\) is a basis for \(\varLambda ^\vee \). A \(\varLambda \)-subspace, also called a lattice subspace when \(\varLambda \) is clear from context, is the linear span of some set of lattice points, i.e., a subspace S for which \(S = \mathrm {span}(\varLambda \cap S)\). A fundamental property of lattices (used in the proof that every lattice has a basis) is that if \(\mathbf {T}\) is a linear transformation for which \(\ker (\mathbf {T})\) is a \(\varLambda \)-subspace, then \(\mathbf {T} \varLambda \) is also a lattice.Footnote 6

The Gram-Schmidt orthogonalization (GSO) of a lattice basis \(\mathbf {B} = \{{\mathbf {b}_{i}}\}\) is the set \(\tilde{\mathbf {B}} = \{{\tilde{\mathbf {b}}_{i}}\}\) of vectors defined iteratively as \(\tilde{\mathbf {b}}_{i} = (\mathbf {b}_{i})_{\perp \{{\mathbf {b}_{1}, \ldots , \mathbf {b}_{i-1}}\}}\), i.e., the component of \(\mathbf {b}_i\) orthogonal to the previous basis vectors. (Notice that the GSO is sensitive to the ordering of the basis vectors.) We define the minimum GSO length of a lattice as \(\tilde{bl}(\varLambda ) := \min _{\mathbf {B}} \max _{i} \Vert \tilde{\mathbf {b}}_{i}\Vert _{2}\), where the minimum is taken over all bases \(\mathbf {B}\) of \(\varLambda \).

For any two lattices \(\varLambda _{1}, \varLambda _{2}\), their tensor product \(\varLambda _{1} \otimes \varLambda _{2}\) is the set of all sums of vectors of the form \(\mathbf {v}_{1} \otimes \mathbf {v}_{2}\) where \(\mathbf {v}_{1} \in \varLambda _{1}\) and \(\mathbf {v}_{2} \in \varLambda _{2}\). If \(\mathbf {B}_{1}, \mathbf {B}_{2}\) are respectively bases of \(\varLambda _{1}, \varLambda _{2}\), then \(\mathbf {B}_{1} \otimes \mathbf {B}_{2}\) is a basis of \(\varLambda _{1} \otimes \varLambda _{2}\).

Gaussians. Let \(\mathcal {D}\) be the Gaussian probability measure on \(\mathbb {R}^k\) (for any \(k \ge 1\)) having density function defined by \(\rho (\mathbf {x}) = e^{-\pi \Vert \mathbf {x}\Vert ^2}\), the Gaussian function with total measure \(\int _{\mathbf {x} \in \mathbb {R}^k} \rho (\mathbf {x}) \, d\mathbf {x} = 1\). For any (possibly non-full-rank) matrix \(\mathbf {S} \in \mathbb {R}^{n \times k}\), we define the (possibly non-spherical) Gaussian distribution

$$ \mathcal {D}_{\mathbf {S}} := \mathbf {S} \cdot \mathcal {D}= \llbracket {\mathbf {S}\mathbf {x} \mid \mathbf {x}\leftarrow \mathcal {D}}\rrbracket $$

as the image of \(\mathcal {D}\) under \(\mathbf {S}\); this distribution has covariance \(\varSigma /(2\pi )\) where \(\varSigma = \mathbf {S} \mathbf {S}^{t}\) is positive semidefinite. Notice that \(\mathcal {D}_{\mathbf {S}}\) depends only on \(\varSigma \), and not on any specific choice of the square root \(\mathbf {S}\).Footnote 7 So, we often write \(\mathcal {D}_{\sqrt{\varSigma }}\) instead of \(\mathcal {D}_{\mathbf {S}}\). When \(\varSigma = s^2\mathbf {I}\) is a scalar matrix, we often write \(\mathcal {D}_{s}\) (observe that \(\mathcal {D}= \mathcal {D}_{1}\)).

For any Gaussian distribution \(\mathcal {D}_{\mathbf {S}}\) and set \(A\subseteq \mathrm {span}(\mathbf {S})\), we define \(\mathcal {D}_{A, \mathbf {S}}\) as the conditional distribution (where \(\mathbf {S}^{-1}(A) = \{{\mathbf {x} : \mathbf {S} \mathbf {x} \in A}\}\))

$$ \mathcal {D}_{A, \mathbf {S}} := [\mathcal {D}_{\mathbf {S}}]_{A} = \llbracket {\mathbf {y} \mid \mathbf {y} \leftarrow \mathcal {D}_{\mathbf {S}}, \mathbf {y} \in A}\rrbracket = \llbracket {\mathbf {S}\mathbf {x} \mid \mathbf {x} \leftarrow \mathcal {D}, \mathbf {S}\mathbf {x} \in A}\rrbracket = \mathbf {S} \cdot [\mathcal {D}]_{\mathbf {S}^{-1}(A)} $$

whenever this distribution is well-defined.Footnote 8 Examples for which this is the case include all sets A with positive measure \(\int _{\mathbf {x} \in A} d\mathbf {x} > 0\), and all sets of the form \(A = L + \varLambda + \mathbf {c}\), where \(L \subseteq \mathbb {R}^n\) is a linear subspace and \(\varLambda + \mathbf {c} \subset \mathbb {R}^n\) is a lattice coset.

For any lattice coset \(A = \varLambda +\mathbf {c}\) (and taking \(\mathbf {S}=\mathbf {I}\) for simplicity), the distribution \(\mathcal {D}_{\varLambda + \mathbf {c}}\) is exactly the (origin-centered) discrete Gaussian distribution given by \(\Pr \{{\mathbf {x} \leftarrow \mathcal {D}_{A}}\} := \rho (\mathbf {x})/\sum _{\mathbf {y}\in A}\rho (\mathbf {y})\), as usually defined in lattice cryptography. It also follows immediately from the definition that \(\mathbf {c} + \mathcal {D}_{\varLambda - \mathbf {c}}\) is the “\(\mathbf {c}\)-centered” discrete Gaussian \(\mathcal {D}_{\varLambda , \mathbf {c}}\) that is defined and used in some works. Because of this, there is no loss of generality in dealing solely with origin-centered Gaussians, as we do in this work.
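
For concreteness, here is a simple (truncated-support) sampler for the one-dimensional case \(\varLambda = \mathbb {Z}\), written directly from the definition above; the truncation radius and parameters are our own illustrative choices. The last lines illustrate the relation between \(\mathbf {c} + \mathcal {D}_{\varLambda - \mathbf {c}}\) and the \(\mathbf {c}\)-centered discrete Gaussian.

```python
import math, random

def sample_dg_coset(c, s=1.0, tail=10):
    # Sample D_{Z + c, s}: each x in the coset Z + c gets probability
    # proportional to exp(-pi x^2 / s^2); support truncated to |x| <= tail*s.
    B = int(math.ceil(tail * s))
    pts = [k + c for k in range(-B, B + 1)]
    w = [math.exp(-math.pi * (x / s) ** 2) for x in pts]
    return random.choices(pts, weights=w)[0]

# c + D_{Z - c, s} gives the "c-centered" discrete Gaussian over Z (here c = 0.3).
samples = [0.3 + sample_dg_coset(-0.3, s=2.0) for _ in range(8)]
print(samples)
```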

Lemma 1

For any \(A \subseteq \mathbb {R}^{n}\) and matrices \(\mathbf {S}, \mathbf {T}\) representing linear functions where \(\mathbf {T}\) is injective on A, we have

$$\begin{aligned} \mathbf {T} \cdot \mathcal {D}_{A,\mathbf {S}} = \mathcal {D}_{\mathbf {T} A,\mathbf {T} \mathbf {S}}. \end{aligned}$$
(2.1)

Proof

By definition of the conditioned Gaussian and the fact that \(A = \mathbf {T}^{-1}(\mathbf {T} A)\), we have

$$ \mathbf {T} \cdot \mathcal {D}_{A, \mathbf {S}} = \mathbf {T} \mathbf {S} \cdot [\mathcal {D}]_{\mathbf {S}^{-1}(A)} = \mathbf {T} \mathbf {S} \cdot [\mathcal {D}]_{(\mathbf {T} \mathbf {S})^{-1}(\mathbf {T} A)} = \mathcal {D}_{\mathbf {T} A, \mathbf {T} \mathbf {S}}. $$

    \(\square \)

We now recall the notion of the smoothing parameter [MR04] and its generalization to non-spherical Gaussians [Pei10].

Definition 1

For a lattice \(\varLambda \) and \(\varepsilon \ge 0\), we say \(\eta _\varepsilon (\varLambda ) \le 1\) if \(\rho (\varLambda ^\vee ) \le 1 + \varepsilon \).

More generally, for any matrix \(\mathbf {S}\) of full column rank, we write \(\eta _\varepsilon (\varLambda ) \le \mathbf {S}\) if \(\varLambda \subset \mathrm {span}(\mathbf {S})\) and \(\eta _\varepsilon (\mathbf {S}^{+}\varLambda ) \le 1\), where \(\mathbf {S}^+\) is the pseudoinverse of \(\mathbf {S}\). When \(\mathbf {S} = s \mathbf {I}\) is a scalar matrix, we may simply write \(\eta _\varepsilon (\varLambda ) \le s\).

Observe that for a fixed lattice \(\varLambda \), whether \(\eta _{\varepsilon }(\varLambda ) \le \mathbf {S}\) depends only on \(\varSigma = \mathbf {S} \mathbf {S}^{t}\), and not the specific choice of square root \(\mathbf {S} = \sqrt{\varSigma }\). This is because the dual lattice \((\mathbf {S}^{+} \varLambda )^{\vee } = \mathbf {S}^{t} \varLambda ^{\vee }\), so for any dual vector \(\mathbf {w} = \mathbf {S}^{t} \mathbf {v}\) where \(\mathbf {v} \in \varLambda ^{\vee }\), \(\rho (\mathbf {w}) = \exp (-\pi \Vert {\mathbf {w}}\Vert ^{2}) = \exp (-\pi \mathbf {v}^{t} \mathbf {S} \mathbf {S}^{t} \mathbf {v}) = \exp (-\pi \mathbf {v}^{t} \varSigma \mathbf {v})\) is invariant under the choice of \(\mathbf {S}\). From this analysis it is also immediate that Definition 1 is consistent with our partial ordering of matrices (i.e., \(\mathbf {S} \le \mathbf {T}\) when \(\mathbf {S} \mathbf {S}^{t} \preceq \mathbf {T} \mathbf {T}^{t}\)), and with the original definition [MR04] of the smoothing parameter of \(\varLambda \) as the smallest positive real \(s > 0\) such that \(\rho (s\varLambda ^\vee ) \le 1+\varepsilon \). The following lemma also follows immediately from the definition.

Lemma 2

For any lattice \(\varLambda \), \(\varepsilon \ge 0\), and matrices \(\mathbf {S}, \mathbf {T}\) of full column rank, we have \(\eta _\varepsilon (\varLambda ) \le \mathbf {S}\) if and only if \(\eta _\varepsilon (\mathbf {T} \varLambda ) \le \mathbf {T}\mathbf {S}\).

The name “smoothing parameter” comes from the following fundamental property proved in [MR04, Reg05].

Lemma 3

For any lattice \(\varLambda \) and \(\varepsilon \ge 0\) where \(\eta _{\varepsilon }(\varLambda ) \le 1\), we have \(\rho (\varLambda + \mathbf {c}) \approx _{\varepsilon } 1/\det (\varLambda )\) for any \(\mathbf {c} \in \mathrm {span}(\varLambda )\); equivalently, \((\mathcal {D}\bmod \varLambda ) \approx _\varepsilon \mathcal {U}:= \mathcal {U}(\mathrm {span}(\varLambda )/\varLambda )\).

In particular, \(\varDelta (\mathcal {D}\bmod \varLambda ,\mathcal {U}) \le \varepsilon /2\) and \(\varDelta _{\textsc {ml}}(\mathcal {D}\bmod \varLambda ,\mathcal {U}) \le -\log (1 - \varepsilon )\).

The lemma is easily generalized to arbitrary vectors \(\mathbf {c}\) not necessarily in \(\mathrm {span}(\varLambda )\).

Corollary 1

For any lattice \(\varLambda \) and \(\varepsilon \ge 0\) where \(\eta _{\varepsilon }(\varLambda ) \le 1\), and any vector \(\mathbf {c}\), we have

$$ \rho (\varLambda + \mathbf {c}) \approx _{\varepsilon } \frac{\rho ({\mathbf {c}}_{\perp {\varLambda }})}{\det (\varLambda )}. $$

Proof

Because \({\mathbf {c}}_{\perp {\varLambda }}\) is orthogonal to \(\mathrm {span}(\varLambda )\) and \(\mathbf {c}' = \mathbf {c} - ({\mathbf {c}}_{\perp {\varLambda }}) \in \mathrm {span}(\varLambda )\), we have

$$ \rho (\varLambda + \mathbf {c}) = \rho (\varLambda +\mathbf {c}' + ({\mathbf {c}}_{\perp {\varLambda }})) = \rho ({\mathbf {c}}_{\perp {\varLambda }}) \cdot \rho (\varLambda + \mathbf {c}') \approx _{\varepsilon } \frac{\rho ({\mathbf {c}}_{\perp {\varLambda }})}{\det (\varLambda )}, $$

where \(\rho (\varLambda +\mathbf {c}') \approx _\varepsilon \det (\varLambda )^{-1}\) by Lemma 3.    \(\square \)

Finally, we recall the following bounds on the smoothing parameter.

Lemma 4

([GPV08, Lemma 3.1]). For any rank-n lattice \(\varLambda \) and \(\varepsilon > 0\), we have \(\eta _\varepsilon (\varLambda ) \le \tilde{bl}(\varLambda )\cdot \sqrt{\ln (2n(1+ 1/\varepsilon ))/\pi }\).
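
For a concrete sense of scale (an illustrative calculation, not taken from the text): the integer lattice has \(\tilde{bl}(\mathbb {Z}^{n}) = 1\), so for \(n = 512\) and \(\varepsilon = 2^{-80}\) Lemma 4 gives

$$ \eta _{\varepsilon }(\mathbb {Z}^{512}) \le \sqrt{\ln (2 \cdot 512 \cdot (1 + 2^{80}))/\pi } \approx 4.5. $$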

Lemma 5

([MP13, Corollary 2.7]). For any lattices \(\varLambda _1, \varLambda _{2}\), we have

$$ \eta _{\varepsilon '}(\varLambda _1 \otimes \varLambda _2) \le \tilde{bl}(\varLambda _1) \cdot \eta _{\varepsilon }(\varLambda _2), $$

where \(1+\varepsilon ' = (1+\varepsilon )^n\) and n is the rank of \(\varLambda _{1}\). (Note that \(\varepsilon ' \approx n \varepsilon \) for sufficiently small \(\varepsilon \).)

Quotients and Groups. Lattice cryptography typically involves integer lattices \(\varLambda \) that are periodic modulo some integer q, i.e., \(q\mathbb {Z}^m \subseteq \varLambda \subseteq \mathbb {Z}^m\). These “q-ary” lattices can be equivalently viewed as subgroups of \(\mathbb {Z}_q^m = \mathbb {Z}^m / q\mathbb {Z}^m\). Let \(\mathbf {A} \in \mathbb {Z}_q^{n\times m}\) for some \(n \ge 1\) and define the lattice \(\varLambda _q^\perp (\mathbf {A}) := \{{\mathbf {x} \in \mathbb {Z}^m :\mathbf {A} \mathbf {x} = \mathbf {0} \bmod q}\}\). We say that \(\mathbf {A}\) is primitive if \(\mathbf {A} \cdot \mathbb {Z}^{m} = \mathbb {Z}_q^n\).

All the results in this paper apply not only to lattices, but also to arbitrary (topologically closed) subgroups of \(\mathbb {R}^n\). These are groups of the form \(G = \varLambda + L\) where \(\varLambda \) is a lattice and L is a linear subspace. When considering such groups, one can always assume, without loss of generality, that \(\varLambda \) and L are mutually orthogonal because \(\varLambda +L = ({\varLambda }_{\perp {L}})+L\). Intuitively, one can think of groups \(\varLambda +L\) as lattices of the form \(\varLambda +\delta \varLambda _L\) where \(\mathrm {span}(\varLambda _L) = L\) and \(\delta \approx 0\). Notice that \(\lim _{\delta \rightarrow 0}\eta _\varepsilon (\varLambda +\delta \varLambda _L) = \eta _\varepsilon ({\varLambda }_{\perp {L}})\). For simplicity, we will focus the presentation on lattices, and leave the generalization to arbitrary groups to the reader. Results for the continuous Gaussian distribution \(\mathcal {D}\) are obtained as a special case by taking the limit, for \(\delta \rightarrow 0\), of \(\delta \varLambda \), where \(\varLambda \) is an arbitrary lattice spanning the support of \(\mathcal {D}\).

Subgaussian Distributions. Subgaussian distributions are those on \(\mathbb {R}\) which have tails dominated by Gaussians [Ver12]. An equivalent formulation is through a distribution’s moment-generating function, and the definition below is commonly used throughout lattice-based cryptography [MP12, LPR13].

Definition 2

A real random variable X is subgaussian with parameter \(s>0\) if for all \(t \in \mathbb {R}\),

$$\begin{aligned} \mathbb E[e^{2 \pi tX}] \le e^{\pi s^2t^2}. \end{aligned}$$

From this we can derive a standard Gaussian concentration bound.

Lemma 6

A subgaussian random variable X with parameter \(s>0\) satisfies, for all \(t>0\),

$$\begin{aligned} \Pr \{|X| \ge t\} \le 2\exp (-\pi t^2/s^2). \end{aligned}$$

Proof

Let \(\delta \in \mathbb R\) be arbitrary. Then,

$$\begin{aligned} \Pr \{X \ge t\}&= \Pr \{\exp (2\pi \delta X) \ge \exp (2 \pi \delta t)\} \le \exp (-2 \pi \delta t)\cdot \mathbb E[\exp (2 \pi \delta X)] \\&\le \exp (-2 \pi \delta t + \pi \delta ^2s^2). \end{aligned}$$

This is minimized at \(\delta = t/s^2\), so we have

$$\begin{aligned} \Pr \{X \ge t\} \le \exp (-\pi t^2/s^2). \end{aligned}$$

The symmetric case \(X \le -t\) is analogous, and the proof is completed by a union bound.    \(\square \)

A random vector \(\mathbf {x}\) over \(\mathbb {R}^n\) is subgaussian with parameter \(\alpha \) if \(\langle {\mathbf {x}, \mathbf {u}}\rangle \) is subgaussian with parameter \(\alpha \) for all unit vectors \(\mathbf {u}\). If each coordinate of a random vector is subgaussian (with a common parameter) conditioned on any values of the previous coordinates, then the vector itself is subgaussian (with the same parameter). See [LPR13, Claim 2.1] for a proof.
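
As an illustrative example (not stated in the text), consider the trapdoor distribution from Sect. 1.1, which outputs 0 with probability 1/2 and \(\pm 1\) with probability 1/4 each. For X drawn from this distribution and any \(t \in \mathbb {R}\),

$$ \mathbb E[e^{2 \pi tX}] = \tfrac{1}{2} + \tfrac{1}{4}e^{2\pi t} + \tfrac{1}{4}e^{-2\pi t} = \cosh ^{2}(\pi t) \le e^{(\pi t)^{2}} = e^{\pi \cdot \pi \cdot t^{2}} , $$

using \(1 + \cosh (2a) = 2\cosh ^{2}(a)\) and \(\cosh (x) \le e^{x^{2}/2}\); so by Definition 2 this distribution is subgaussian with parameter \(\sqrt{\pi }\).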

3 Lattice Projections

We emphasize that the proof of Lemma 1 makes essential use of the injectivity of \(\mathbf {T}\), and the lemma does not hold when \(\mathbf {T}\) is not injective. There are two reasons for this. Consider, for simplicity, the special case where \(A=\varLambda \) is a lattice and \(\mathbf {S}=\mathbf {I}\). First, the set \(\mathbf {T} \varLambda \) is not necessarily a lattice, and the conditional distribution \(\mathcal {D}_{\mathbf {T} \varLambda , \mathbf {T}}\) may not be well defined.Footnote 9 We resolve this issue by restricting \(\mathbf {T}\) to be a linear transformation whose kernel is a lattice subspace \(P = \mathrm {span}(P\cap \varLambda )\). Second, even when \(\mathbf {T} \cdot \mathcal {D}_{\varLambda }\) is well defined, in general it does not equal the discrete Gaussian \(\mathcal {D}_{\mathbf {T} \varLambda , \mathbf {T}}\). We address this issue by showing that these distributions are statistically close, assuming that the sublattice \(\varLambda \cap P\) has small enough smoothing parameter.

Theorem 1

For any \(\varepsilon \in [0,1)\) defining \(\bar{\varepsilon } = 2\varepsilon /(1-\varepsilon )\), matrix \(\mathbf {S}\) of full column rank, lattice coset \(A = \varLambda +\mathbf {a} \subset \mathrm {span}(\mathbf {S})\), and matrix \(\mathbf {T}\) such that \(\ker (\mathbf {T})\) is a \(\varLambda \)-subspace and \(\eta _{\varepsilon }(\varLambda \cap \ker (\mathbf {T})) \le \mathbf {S}\), we have

$$ \mathbf {T} \cdot \mathcal {D}_{A,\mathbf {S}} \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{\mathbf {T} A, \mathbf {T} \mathbf {S}}. $$

The proof of Theorem 1 (given below) relies primarily on the following specialization to linear transformations that are orthogonal projections \(\mathbf {x} \mapsto {\mathbf {x}}_{\perp {P}}\).

Lemma 7

For any \(\varepsilon \in [0,1)\), lattice coset \(A = \varLambda + \mathbf {a}\), and lattice subspace \(P = \mathrm {span}(\varLambda \cap P)\) such that \(\eta _{\varepsilon }(\varLambda \cap P) \le 1\), we have

$$ \varDelta _{\textsc {ml}}({(\mathcal {D}_A)}_{\perp {P}} \, ,\, \mathcal {D}_{{A}_{\perp {P}}}) \le \log \frac{1+\varepsilon }{1-\varepsilon }, $$

or equivalently, \({(\mathcal {D}_A)}_{\perp {P}} \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{{A}_{\perp {P}}}\) where \(\bar{\varepsilon } = 2\varepsilon /(1-\varepsilon )\).

Proof

It is immediate that \({(\mathcal {D}_A)}_{\perp {P}}\) and \(\mathcal {D}_{{A}_{\perp {P}}}\) are both well-defined distributions over \({A}_{\perp {P}}\), which is a lattice coset. For any \(\mathbf {v} \in {A}_{\perp {P}}\), let \(p_{\mathbf {v}} =\Pr \{{\mathbf {v} \leftarrow {(\mathcal {D}_A)}_{\perp {P}}}\}\) and \(q_{\mathbf {v}} = \Pr \{{\mathbf {v} \leftarrow \mathcal {D}_{{A}_{\perp {P}}}}\}\). By definition, \(q_{\mathbf {v}} = \rho (\mathbf {v})/\rho ({A}_{\perp {P}})\). In order to analyze \(p_{\mathbf {v}}\), let \(\varLambda _P = \varLambda \cap P\), and select any \(\mathbf {w} \in A\) such that \({\mathbf {w}}_{\perp {P}} = \mathbf {v}\). Then

$$ p_{\mathbf {v}} = \frac{\rho (\{{\mathbf {x} \in A : {\mathbf {x}}_{\perp {P}} = \mathbf {v}}\})}{\rho (A)} = \frac{\rho (\mathbf {w}+\varLambda _P)}{\rho (A)} \approx _\varepsilon \frac{\rho ({\mathbf {w}}_{\perp {\varLambda _P}})}{\rho (A)\det (\varLambda _P)}, $$

where the last step follows by Corollary 1. By assumption, \(\mathrm {span}(\varLambda _P) = P\), so \({\mathbf {w}}_{\perp {\varLambda _P}} = {\mathbf {w}}_{\perp {P}} = \mathbf {v}\) and hence

$$ p_{\mathbf {v}} \approx _\varepsilon \frac{\rho (\mathbf {v})}{\rho (A)\det (\varLambda _P)} = C\cdot q_{\mathbf {v}}$$

for some constant \(C = \rho ({A}_{\perp {P}}) / (\rho (A)\det (\varLambda _P))\). Summing over all \(\mathbf {v} \in {A}_{\perp {P}}\) gives \(1 \approx _\varepsilon C\), or, equivalently, \(C \in [1/(1+\varepsilon ),1/(1-\varepsilon )]\). It follows that

$$\frac{1-\varepsilon }{1+\varepsilon } q_{\mathbf {v}} \le p_{\mathbf {v}} \le \frac{1+\varepsilon }{1-\varepsilon }\cdot q_{\mathbf {v}},$$

and therefore \(\varDelta _{\textsc {ml}}({(\mathcal {D}_A)}_{\perp {P}}, \mathcal {D}_{{A}_{\perp {P}}}) \le \log \frac{1+\varepsilon }{1-\varepsilon }\).    \(\square \)

We now prove the main theorem.

Proof

(of Theorem 1). The main idea is to express \(\varLambda \) as \(\mathbf {S} \varLambda '\) for a lattice \(\varLambda '\), then use the injectivity of \(\mathbf {T}\mathbf {S}\) on the subspace orthogonal to \(\ker (\mathbf {T} \mathbf {S})\), which contains \({\varLambda '}_{\perp {\ker (\mathbf {T} \mathbf {S})}}\).

Notice that \(\mathbf {a} \in A \subset \mathrm {span}(\mathbf {S})\) and \(\varLambda = A - \mathbf {a} \subset \mathrm {span}(\mathbf {S})\). Therefore, we can write \(A = \mathbf {S} A'\) for some lattice coset \(A'=\varLambda '+\mathbf {a}'\) with \(\mathbf {S}\varLambda '=\varLambda \) and \(\mathbf {S} \mathbf {a}'=\mathbf {a}\). Since \(\mathbf {S}\) is injective, by Lemma 1 we have

$$\begin{aligned} \mathbf {T} \cdot \mathcal {D}_{A,\mathbf {S}} = \mathbf {T} \cdot \mathcal {D}_{\mathbf {S}A',\mathbf {S}} = \mathbf {T}\mathbf {S} \cdot \mathcal {D}_{A'}. \end{aligned}$$
(3.1)

Now let \(P = \ker (\mathbf {T}\mathbf {S})\), so that \(\mathbf {S} P = \mathrm {span}(\mathbf {S}) \cap \ker (\mathbf {T})\). In particular, using \(\varLambda \subset \mathrm {span}(\mathbf {S})\) and the injectivity of \(\mathbf {S}\), we get

$$ \varLambda \cap \ker (\mathbf {T}) = \varLambda \cap \mathrm {span}(\mathbf {S}) \cap \ker (\mathbf {T}) = \varLambda \cap \mathbf {S}P = \mathbf {S}\varLambda ' \cap \mathbf {S}P = \mathbf {S} (\varLambda ' \cap P).$$

Using the assumption \(\ker (\mathbf {T}) = \mathrm {span}(\varLambda \cap \ker (\mathbf {T}))\) we also get

$$ \mathbf {S} P = \mathrm {span}(\mathbf {S}) \cap \ker (\mathbf {T}) = \mathrm {span}(\mathbf {S}) \cap \mathrm {span}(\varLambda \cap \ker (\mathbf {T})) = \mathrm {span}(\varLambda \cap \ker (\mathbf {T})). $$

It follows that \(\mathbf {S}P = \mathrm {span}(\mathbf {S}(\varLambda '\cap P))\), and, since \(\mathbf {S}\) is injective, \(P = \mathrm {span}(\varLambda '\cap P)\). We also have

$$ \eta _\varepsilon (\mathbf {S}(\varLambda ' \cap P)) = \eta _\varepsilon (\varLambda \cap \ker (\mathbf {T})) \le \mathbf {S} , $$

which, by definition, gives \(\eta _\varepsilon (\varLambda ' \cap P) \le 1\). So, the hypotheses of Lemma 7 are satisfied, and

$$ \varDelta _{\textsc {ml}}({(\mathcal {D}_{A'})}_{\perp {P}} \, , \, \mathcal {D}_{{A'}_{\perp {P}}}) \le \log \frac{1+\varepsilon }{1-\varepsilon }.$$

Applying \(\mathbf {T}\mathbf {S}\) to both distributions we get that

$$ \varDelta _{\textsc {ml}}(\mathbf {T}\mathbf {S} \cdot {(\mathcal {D}_{A'})}_{\perp {P}} \, , \, \mathbf {T}\mathbf {S} \cdot \mathcal {D}_{{A'}_{\perp {P}}}) \le \log \frac{1+\varepsilon }{1-\varepsilon }.$$

It remains to show that these are the distributions in the theorem statement. To this end, observe that \(\mathbf {T}\mathbf {S}\mathbf {x} = \mathbf {T}\mathbf {S}({\mathbf {x}}_{\perp {P}})\) for any vector \(\mathbf {x}\). Therefore, the first distribution equals

$$ \mathbf {T}\mathbf {S} \cdot {(\mathcal {D}_{A'})}_{\perp {P}} = \mathbf {T}\mathbf {S} \cdot \mathcal {D}_{A'} = \mathbf {T} \cdot \mathcal {D}_{\mathbf {S} A',\mathbf {S}} = \mathbf {T} \cdot \mathcal {D}_{A,\mathbf {S}}. $$

Finally, since \(\mathbf {T}\mathbf {S}\) is injective on \({A'}_{\perp {P}}\), we can apply Lemma 1 and see that the second distribution is

$$ \mathbf {T}\mathbf {S} \cdot \mathcal {D}_{{A'}_{\perp {P}}} = \mathcal {D}_{\mathbf {T}\mathbf {S} A',\mathbf {T}\mathbf {S}} = \mathcal {D}_{\mathbf {T} A,\mathbf {T}\mathbf {S}} . $$

    \(\square \)

Corollary 2 below, recently stated in [DGPY19], is a special case of Theorem 1. The difference is that while Corollary 2 assumes that \(\mathbf {T}\) is a primitive integer matrix and \(A = \varLambda = \mathbb {Z}^m\) is the integer lattice, Theorem 1 applies to arbitrary linear transformations \(\mathbf {T}\) and lattice cosets \(A = \varLambda + \mathbf {a} \subset \mathbb {R}^m\).

Corollary 2

([DGPY19, Lemma 3]). For any \(\varepsilon \in (0,1/2)\) and \(\mathbf {T} \in \mathbb {Z}^{n\times m}\) such that \(\mathbf {T} \mathbb {Z}^m = \mathbb {Z}^n\) and \(\eta _\varepsilon (\mathbb {Z}^{m} \cap \ker (\mathbf {T})) \le r\), we have

$$ \varDelta _{\textsc {ml}}(\mathbf {T} \cdot \mathcal {D}_{\mathbb {Z}^m,r} \, , \, \mathcal {D}_{\mathbb {Z}^n,r\mathbf {T}}) \le 4 \varepsilon .$$

4 Convolutions

This section focuses on convolutions of discrete Gaussians. The literature on lattice-based cryptography has a multitude of convolution theorems and lemmas for discrete Gaussians (e.g., [Reg05, Pei10, BF11, MP13]), most of which are formally incomparable despite the close similarity of their statements and proofs. In this section we show all of them can be obtained and generalized solely via Theorem 1 and elementary linear algebra.

First, in Sect. 4.1 we analyze the joint distribution of a convolution. Then in Sect. 4.2 we show how to obtain (and in some cases generalize) all prior discrete Gaussian convolution theorems, by viewing each convolution as a linear transformation on its joint distribution.

4.1 Joint Distributions

Here we prove several general theorems on the joint distributions of discrete Gaussian convolutions.

Theorem 2

For any \(\varepsilon \in [0,1)\), cosets \(A_{1}, A_{2}\) of lattices \(\varLambda _{1}, \varLambda _{2}\) (respectively), and matrix \(\mathbf {T}\) such that \(\mathrm {span}(\mathbf {T}) \subseteq \mathrm {span}(\varLambda _{2})\) and \(\eta _{\varepsilon }(\varLambda _2) \le 1\), we have

$$ \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{A_1} \, , \, \mathbf {x}_2 \leftarrow \mathcal {D}_{A_2 + \mathbf {T} \mathbf {x}_1}}\rrbracket \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{A} , $$

where \(A = \begin{bmatrix} \mathbf {I} & \mathbf {0} \\ \mathbf {T} & \mathbf {I} \end{bmatrix} (A_{1} \times A_{2})\) and \(\bar{\varepsilon }=2\varepsilon /(1-\varepsilon )\).

Proof

Let \(\mathbf {P}(\mathbf {x}_1, \mathbf {x}_2) = (\mathbf {x}_1 , {(\mathbf {x}_{2})}_{\perp {\varLambda _{2}}})\) be the orthogonal projection on the first \(n_1\) coordinates and the subspace orthogonal to \(\varLambda _{2}\), and observe that \({(A_{2})}_{\perp {\varLambda _{2}}} = \{{\mathbf {a}}\}\) is a singleton set for some \(\mathbf {a}\). For any fixed \(\mathbf {x}_1 \in A_1\), it is straightforward to verify that

$$ \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_2 \leftarrow \mathcal {D}_{A_2+\mathbf {T}\mathbf {x}_1}}\rrbracket = \llbracket {\mathbf {x} \mid \mathbf {x} \leftarrow \mathcal {D}_{A}, \mathbf {P}(\mathbf {x})=(\mathbf {x}_1, \mathbf {a})}\rrbracket . $$

Therefore, it is enough to show that \((\mathcal {D}_{A_1}, \mathbf {a}) \overset{\bar{\varepsilon }}{\approx } \mathbf {P}(\mathcal {D}_{A})\). Define \(\varLambda = \begin{bmatrix} \mathbf {I} & \mathbf {0} \\ \mathbf {T} & \mathbf {I} \end{bmatrix} (\varLambda _{1} \times \varLambda _{2})\), the lattice of which A is a coset, and \(\varLambda _P = \varLambda \cap \ker (\mathbf {P}) = \{{\mathbf {0}}\}\oplus \varLambda _2\). Notice that \(\ker (\mathbf {P}) = \{{\mathbf {0}}\} \oplus \mathrm {span}(\varLambda _{2}) = \mathrm {span}(\varLambda _{P})\) (i.e., \(\ker (\mathbf {P})\) is a \(\varLambda \)-subspace), and \(\eta _\varepsilon (\varLambda _P) = \eta _\varepsilon (\varLambda _2) \le 1\). Therefore, by Theorem 1,

$$ \mathbf {P}(\mathcal {D}_A) \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{\mathbf {P}(A)} = \mathcal {D}_{A_{1} \times \{{\mathbf {a}}\}} = (\mathcal {D}_{A_1}, \mathbf {a}). $$

    \(\square \)

As a corollary, we get the following more symmetric statement, which says essentially that if the lattices of \(A_{1}\) and \(A_{2}\) are sufficiently smooth, then a pair of \(\delta '\)-correlated Gaussian samples over \(A_1\) and \(A_2\) can be produced in two different ways, depending on which component is sampled first.

Corollary 3

For any \(\varepsilon \in [0,1)\) and \(\delta \in (0,1]\) with \(\delta ' = \sqrt{1-\delta ^2}\), and any cosets \(A_{1}, A_{2}\) of full-rank lattices \(\varLambda _{1}, \varLambda _{2} \subset \mathbb {R}^{n}\) (respectively) where \(\eta _\varepsilon (\varLambda _1), \eta _\varepsilon (\varLambda _2)\le \delta \), define the distributions

$$\begin{aligned} \mathcal {X}_1&=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{A_1} \, , \, \mathbf {x}_2 \leftarrow \delta '\mathbf {x}_1 + \mathcal {D}_{A_2-\delta '\mathbf {x}_1,\delta }}\rrbracket \\ \mathcal {X}_2&=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_2 \leftarrow \mathcal {D}_{A_2} \, , \, \mathbf {x}_1 \leftarrow \delta '\mathbf {x}_2 + \mathcal {D}_{A_1 - \delta '\mathbf {x}_2,\delta }}\rrbracket . \end{aligned}$$

Then \(\mathcal {X}_1 \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{A,\sqrt{\varSigma }} \overset{\bar{\varepsilon }}{\approx }\mathcal {X}_2\), where \(A = A_1\times A_2\), \(\bar{\varepsilon }= 2\varepsilon /(1-\varepsilon )\), and \(\varSigma = \begin{bmatrix} \mathbf {I} & \delta ' \mathbf {I} \\ \delta ' \mathbf {I} & \mathbf {I} \end{bmatrix}\).

Proof

By Lemma 1, the conditional distribution of \(\mathbf {x}_2\) given \(\mathbf {x}_1\) in \(\mathcal {X}_1\) is \( \delta '\mathbf {x}_1 + \delta \mathcal {D}_{(A_2/\delta )-(\delta '/\delta )\mathbf {x}_1}\). So, \(\mathcal {X}_1\) can be equivalently expressed as

$$ \mathcal {X}_1 = \mathbf {S} \cdot \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{A_1} \, , \, \mathbf {x}_2 \leftarrow \mathcal {D}_{(A_2/\delta ) - (\delta '/\delta )\mathbf {x}_1}}\rrbracket \quad \text {where} \quad \mathbf {S} = \begin{bmatrix} \mathbf {I} & \mathbf {0} \\ \delta ' \mathbf {I} & \delta \mathbf {I} \end{bmatrix} . $$

Since \(\eta _\varepsilon (\varLambda _2/\delta ) = \eta _\varepsilon (\varLambda _2)/\delta \le 1\), we can apply Theorem 2 with \(\mathbf {T} = - (\delta '/\delta )\mathbf {I}\), and get that the first distribution satisfies \(\mathcal {X}_1 \overset{\bar{\varepsilon }}{\approx } \mathbf {S} \cdot \mathcal {D}_{A'}\), where \(A' = \begin{bmatrix} \mathbf {I} & \mathbf {0} \\ -(\delta '/\delta ) \mathbf {I} & \mathbf {I} \end{bmatrix} (A_{1} \times (A_{2}/\delta ))\). Since \(\mathbf {S}\) is injective, by Lemma 1 we have

$$ \mathcal {X}_1 \overset{\bar{\varepsilon }}{\approx } \mathbf {S} \cdot \mathcal {D}_{A'} = \mathcal {D}_{\mathbf {S} A',\mathbf {S}} = \mathcal {D}_{A,\sqrt{\varSigma }} $$

where \(\varSigma = \mathbf {S} \mathbf {S}^{t} = \begin{bmatrix} \mathbf {I} & \delta ' \mathbf {I} \\ \delta ' \mathbf {I} & \mathbf {I} \end{bmatrix}\). By symmetry, \(\mathcal {X}_2 \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{A,\sqrt{\varSigma }}\) as well.    \(\square \)

Corollary 3 also generalizes straightforwardly to the non-spherical case, as follows.

Corollary 4

For any \(\varepsilon \in [0,1)\), cosets \(A_1, A_2\) of lattices \(\varLambda _1,\varLambda _2\) (respectively), and matrices \(\mathbf {R}, \mathbf {S}_{1}, \mathbf {S}_{2}\) of full column rank where \(A_{1} \subset \mathrm {span}(\mathbf {S}_{1})\), \(\mathrm {span}(\mathbf {R} \mathbf {S}_{1}) \subseteq \mathrm {span}(\varLambda _{2})\), and \(\eta _\varepsilon (\varLambda _2) \le \mathbf {S}_2\), we have

$$\begin{aligned} \mathcal {X}:=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{A_1, \mathbf {S}_1} \, , \, \mathbf {x}_2 \leftarrow \mathbf {R} \mathbf {x}_{1} + \mathcal {D}_{A_2 - \mathbf {R} \mathbf {x}_{1},\mathbf {S}_2}}\rrbracket \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{A,\mathbf {S}} , \end{aligned}$$

where \(A = A_1\times A_2\), \(\bar{\varepsilon }= 2\varepsilon /(1-\varepsilon )\), and \(\mathbf {S} = \begin{bmatrix} \mathbf {S}_{1} & \mathbf {0} \\ \mathbf {R} \mathbf {S}_{1} & \mathbf {S}_{2} \end{bmatrix}\).

Proof

We proceed similarly to the proof of Corollary 3. For simplicity, substitute \(\mathbf {x}_{1}\) with \(\mathbf {S}_{1} \mathbf {x}_{1}\) where \(\mathbf {x}_1 \leftarrow \mathcal {D}_{\mathbf {S}_1^{+} A_1}\). Then by Lemma 1, the vector \(\mathbf {x}_2\) in \(\mathcal {X}\), conditioned on any value of \(\mathbf {x}_1\), has distribution

$$ \mathbf {R} \mathbf {S}_1 \mathbf {x}_1 + \mathbf {S}_2 \cdot \mathcal {D}_{\mathbf {S}_2^{+} (A_2 - \mathbf {R} \mathbf {S}_1 \mathbf {x}_1) }.$$

So, we can express \(\mathcal {X}\) equivalently as

$$ \mathcal {X}= \mathbf {S} \cdot \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{\mathbf {S}_{1}^{+} A_1} \, , \, \mathbf {x}_2 \leftarrow \mathcal {D}_{\mathbf {S}_{2}^{+} A_2 - \mathbf {S}_{2}^{+} \mathbf {R} \mathbf {S}_{1} \mathbf {x}_1}}\rrbracket \quad \text {where} \quad \mathbf {S} = \begin{bmatrix} \mathbf {S}_{1} & \mathbf {0} \\ \mathbf {R} \mathbf {S}_{1} & \mathbf {S}_{2} \end{bmatrix} , $$

and since \(\eta _\varepsilon (\mathbf {S}_2^{+} \cdot \varLambda _2) \le 1\), we can apply Theorem 2 with lattice cosets \(A'_{1} = \mathbf {S}_{1}^{+} A_{1}, A'_{2} = \mathbf {S}_{2}^{+} A_{2}\) and \(\mathbf {T} = -\mathbf {S}_2^{+} \mathbf {R} \mathbf {S}_1\). This yields \(\mathcal {X}\overset{\bar{\varepsilon }}{\approx } \mathbf {S} \cdot \mathcal {D}_{A'} = \mathcal {D}_{\mathbf {S} A', \mathbf {S}}\) where

$$ A' = \begin{bmatrix} \mathbf {I} & \mathbf {0} \\ \mathbf {T} & \mathbf {I} \end{bmatrix} (A'_{1} \times A'_{2}) \quad \text {and} \quad \mathbf {S} \begin{bmatrix} \mathbf {I} & \mathbf {0} \\ \mathbf {T} & \mathbf {I} \end{bmatrix} = \begin{bmatrix} \mathbf {S}_{1} & \mathbf {0} \\ \mathbf {R} \mathbf {S}_{1} + \mathbf {S}_{2} \mathbf {T} & \mathbf {S}_{2} \end{bmatrix} = \begin{bmatrix} \mathbf {S}_{1} & \mathbf {0} \\ \mathbf {0} & \mathbf {S}_{2} \end{bmatrix} , $$

and hence \(\mathbf {S} A' = (\mathbf {S}_{1} A'_{1}) \times (\mathbf {S}_{2} A'_{2}) = A\), as needed.    \(\square \)

The following corollary, which may be useful in cryptography, involves Gaussian distributions over lattices and uniform distributions over their (finite) quotient groups.

Corollary 5

Let \(\varLambda , \varLambda _{1}, \varLambda _{2}\) be full-rank lattices where \(\varLambda \subseteq \varLambda _1\cap \varLambda _2\) and \(\eta _\varepsilon (\varLambda _1), \eta _\varepsilon (\varLambda _2) \le 1\) for some \(\varepsilon > 0\), and define the distributions

$$\begin{aligned} \mathcal {X}_1&=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {U}(\varLambda _1/\varLambda ) \, , \, \mathbf {x}_2 \leftarrow \mathbf {x}_{1} + \mathcal {D}_{\varLambda _2 - \mathbf {x}_1} \bmod \varLambda }\rrbracket , \\ \mathcal {X}_2&=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_2 \leftarrow \mathcal {U}(\varLambda _2/\varLambda ) \, , \, \mathbf {x}_1 \leftarrow \mathbf {x}_{2} + \mathcal {D}_{\varLambda _1 - \mathbf {x}_2} \bmod \varLambda }\rrbracket . \end{aligned}$$

Then \(\mathcal {X}_1 \overset{\bar{\varepsilon }}{\approx } \mathcal {X}_2\) where \(\bar{\varepsilon }=4\varepsilon /(1-\varepsilon )^2\).

Proof

We assume the strict inequality \(\eta _{\varepsilon }(\varLambda _{1}) < 1\); the claim then follows in the limit. Let \(\delta ' \in (\eta _\varepsilon (\varLambda _1), 1)\), \(\delta =\sqrt{1 - \delta '^2}\), and apply Corollary 3 to \(A_1 = (\delta /\delta ') \varLambda _1\) and \(A_2 = \delta \varLambda _2\). Notice that the hypotheses of Corollary 3 are satisfied because \(\eta _\varepsilon (A_1) = \delta \eta _\varepsilon (\varLambda _1)/\delta ' < \delta \) and \(\eta _\varepsilon (A_2) =\delta \eta _\varepsilon (\varLambda _2) \le \delta \). So, the distributions

$$\begin{aligned} \mathcal {X}_1'= & {} \!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{A_1} \, , \, \mathbf {x}_2 \leftarrow \delta ' \mathbf {x}_{1} + \mathcal {D}_{A_2-\delta '\mathbf {x}_1,\delta }}\rrbracket \\ \mathcal {X}_2'= & {} \!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_2 \leftarrow \mathcal {D}_{A_2} \, , \, \mathbf {x}_1 \leftarrow \delta ' \mathbf {x}_{2} + \mathcal {D}_{A_1 - \delta '\mathbf {x}_2,\delta }}\rrbracket \end{aligned}$$

satisfy \(\mathcal {X}_1' \overset{\bar{\varepsilon }}{\approx } \mathcal {X}_2'\). Let \(f:A_1\times A_2 \rightarrow \varLambda _1/\varLambda \times \varLambda _2/\varLambda \) be the function

$$ f(\mathbf {x}_1,\mathbf {x}_2) = ((\delta '/\delta ) \mathbf {x}_{1} \bmod \varLambda , \mathbf {x}_2/\delta \bmod \varLambda ). $$

It is easy to check, using Lemma 1, that

$$\begin{aligned} f(\mathcal {X}_1')&=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{\varLambda _1,\delta '/\delta } \bmod \varLambda \, , \, \mathbf {x}_2 \leftarrow \mathbf {x}_{1} + \mathcal {D}_{\varLambda _2-\mathbf {x}_1} \bmod \varLambda }\rrbracket \\ f(\mathcal {X}_2')&=\!\! \llbracket {(\mathbf {x}_1,\mathbf {x}_2) \mid \mathbf {x}_2 \leftarrow \mathcal {D}_{\varLambda _2,1/\delta } \bmod \varLambda \, , \, \mathbf {x}_1 \leftarrow \delta '^2 \mathbf {x}_2 + \mathcal {D}_{\varLambda _1 - \delta '^2\mathbf {x}_2,\delta '} \bmod \varLambda }\rrbracket \end{aligned}$$

and \(\mathcal {X}_i = \lim _{\delta '\rightarrow 1} \mathcal {X}_i'\) for \(i=1,2\). Since \(\mathcal {X}_1' \overset{\bar{\varepsilon }}{\approx } \mathcal {X}_2'\) for all \(\delta '\), we have \(\mathcal {X}_1 \overset{\bar{\varepsilon }}{\approx } \mathcal {X}_2\).    \(\square \)

4.2 Convolutions via Linear Transformations

In this subsection we show how the preceding results can be used to easily derive all convolution theorems from previous works, for both discrete and continuous Gaussians. The main idea throughout is very simple: first express the statistical experiment as a linear transformation on some joint distribution, then apply Theorem 1. The only nontrivial step is to bound the smoothing parameter of the intersection of the relevant lattice and the kernel of the transformation, which is done using elementary linear algebra. The main results of the section are Theorems 3 and 4; following them, we show how they imply prior convolution theorems.

The following theorem is essentially equivalent to [Pei10, Theorem 3.1], modulo the notion of distance between distributions. (The theorem statement from [Pei10] uses statistical distance, but the proof actually establishes a bound on the max-log distance, as we do here.) The main difference is in the modularity of our proof, which proceeds solely via our general tools and linear algebra.

Theorem 3

Let \(\varepsilon \in (0,1)\) and define \(\bar{\varepsilon } = 2\varepsilon /(1-\varepsilon )\) and \(\varepsilon ' = 4\varepsilon /(1-\varepsilon )^2\); let \(A_{1}, A_{2}\) be cosets of full-rank lattices \(\varLambda _{1}, \varLambda _{2}\) (respectively), let \(\varSigma _1, \varSigma _2 \succ 0\) be positive definite matrices with \(\eta _\varepsilon (\varLambda _2) \le \sqrt{\varSigma _{2}}\), and let

$$\begin{aligned} \mathcal {X}=\!\! \llbracket { (\mathbf {x}_{1}, \mathbf {x}_{2}) \mid \mathbf {x}_1 \leftarrow \mathcal {D}_{A_{1}, \sqrt{\varSigma _1}} \, , \, \mathbf {x}_2 \leftarrow \mathbf {x}_1 + \mathcal {D}_{A_{2} - \mathbf {x}_1, \sqrt{\varSigma _2}}}\rrbracket . \end{aligned}$$

If \(\eta _\varepsilon (\varLambda _{1}) \le \sqrt{\varSigma _3}\) where \(\varSigma _3^{-1} = \varSigma _1^{-1}+\varSigma _2^{-1} \succ 0\), then the marginal distribution \(\mathcal {X}_{2}\) of \(\mathbf {x}_{2}\) in \(\mathcal {X}\) satisfies

$$ \mathcal {X}_{2} \overset{\varepsilon '}{\approx } \mathcal {D}_{A_{2}, \sqrt{\varSigma _1 + \varSigma _2}}. $$

In any case (regardless of \(\eta _{\varepsilon }(\varLambda _{1})\)), the distribution \(\mathcal {X}_{1}^{\mathbf {x}_{2}}\) of \(\mathbf {x}_{1}\) conditioned on any \(\mathbf {x}_{2} \in A_{2}\) satisfies \(\mathcal {X}_{1}^{\mathbf {x}_{2}} \overset{\bar{\varepsilon }}{\approx } \mathbf {x}'_{2} + \mathcal {D}_{A_{1} - \mathbf {x}'_{2}, \sqrt{\varSigma _{3}}}\) where \(\mathbf {x}'_{2} = \varSigma _{1}(\varSigma _{1}+\varSigma _{2})^{-1} \mathbf {x}_{2} = \varSigma _{3} \varSigma _{2}^{-1} \mathbf {x}_{2}\).

Proof

Clearly, \(\mathcal {X}_{2}=\mathbf {P} \cdot \mathcal {X}\), where \(\mathbf {P} = \begin{pmatrix} \mathbf {0}&\mathbf {I} \end{pmatrix}\). Because \(\eta _\varepsilon (\varLambda _2) \le \sqrt{\varSigma _2}\), Corollary 4 implies

$$ \mathcal {X}\overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{A, \sqrt{\varSigma }} \quad \text {and hence} \quad \mathbf {P} \cdot \mathcal {X}\overset{\bar{\varepsilon }}{\approx } \mathbf {P} \cdot \mathcal {D}_{A, \sqrt{\varSigma }}, $$

where \(A = A_{1} \times A_{2}\) and

$$ \sqrt{\varSigma } = \begin{pmatrix} \sqrt{\varSigma _{1}} &{} \mathbf {0} \\ \sqrt{\varSigma _{1}} &{} \sqrt{\varSigma _{2}} \end{pmatrix} . $$

Then, Theorem 1 (whose hypotheses we verify below) implies that

$$ \mathbf {P} \cdot \mathcal {D}_{A,\sqrt{\varSigma }} \overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{\mathbf {P} A, \mathbf {P} \sqrt{\varSigma }} = \mathcal {D}_{A_{2}, \sqrt{\varSigma _{1} + \varSigma _{2}}}, $$

where the equality follows from the fact that \(\mathcal {D}\) is insensitive to the choice of square root, and \(\mathbf {R} = \mathbf {P} \sqrt{\varSigma } = \begin{pmatrix} \sqrt{\varSigma _{1}}&\sqrt{\varSigma _{2}} \end{pmatrix}\) is a square root of \(\mathbf {R} \mathbf {R}^{t} = \varSigma _{1} + \varSigma _{2}\). This proves the claim about \(\mathcal {X}_{2}\).

To apply Theorem 1, for \(\varLambda = \varLambda _{1} \times \varLambda _{2}\) we require that \(\ker (\mathbf {P})\) is a \(\varLambda \)-subspace, and that \(\eta _{\varepsilon }(\varLambda \cap \ker (\mathbf {P})) = \eta _{\varepsilon }(\varLambda _{1} \times \{{\mathbf {0}}\}) \le \sqrt{\varSigma }\). For the former, because \(\varLambda _{1}\) is full rank we have

$$ \ker (\mathbf {P}) = \mathrm {span}(\varLambda _{1}) \times \{{\mathbf {0}}\} = \mathrm {span}(\varLambda _{1} \times \{{\mathbf {0}}\}) = \mathrm {span}(\ker (\mathbf {P}) \cap \varLambda ). $$

For the latter, by definition we need to show that \(\eta _{\varepsilon }(\varLambda ') \le 1\) where \(\varLambda ' = \sqrt{\varSigma }^{-1} \cdot (\varLambda _{1} \times \{{\mathbf {0}}\})\). Because

$$ \sqrt{\varSigma }^{-1} = \begin{pmatrix} \sqrt{\varSigma _{1}}^{-1} &{} \mathbf {0} \\ -\sqrt{\varSigma _{2}}^{-1} &{} \sqrt{\varSigma _{2}}^{-1} \end{pmatrix} , $$

we have \(\varLambda ' = \mathbf {S} \cdot \varLambda _{1}\) for \(\mathbf {S} = \begin{pmatrix} \sqrt{\varSigma _{1}}^{-1} \\ -\sqrt{\varSigma _{2}}^{-1} \end{pmatrix}\).
Now \(\mathbf {S}^{t} \mathbf {S} = \varSigma _{1}^{-1} + \varSigma _{2}^{-1} = \varSigma _{3}^{-1}\), so \(\Vert {\mathbf {S} \mathbf {v}}\Vert ^{2} = \mathbf {v}^{t} \mathbf {S}^{t} \mathbf {S} \mathbf {v} = \Vert {\sqrt{\varSigma _{3}}^{-1} \mathbf {v}}\Vert ^{2}\) for every \(\mathbf {v}\). Therefore, \(\varLambda ' = \mathbf {S} \cdot \varLambda _{1}\) is isometric to (i.e., a rotation of) \(\sqrt{\varSigma _{3}}^{-1} \cdot \varLambda _{1}\), so \(\eta _{\varepsilon }(\varLambda ') \le 1\) is equivalent to \(\eta _{\varepsilon }(\sqrt{\varSigma _{3}}^{-1} \cdot \varLambda _{1}) \le 1\), which by definition is equivalent to the hypothesis \(\eta _{\varepsilon }(\varLambda _{1}) \le \sqrt{\varSigma _{3}}\).

To prove the claim about \(\mathcal {X}_{1}^{\mathbf {x}_{2}}\) for an arbitrary \(\mathbf {x}_{2} \in A_{2}\), we work with \(\mathcal {D}_{A,\sqrt{\varSigma }}\) using a different choice of the square root of \(\varSigma \), namely,

$$ \sqrt{\varSigma } = \begin{pmatrix} \sqrt{\varSigma _{3}} &{} -\sqrt{\varSigma _{3}} \mathbf {X} \sqrt{\varSigma _{1}+\varSigma _{2}} \\ \mathbf {0} &{} \sqrt{\varSigma _{1}+\varSigma _{2}} \end{pmatrix} $$

for \(\sqrt{\varSigma _{3}} \mathbf {X} = -\varSigma _{1}(\varSigma _{1}+\varSigma _{2})^{-1} = -\varSigma _{3} \varSigma _{2}^{-1}\); this \(\sqrt{\varSigma }\) is valid because

$$\begin{aligned} \varSigma _{3} + \varSigma _{1} (\varSigma _{1}+\varSigma _{2})^{-1} \varSigma _{1}&= (\varSigma _{1}^{-1}+\varSigma _{2}^{-1})^{-1} + \varSigma _{1} - \varSigma _{2}(\varSigma _{1}+\varSigma _{2})^{-1}\varSigma _{1} \\&= \varSigma _{1} + (\varSigma _{1}^{-1}+\varSigma _{2}^{-1})^{-1} - (\varSigma _{1}^{-1} (\varSigma _{1}+\varSigma _{2}) \varSigma _{2}^{-1})^{-1} \\&= \varSigma _{1}, \end{aligned}$$

and \(\varSigma _{1}(\varSigma _{1}+\varSigma _{2})^{-1} = \varSigma _{3} \varSigma _{2}^{-1}\) by a similar manipulation. Now, the distribution \(\mathcal {D}_{A,\sqrt{\varSigma }}\) conditioned on any \(\mathbf {x}_{2} \in A_{2}\) is

$$ \mathcal {D}_{A_{1} \times \{{\mathbf {x}_{2}\}}, \sqrt{\varSigma }} = \sqrt{\varSigma } \cdot \mathcal {D}_{\sqrt{\varSigma }^{-1} (A_{1} \times \{{\mathbf {x}_{2}\}})} = \sqrt{\varSigma } \cdot (\mathcal {D}_{\sqrt{\varSigma _{3}}^{-1} A_{1} + \mathbf {X} \mathbf {x}_{2}}, \sqrt{\varSigma _{1}+\varSigma _{2}}^{-1} \mathbf {x}_{2}), $$

where the last equality follows from the fact that the second component of \(\sqrt{\varSigma }^{-1} (A_{1} \times \{{\mathbf {x}_{2}\}})\) is fixed because \(\sqrt{\varSigma }^{-1}\) is block upper-triangular. So, the conditional distribution of \(\mathbf {x}_{1}\), which is the first component of the above distribution, is

$$ \varSigma _{1} (\varSigma _{1}+\varSigma _{2})^{-1} \mathbf {x}_{2} + \mathcal {D}_{A_{1} + \sqrt{\varSigma _{3}} \mathbf {X} \mathbf {x}_{2}, \sqrt{\varSigma _{3}}} = \mathbf {x}'_{2} + \mathcal {D}_{A_{1} - \mathbf {x}'_{2}, \sqrt{\varSigma _{3}}}. $$

Finally, because \(\mathcal {X}\overset{\bar{\varepsilon }}{\approx } \mathcal {D}_{A,\sqrt{\varSigma }}\), the claim on the conditional distribution \(\mathcal {X}_{1}^{\mathbf {x}_{2}}\) is established.    \(\square \)
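For concreteness, the following small Python script (our own illustrative sketch, not part of the formal development; the naive enumeration-based sampler and the parameters are assumptions made only for this check) tests Theorem 3 numerically in the simplest one-dimensional case \(\varLambda _1 = \varLambda _2 = \mathbb {Z}\) with \(A_1 = A_2 = \mathbb {Z}\). Since \(\mathbb {Z} - x_1 = \mathbb {Z}\) for integer \(x_1\), the experiment reduces to comparing the sum \(\mathcal {D}_{\mathbb {Z},s_1} + \mathcal {D}_{\mathbb {Z},s_2}\) against \(\mathcal {D}_{\mathbb {Z},\sqrt{s_1^2+s_2^2}}\).

```python
# Monte Carlo check of Theorem 3 over Z (illustrative sketch; not from the paper).
import math
import random
from collections import Counter

def sample_dgauss_z(s, tail=10):
    """Naive sampler for D_{Z,s} with rho_s(x) = exp(-pi x^2 / s^2), truncated at |x| <= tail*s."""
    bound = int(tail * s) + 1
    support = range(-bound, bound + 1)
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in support]
    return random.choices(support, weights=weights)[0]

s1, s2 = 4.0, 3.0
t = math.sqrt(s1 ** 2 + s2 ** 2)
N = 100_000

conv = Counter(sample_dgauss_z(s1) + sample_dgauss_z(s2) for _ in range(N))
direct = Counter(sample_dgauss_z(t) for _ in range(N))

for v in range(-3, 4):  # the two empirical distributions should nearly agree
    print(v, conv[v] / N, direct[v] / N)
```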

There are a number of convolution theorems in the literature that pertain to linear combinations of Gaussian samples. We now present a theorem that, as shown below, subsumes all of them. The proof generalizes part of the proof of [MP13, Theorem 3.3] (stated below as Corollary 6).

Theorem 4

Let \(\varepsilon \in (0,1)\), let \(\mathbf {z} \in \mathbb {Z}^{m} \setminus \{{\mathbf {0}}\}\), and for \(i=1,\ldots ,m\) let \(A_{i} = \varLambda _{i} + \mathbf {a}_{i} \subset \mathbb {R}^{n}\) be a lattice coset and \(\mathbf {S}_{i} \in \mathbb {R}^{n \times n}\) be such that \(\varLambda _{\cap } = \bigcap _{i} \varLambda _i\) is full rank. If \(\eta _{\varepsilon }(\ker (\mathbf {z}^{t} \otimes \mathbf {I}_{n}) \cap \varLambda ) \le \mathbf {S}\) where \(\varLambda = \varLambda _{1} \times \cdots \times \varLambda _{m}\) and \(\mathbf {S} = \mathrm {diag}(\mathbf {S}_{1}, \ldots , \mathbf {S}_{m})\), then

$$ \varDelta _{\textsc {ml}}(\sum _{i=1}^{m} z_{i} \mathcal {D}_{A_{i},\mathbf {S}_{i}}, \, \mathcal {D}_{A', \mathbf {S}'}) \le \log \frac{1+\varepsilon }{1-\varepsilon }, $$

where \(A' = \sum _{i=1}^{m} z_{i} A_{i}\) and \(\mathbf {S}' = \sqrt{\sum _{i=1}^{m} z_{i}^{2} \mathbf {S}_{i} \mathbf {S}_{i}^{t}}\).

In particular, let each \(\mathbf {S}_{i} = s_{i} \mathbf {I}_{n}\) for some \(s_{i} > 0\) where \(\tilde{bl}(\mathrm {diag}(\mathbf {s})^{-1} (\ker (\mathbf {z}^{t}) \cap \mathbb {Z}^{m}))^{-1} \ge \eta _{\varepsilon }(\varLambda _{\cap })\), which is implied by \(((z_{i^{*}}/s_{i^{*}})^{2} + \max _{i \ne i^{*}} (z_{i}/s_{i})^{2})^{-1/2} \ge \eta _{\varepsilon }(\varLambda _{\cap })\) where \(i^{*}\) is an index minimizing \(|z_{i}/s_{i}|\) over all i with \(z_{i} \ne 0\). Then

$$ \varDelta _{\textsc {ml}}(\sum _{i=1}^{m} z_{i} \mathcal {D}_{A_{i},s_{i}}\, , \, \mathcal {D}_{A', s'}) \le \log \frac{1+\varepsilon '}{1-\varepsilon '}, $$

where \(s' = \sqrt{\sum _{i=1}^{m} (z_{i} s_{i})^{2}}\) and \(1+\varepsilon ' = (1+\varepsilon )^{m}\).

Proof

Let \(\mathbf {Z} = \mathbf {z}^{t} \otimes \mathbf {I}_{n}\) and \(A=A_{1} \times \cdots \times A_{m}\), which is a coset of \(\varLambda \), and observe that

$$ \sum _{i=1}^{m} z_{i} \mathcal {D}_{A_{i}, \mathbf {S}_{i}} = \mathbf {Z} \cdot \mathcal {D}_{A,\mathbf {S}} .$$

Also notice that \(\mathbf {Z} A = A'\), and \(\mathbf {R} = \mathbf {Z} \mathbf {S}\) is a square root of \(\mathbf {R} \mathbf {R}^{t} = \sum _{i=1}^{m} z_{i}^{2} \mathbf {S}_{i} \mathbf {S}_{i}^{t}\). So, the first claim follows immediately by Theorem 1, as long as \(\ker (\mathbf {Z})\) is a \(\varLambda \)-subspace.

To see that this is so, first observe that the lattice \(Z = \ker (\mathbf {z}^{t}) \cap \mathbb {Z}^{m}\) has rank \(m-1\). Then the lattice \(Z \otimes \varLambda _{\cap }\) has rank \((m-1)n\) and is contained in \(\ker (\mathbf {Z}) \cap \varLambda \), because for any \(\mathbf {v} \in Z \subseteq \mathbb {Z}^{m}\) and \(\mathbf {w} \in \varLambda _{\cap }\) we have \(\mathbf {Z} (\mathbf {v} \otimes \mathbf {w}) = (\mathbf {z}^{t} \mathbf {v}) \otimes \mathbf {w} = \mathbf {0}\) and \((\mathbf {v} \otimes \mathbf {w}) \in \varLambda _{\cap }^{m} \subseteq \varLambda \). So, because \(\ker (\mathbf {Z})\) has dimension \((m-1)n\) we have \(\ker (\mathbf {Z}) = \mathrm {span}(Z \otimes \varLambda _{\cap }) = \mathrm {span}(\ker (\mathbf {Z}) \cap \varLambda )\), as desired.

For the second claim (with the first hypothesis), we need to show that \(\eta _{\varepsilon '}(\ker (\mathbf {Z}) \cap \varLambda ) \le \mathbf {S} = \mathrm {diag}(\mathbf {s}) \otimes \mathbf {I}_{n}\). Because \(Z \otimes \varLambda _{\cap }\) is a sublattice of \(\ker (\mathbf {Z}) \cap \varLambda \) of the same rank, by Lemma 5 and hypothesis, we have

$$\begin{aligned} \eta _{\varepsilon '}(\mathbf {S}^{-1} (\ker (\mathbf {Z}) \cap \varLambda ))&\le \eta _{\varepsilon '}((\mathrm {diag}(\mathbf {s})^{-1} \otimes \mathbf {I}_{n}) \cdot (Z \otimes \varLambda _{\cap })) \\&\le \eta _{\varepsilon '}((\mathrm {diag}(\mathbf {s})^{-1} Z) \otimes \varLambda _{\cap }) \\&\le \tilde{bl}(\mathrm {diag}(\mathbf {s})^{-1} Z) \cdot \eta _{\varepsilon }(\varLambda _{\cap }) \le 1. \end{aligned}$$

Finally, to see that the first hypothesis is implied by the second one, assume without loss of generality that \(i^{*}=1\), and observe that the vectors

$$ (-\frac{z_{2}}{s_{2}}, \frac{z_{1}}{s_{1}}, 0, \ldots , 0)^{t}, (-\frac{z_{3}}{s_{3}}, 0, \frac{z_{1}}{s_{1}}, 0, \ldots , 0)^{t}, \ldots , (-\frac{z_{m}}{s_{m}}, 0, \ldots , 0, \frac{z_{1}}{s_{1}})^{t} $$

form a full-rank subset of \(\mathrm {diag}(\mathbf {s})^{-1} Z\), and have norms at most

$$r = \sqrt{(z_{i^{*}}/s_{i^{*}})^{2} + \max _{i \ne i^{*}} (z_{i}/s_{i})^{2}}.$$

Therefore, by [MG02, Lemma 7.1] we have \(\tilde{bl}(\mathrm {diag}(\mathbf {s})^{-1} Z)^{-1} \ge 1/r \ge \eta _{\varepsilon }(\varLambda _{\cap })\), as required.    \(\square \)

Corollary 6

([MP13, Theorem 3.3]). Let \(\mathbf {z} \in \mathbb {Z}^{m} \setminus \{{\mathbf {0}}\}\), and for \(i=1, \ldots , m=\mathrm {poly}(n)\) let \(\varLambda + \mathbf {c}_{i}\) be cosets of a full-rank n-dimensional lattice \(\varLambda \) and \(s_i \ge \sqrt{2} \Vert {\mathbf {z}}\Vert _\infty \cdot \eta _{\varepsilon }(\varLambda )\) for some \(\varepsilon = \mathrm {negl}(n)\). Then \(\sum _{i=1}^{m} z_{i} \mathcal {D}_{\varLambda + \mathbf {c}_{i}, s_{i}}\) is within \(\mathrm {negl}(n)\) statistical distance of \(\mathcal {D}_{Y,s}\), where \(Y=\gcd (\mathbf {z})\varLambda + \sum _i z_{i} \mathbf {c}_i\) and \(s = \sqrt{\sum _{i} (z_{i} s_{i})^{2}}\). In particular, if \(\gcd (\mathbf {z}) = 1\) and \(\sum _i z_i \mathbf {c}_i \in \varLambda \), then \(\sum z_i \mathcal {D}_{\varLambda +\mathbf {c}_i,s_i}\) is within \(\mathrm {negl}(n)\) statistical distance of \(\mathcal {D}_{\varLambda ,s}\).

Proof

Apply the second part of Theorem 4 with the second hypothesis, and use the fact that \((1+\mathrm {negl}(n))^{\mathrm {poly}(n)}\) is \(1+\mathrm {negl}(n)\).    \(\square \)

Theorem 4.13 from [BF11] is identical to Corollary 6, except it assumes that all the \(s_{i}\) equal some \(s \ge \Vert {\mathbf {z}}\Vert \cdot \eta _{\varepsilon }(\varLambda )\). This also implies the second hypothesis from the second part of Theorem 4, because \(\Vert {\mathbf {z}}\Vert \ge \sqrt{z_{i^{*}}^{2} + \max _{i \ne i^{*}} z_{i}^{2}}\).

Corollary 7

([BF11, Lemma 4.12]). Let \(\varLambda _1 + \mathbf {t}_{1},\varLambda _2 + \mathbf {t}_{2}\) be cosets of full-rank integer lattices, and let \(s_1,s_2 > 0\) be such that \((s_1^{-2} + s_2^{-2})^{-1/2} \ge \eta _\varepsilon (\varLambda _1\cap \varLambda _2)\) for some \(\varepsilon = \mathrm {negl}(n)\). Then \(\mathcal {D}_{\varLambda _1+\mathbf {t}_1,s_1} + \mathcal {D}_{\varLambda _2+\mathbf {t}_2,s_2}\) is within \(\mathrm {negl}(n)\) statistical distance of \(\mathcal {D}_{\varLambda + \mathbf {t},s}\), where \(\varLambda =\varLambda _1+\varLambda _2\), \(\mathbf {t}=\mathbf {t}_1+\mathbf {t}_2\), and \(s^2=s_1^2+s_2^2\).

Proof

The intersection of full-rank integer lattices always has full rank. So, apply the second part of Theorem 4 with the second hypothesis, for \(m=2\) and \(\mathbf {z} = (1,1)^{t}\).    \(\square \)

Corollary 8

([Reg05, Claim 3.9]). Let \(\varepsilon \in (0,1/2)\), let \(\varLambda + \mathbf {u} \subset \mathbb {R}^{n}\) be a coset of a full-rank lattice, and let \(r,s>0\) be such that \((r^{-2}+s^{-2})^{-1/2} \ge \eta _{\varepsilon }(\varLambda )\). Then \(\mathcal {D}_{\varLambda +\mathbf {u}, r} + \mathcal {D}_{s}\) is within statistical distance \(4\varepsilon \) of \(\mathcal {D}_{\sqrt{r^{2}+s^{2}}}\).

Proof

The proof of Corollary 7 also works for any full-rank lattices \(\varLambda _1 \subseteq \varLambda _2\). The corollary follows by taking \(\varLambda _{1} = \varLambda \) and \(\varLambda _2 = \lim _{d \rightarrow \infty } d^{-1} \varLambda = \mathbb {R}^n\).    \(\square \)

5 LWE Self-reduction

The LWE problem [Reg05] is one of the foundations of lattice-based cryptography.

Definition 3 (LWE distribution)

Fix some parameters \(n,q \in \mathbb {Z}^{+}\) and a distribution \(\chi \) over \(\mathbb {Z}\). The LWE distribution for a secret \(\mathbf {s} \in \mathbb {Z}^n_q\) is

$$ \mathcal {L}_{\mathbf {s}} =\!\! \llbracket {(\mathbf {a}, \mathbf {s}^t \mathbf {a} + e \bmod q) \mid \mathbf {a} \leftarrow \mathcal {U}(\mathbb {Z}_{q}^{n}), e \leftarrow \chi }\rrbracket . $$

Given m samples \((\mathbf {a}_{i}, b_{i} = \mathbf {s}^{t} \mathbf {a}_{i} + e_{i} \bmod q)\) from \(\mathcal {L}_{\mathbf {s}}\), we often group them as \((\mathbf {A}, \mathbf {b}^{t} = \mathbf {s}^{t} \mathbf {A} + \mathbf {e}^{t})\), where the \(\mathbf {a}_{i}\) are the columns of \(\mathbf {A} \in \mathbb {Z}_{q}^{n \times m}\) and the \(b_{i}, e_{i}\) are respectively the corresponding entries of \(\mathbf {b} \in \mathbb {Z}_{q}^{m}, \mathbf {e} \in \mathbb {Z}^{m}\).

While LWE was originally also defined for continuous error distributions (in particular, the Gaussian distribution \(\mathcal {D}_{s}\)), we restrict the definition to discrete distributions (over \(\mathbb {Z}\)), since discrete distributions are the focus of this work, and are much more widely used in cryptography. We refer to continuous error distributions only in informal discussion.
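For concreteness, here is a minimal Python sketch (ours; the toy parameters and the naive enumeration-based sampler are purely illustrative) of drawing samples from \(\mathcal {L}_{\mathbf {s}}\) with \(\chi = \mathcal {D}_{\mathbb {Z},s}\), grouped as \((\mathbf {A}, \mathbf {b}^{t} = \mathbf {s}^{t} \mathbf {A} + \mathbf {e}^{t})\) as above.

```python
# Sketch: sampling from the LWE distribution L_s of Definition 3 with chi = D_{Z,s}.
# Illustrative only; parameters and function names are our own, not from the paper.
import math
import random

def sample_dgauss_z(s, tail=10):
    """Sample D_{Z,s}, rho_s(x) = exp(-pi x^2 / s^2), truncated to |x| <= tail*s."""
    bound = int(tail * s) + 1
    support = range(-bound, bound + 1)
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in support]
    return random.choices(support, weights=weights)[0]

def lwe_samples(secret, m, q, s):
    """Return (A, b): A uniform in Z_q^{n x m}, b_j = <secret, a_j> + e_j mod q with e_j <- D_{Z,s}."""
    n = len(secret)
    A = [[random.randrange(q) for _ in range(m)] for _ in range(n)]
    b = [(sum(secret[i] * A[i][j] for i in range(n)) + sample_dgauss_z(s)) % q
         for j in range(m)]
    return A, b

n, q, width = 64, 3329, 3.2              # toy parameters, chosen only for illustration
secret = [random.randrange(q) for _ in range(n)]
A, b = lwe_samples(secret, m=128, q=q, s=width)
```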

Definition 4 (LWE Problem)

The search problem \(\text {S-LWE}_{n,q,\chi ,m}\) is to recover \(\mathbf {s}\) given m independent samples drawn from \(\mathcal {L}_{\mathbf {s}}\), where \(\mathbf {s} \leftarrow \mathcal {U}(\mathbb {Z}_q^n)\). The decision problem \(\text {D-LWE}_{n,q,\chi ,m}\) is to distinguish m independent samples drawn from \(\mathcal {L}_{\mathbf {s}}\), where \(\mathbf {s} \leftarrow \mathcal {U}(\mathbb {Z}_q^{n})\), from m independent and uniformly random samples from \(\mathcal {U}(\mathbb {Z}_{q}^{n+1})\).

For appropriate parameters, very similar hardness results are known for search and decision \(\text {LWE}_{n,q,\chi ,m}\) with \(\chi \in \{{\mathcal {D}_{s}, \lfloor {\mathcal {D}_{s}}\rceil ,\mathcal {D}_{\mathbb {Z},s}}\}\), i.e., continuous, rounded, or discrete Gaussian error. Notably, the theoretical and empirical hardness of the problem depends mainly on \(n \log q\) and the “error rate” \(\alpha = s/q\), and less on m. This weak dependence on m is consistent with the fact that there is a self-reduction that, given just \(m=O(n \log q)\) LWE samples from \(\mathcal {L}_{\mathbf {s}}\) with (continuous, rounded, or discrete) Gaussian error of parameter s, generates any polynomial number of samples from a distribution statistically close to \(\mathcal {L}_{\mathbf {s}}\) with (continuous, rounded, or discrete) Gaussian error of parameter \(O(s \sqrt{m}) \cdot \eta _{\varepsilon }(\mathbb {Z})\), for arbitrary negligible \(\varepsilon \). Such self-reductions were described in [GPV08, ACPS09, Pei10] (the latter for discrete Gaussian error), based on the observation that they are just special cases of Regev’s core reduction [Reg05] from Bounded Distance Decoding (BDD) to LWE, and that LWE is an average-case BDD variant.

The prior LWE self-reduction for discrete Gaussian error, however, contains an unnatural layer of indirection: it first generates new LWE samples having continuous error, then randomly rounds, which by a convolution theorem yields discrete Gaussian error (up to negligible statistical distance). Below we instead give a direct reduction to LWE with discrete Gaussian error, which is more natural and slightly tighter, since it avoids the additional rounding that increases the error width somewhat.

Theorem 5

Let \(\mathbf {A} \in \mathbb {Z}_{q}^{n \times m}\) be primitive, let \(\mathbf {b}^{t} = \mathbf {s}^{t} \mathbf {A} + \mathbf {e}^{t} \bmod q\) for some \(\mathbf {e} \in \mathbb {Z}^{m}\), and let \(r, \tilde{r} > 0\) be such that \(\eta _{\varepsilon }(\varLambda ^{\perp }_{q}(\mathbf {A})) \le ((1/r)^{2} + (\Vert {\mathbf {e}}\Vert /\tilde{r})^{2})^{-1/2} \le r\) for some negligible \(\varepsilon \). Then the distribution

$$ \llbracket {(\mathbf {a} = \mathbf {A} \mathbf {x}, b = \mathbf {b}^t \mathbf {x} + \tilde{e}) \mid \mathbf {x} \leftarrow \mathcal {D}_{\mathbb {Z}^m, r} \, , \, \tilde{e} \leftarrow \mathcal {D}_{\mathbb {Z},\tilde{r}}}\rrbracket $$

is within negligible statistical distance of \(\mathcal {L}_{\mathbf {s}}\) with error \(\chi = \mathcal {D}_{\mathbb {Z}, t}\) where \(t^{2} = (r \Vert {\mathbf {e}}\Vert )^{2} + \tilde{r}^{2}\).

Theorem 5 is the core of the self-reduction. A full reduction between proper LWE problems follows from the fact that a uniformly random matrix \(\mathbf {A} \in \mathbb {Z}_{q}^{n \times m}\) is primitive with overwhelming probability for sufficiently large \(m \gg n\), and by choosing r and \(\tilde{r}\) appropriately. More specifically, it is known [GPV08, MP12] that for appropriate parameters, the smoothing parameter of \(\varLambda ^{\perp }_{q}(\mathbf {A})\) is small with very high probability over the choice of \(\mathbf {A}\). For example, [MP12, Lemma 2.4] implies that when \(m \ge C n \log q\) for any constant \(C > 1\) and \(\varepsilon \approx \varepsilon '\), we have \(\eta _{\varepsilon }(\varLambda ^{\perp }_{q}(\mathbf {A})) \le 2 \eta _{\varepsilon '}(\mathbb {Z}) \le 2 \sqrt{\ln (2(1+1/\varepsilon '))/\pi }\) except with negligible probability. So, we may choose \(r = O(\sqrt{\log (1/\varepsilon ')})\) for some negligible \(\varepsilon '\) and \(\tilde{r} = r \Vert {\mathbf {e}}\Vert \) to satisfy the conditions of Theorem 5 with high probability, and the resulting error distribution has parameter \(t = \sqrt{2}r \Vert {\mathbf {e}}\Vert \), which can be bounded with high probability for any typical LWE error distribution. Finally, there is the subtlety that in the actual LWE problem, the error distribution should be fixed and known, which is not quite the case here since \(\Vert {\mathbf {e}}\Vert \) is secret but bounded from above. This can be handled as in [Reg05] by adding different geometrically increasing amounts of extra error. We omit the details, which are standard.
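The sample-generation step of Theorem 5 is simple enough to state as code. The following Python sketch is ours and purely illustrative: the naive discrete Gaussian sampler, the choice of \(r\), and the bound \(\Vert \mathbf {e}\Vert \le e\_bound\) used to set \(\tilde{r} = r \cdot e\_bound\) (as in the discussion above) are assumptions, not part of the paper's formal procedure.

```python
# Sketch of the re-randomization step of Theorem 5: given (A, b = s^t A + e^t mod q)
# with primitive A, produce one fresh LWE sample with discrete Gaussian error.
import math
import random

def sample_dgauss_z(s, tail=10):
    """Naive sampler for D_{Z,s} by explicit enumeration (illustrative only)."""
    bound = int(tail * s) + 1
    support = range(-bound, bound + 1)
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in support]
    return random.choices(support, weights=weights)[0]

def fresh_sample(A, b, q, r, r_tilde):
    n, m = len(A), len(A[0])
    x = [sample_dgauss_z(r) for _ in range(m)]
    e_tilde = sample_dgauss_z(r_tilde)
    a = [sum(A[i][j] * x[j] for j in range(m)) % q for i in range(n)]
    b_new = (sum(b[j] * x[j] for j in range(m)) + e_tilde) % q
    # Per Theorem 5 (assuming its hypotheses), (a, b_new) is statistically close to an
    # LWE sample with error D_{Z,t}, where t^2 = (r*||e||)^2 + r_tilde^2.
    return a, b_new

# Example wiring (assumes an existing instance A, b, q and a bound e_bound >= ||e||):
# r = 5.0   # illustrative; must satisfy ((1/r)^2 + 1/r^2)^(-1/2) >= eta_eps(Lambda_q^perp(A))
# a, b_new = fresh_sample(A, b, q, r, r_tilde=r * e_bound)
```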

Proof

(of Theorem 5). Because \(\mathbf {A}\) is primitive, for any \(\mathbf {a} \in \mathbb {Z}_q^n\) there exists an \(\mathbf {x}^* \in \mathbb {Z}^m\) such that \(\mathbf {A} \mathbf {x}^* = \mathbf {a}\), and the probability that \(\mathbf {A} \mathbf {x} = \mathbf {a}\) is proportional to \(\rho _{r}(\mathbf {x}^* + \varLambda _q^{\perp }(\mathbf {A}))\). Because \(\eta _{\varepsilon }(\varLambda _q^{\perp }(\mathbf {A})) \le r\), for each \(\mathbf {a}\) this probability is the same (up to \(\approx _{\varepsilon }\)) by Lemma 3, and thus the distribution of \(\mathbf {A} \mathbf {x}\) is within negligible statistical distance of uniform over \(\mathbb {Z}_{q}^{n}\).

Next, conditioning on the event \(\mathbf {A} \mathbf {x} = \mathbf {a}\), the conditional distribution of \(\mathbf {x}\) is the discrete Gaussian \(\mathcal {D}_{\mathbf {x}^* + \varLambda _q^{\perp }(\mathbf {A}), r}\). Because \(b = (\mathbf {s}^{t} \mathbf {A} + \mathbf {e}^{t}) \mathbf {x} + \tilde{e} = \mathbf {s}^{t} \mathbf {a} + (\mathbf {e}^{t} \mathbf {x} + \tilde{e})\), it just remains to analyze the distribution of \(\mathbf {e}^t \mathbf {x} + \tilde{e}\). By Lemma 8 below with \(\varLambda = \varLambda ^{\perp }_{q}(\mathbf {A})\) and \(\varLambda _{1} = \mathbb {Z}\), the distribution \(\langle {\mathbf {e}, \mathcal {D}_{\mathbf {x}^* + \varLambda _q^{\perp }(\mathbf {A}),r}}\rangle + \mathcal {D}_{\mathbb {Z},\tilde{r}}\) is within negligible statistical distance of \(\mathcal {D}_{\mathbb {Z}, t}\), as desired.    \(\square \)

We now prove (a more general version of) the core statistical lemma needed by Theorem 5, using Theorem 1. A similar lemma in which \(\varLambda _1\) is taken to be \(\mathbb {R}= \lim _{d \rightarrow \infty } d^{-1} \mathbb {Z}\) can be proven using Corollary 8; this yields an LWE self-reduction for continuous Gaussian error (as claimed in prior works).

Lemma 8

Let \(\mathbf {e} \in \mathbb {R}^m\), \(\varLambda + \mathbf {x} \subset \mathbb {R}^{m}\) be a coset of a full-rank lattice, and \(\varLambda _1 \subset \mathbb {R}\) be a lattice such that \(\langle {\mathbf {e}, \varLambda }\rangle \subseteq \varLambda _1\). Also let \(r, \tilde{r}, \varepsilon > 0\) be such that \(\eta _{\varepsilon }(\varLambda ) \le s := ((1/r)^{2} + (\Vert {\mathbf {e}}\Vert /\tilde{r})^{2})^{-1/2}\). Then

$$ \varDelta _{\textsc {ml}}(\langle {\mathbf {e}, \mathcal {D}_{\varLambda + \mathbf {x},r}}\rangle + \mathcal {D}_{\varLambda _1, \tilde{r}} \, , \, \mathcal {D}_{\varLambda _{1} + \langle {\mathbf {e}, \mathbf {x}}\rangle , t}) \le \log \frac{1+\varepsilon }{1-\varepsilon }, $$

where \(t^2 = (r \Vert {\mathbf {e}}\Vert )^2 + \tilde{r}^2\).

Proof

First observe that

$$ \langle {\mathbf {e}, \mathcal {D}_{\varLambda +\mathbf {x}, r}}\rangle + \mathcal {D}_{\varLambda _1, \tilde{r}} = [\mathbf {e}^t \mid 1] \cdot \mathcal {D}_{\varLambda \times \varLambda _1 + (\mathbf {x},0),\mathbf {S}} $$

where \(\mathbf {S} = \begin{pmatrix} r \mathbf {I}_{m} &{} \mathbf {0} \\ \mathbf {0} &{} \tilde{r} \end{pmatrix}\). So, by applying Theorem 1 (whose hypotheses we verify below), we get that the above distribution is within the desired ML-distance of \(\mathcal {D}_{\varLambda _{1} + \langle {\mathbf {e}, \mathbf {x}}\rangle , [r \mathbf {e}^{t} \mid \tilde{r}]}\), where \(\mathbf {r}^{t} = [r \mathbf {e}^{t} \mid \tilde{r}]\) is a square root of \(\mathbf {r}^{t} \mathbf {r} = (r \Vert {\mathbf {e}}\Vert )^{2} + \tilde{r}^{2} = t^{2}\), as desired.

To apply Theorem 1, we first need to show that

$$ K = \ker ([\mathbf {e}^t \mid 1]) = \{ (\mathbf {v},-\langle {\mathbf {e},\mathbf {v}}\rangle ) \mid \mathbf {v} \in \mathbb {R}^m \}$$

is a \((\varLambda \times \varLambda _1)\)-subspace. Observe that \(\varLambda ' = K \cap (\varLambda \times \varLambda _{1})\) is exactly the set of vectors \((\mathbf {v},v)\) where \(\mathbf {v} \in \varLambda \) and \(v = -\langle {\mathbf {e}, \mathbf {v}}\rangle \in \varLambda _1\), i.e., the image of \(\varLambda \) under the injective linear transformation \(\mathbf {T}(\mathbf {v}) = (\mathbf {v},-\langle {\mathbf {e},\mathbf {v}}\rangle )\). So, because \(\varLambda \) is full rank, \(\mathrm {span}(\varLambda ') = K\), as needed.

Finally, we show that \(s \mathbf {T} \le \mathbf {S}\), which by hypothesis and Lemma 2 implies that \(\eta _{\varepsilon }(\mathbf {T} \cdot \varLambda ) \le s \mathbf {T} \le \mathbf {S}\), as desired. Equivalently, we need to show that the matrix

$$ \mathbf {R} = \mathbf {S} \mathbf {S}^t - s^2 \mathbf {T} \mathbf {T}^t = \begin{pmatrix} (r^2 - s^2) \mathbf {I}_m &{} s^2 \mathbf {e} \\ s^2 \mathbf {e}^{t} &{} \tilde{r}^2 - s^2 e^2 \end{pmatrix} $$

is positive semidefinite, where \(e = \Vert {\mathbf {e}}\Vert \). If \(r^{2}=s^{2}\) then \(\mathbf {e} = \mathbf {0}\) and \(\mathbf {R}\) is positive semidefinite by inspection, so from now on assume that \(r^{2} > s^{2}\). Sylvester’s criterion says that a symmetric matrix is positive semidefinite if (and only if) all its principal minors are nonnegative.Footnote 10 First, every principal minor of \(\mathbf {R}\) obtained by removing the last row and column (and possibly others) is \(\det ((r^{2}-s^{2}) \mathbf {I}_{k}) > 0\) for some k. Now consider a square submatrix of \(\mathbf {R}\) wherein the last row and column have not been removed; such a matrix has the form

$$ \overline{\mathbf {R}} = \begin{pmatrix} (r^{2}-s^{2}) \mathbf {I}_{k} &{} s^{2} \overline{\mathbf {e}} \\ s^{2} \overline{\mathbf {e}}^{t} &{} \tilde{r}^{2}-s^{2}e^{2} \end{pmatrix} , $$

where \(\overline{\mathbf {e}}\) is some subvector of \(\mathbf {e}\), hence \(\Vert {\overline{\mathbf {e}}}\Vert ^{2} \le e^{2}\). Multiplying the last column by \(r^{2}-s^{2} > 0\) and then subtracting from the last column the product of the first k columns with \(s^{2} \cdot \overline{\mathbf {e}}\), we obtain a lower-triangular matrix whose first k diagonal entries are \(r^{2}-s^{2} > 0\), and whose last diagonal entry is

$$ (\tilde{r}^{2} - s^{2} e^{2})(r^{2}-s^{2}) - s^{4} \Vert {\overline{\mathbf {e}}}\Vert ^{2} \ge \tilde{r}^{2} r^{2} - \tilde{r}^{2} s^{2} - e^{2} r^{2} s^{2} = 0, $$

where the equality follows from clearing denominators in the hypothesis \((1/s)^{2} = (1/r)^{2} + (e/\tilde{r})^{2}\). So, every principal minor of \(\mathbf {R}\) is nonnegative, as desired.    \(\square \)

6 Subgaussian Matrices

The concrete parameters for optimized SIS- and LWE-based trapdoor cryptosystems following [MP12] depend on the largest singular value of a subgaussian random matrix with independent rows, columns, or entries, which serves as the trapdoor. The cryptosystem designer will typically need to rely on a singular value concentration bound to determine Gaussian parameters, set norm thresholds for signatures, estimate concrete security, etc. The current literature does not provide sufficiently precise concentration bounds for this purpose. For example, commonly cited bounds contain hidden, non-explicit constant factors; see, e.g., [Ver12, Theorem 5.39] and [Ver18, Theorems 4.4.5 and 4.6.1].

In Theorem 6 (whose proof is in the full version) we present a singular value concentration bound with explicit constants, for random matrices having independent subgaussian rows. We also report on experiments to determine the singular values for commonly used distributions in lattice cryptography. Throughout this section, we use \(\sigma \) to denote a distribution’s standard deviation and \(m>n>0\) for the dimensions of a random matrix \(\mathbf {R} \in \mathbb {R}^{m \times n}\) following some particular distribution. We call a random vector \(\mathbf {x} \in \mathbb {R}^n\) \(\sigma \)-isotropic if \(\mathbb E[\mathbf {x} \mathbf {x}^t] = \sigma ^2\mathbf {I}_n\).

Theorem 6

Let \(\mathbf {R} \in \mathbb {R}^{m \times n}\) be a random matrix whose rows \(\mathbf {r}_i\) are independent, identically distributed, zero-mean, \(\sigma \)-isotropic, and subgaussian with parameter \(s>0\). Then for any \(t\ge 0\), with probability at least \(1 - 2e^{-t^2}\) we have

$$\sigma (\sqrt{m} - C(s^2/\sigma ^2) (\sqrt{n}+t)) \le s_n(\mathbf {R}) \le s_1(\mathbf {R}) \le \sigma (\sqrt{m} + C(s^2/\sigma ^2)(\sqrt{n} + t)), $$

where \(C = 8e^{1 + 2/e}\sqrt{\ln 9}/\sqrt{\pi } < 38\).
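As a quick numeric sanity check of the explicit constant (our own verification, not from the paper):

```python
import math
C = 8 * math.exp(1 + 2 / math.e) * math.sqrt(math.log(9)) / math.sqrt(math.pi)
print(C)  # prints roughly 37.96, confirming C < 38
```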

Comparison. There are two commonly cited concentration bounds for the singular values of subgaussian matrices. The first is for a random matrix with independent entries.

Theorem 7

([Ver18, Theorem 4.4.5]). Let \(\mathbf {R} \in \mathbb {R}^{m \times n}\) be a random matrix with entries drawn independently from a subgaussian distribution with parameter \(s>0\). Then, there exists some universal constant \(C>0\) such that for any \(t \ge 0\), with probability at least \(1-2e^{-t^2}\) we have

$$\begin{aligned} s_1(\mathbf {R}) \le C\cdot s(\sqrt{m} + \sqrt{n} + t). \end{aligned}$$

The second theorem is for a random matrix with independent subgaussian and isotropic rows.

Theorem 8

([Ver18, Theorem 4.6.1]). Let \(\mathbf {R} \in \mathbb {R}^{m \times n}\) be a random matrix whose rows \(\mathbf {a}_i\) are independent, identically distributed, zero-mean, 1-isotropic, and subgaussian with parameter \(s>0\). Then there is a universal constant \(C > 0\) such that for any \(t\ge 0\), with probability at least \(1 - 2e^{-t^2}\) we have

$$\sqrt{m} - Cs^2(\sqrt{n}+t) \le s_n(\mathbf {R}) \le s_1(\mathbf {R}) \le \sqrt{m} + Cs^2(\sqrt{n} + t). $$

We note that the above theorem is normalized to \(\sigma = 1\). Our Theorem 6 is a more explicit version of this theorem for arbitrary \(\sigma \), which scales in the appropriate way in \(\sigma \), since scaling a subgaussian distribution simply scales its parameter.

Fig. 1. Data from fifty random matrices of dimension \(6144 \times 512\) for each distribution \(\mathcal X\). The average largest and smallest singular values are respectively denoted \(\bar{s}_{1}\) and \(\bar{s}_{n}\), and we recorded the sample variance for each distribution's singular values. The third column is the expected singular value using each distribution's calculated \(C_{\mathcal X}\): \(1/2\pi \), \(1/2\pi \), and \(1/4\pi \) for discrete/continuous gaussians, \(\mathcal U\{-1, 1\}\), and \(\mathcal P\) respectively.

6.1 Experiments

Here we present empirical data on the singular values of random matrices with independent entries drawn from commonly used distributions in lattice-based cryptography. These distributions are the continuous Gaussian, the discrete Gaussian over \(\mathbb {Z}\), the uniform distribution over \(\{-1,1\}\) (denoted as \(\mathcal U\{-1, 1\}\)), and the distribution given by choosing 0 with probability 1/2 and \(\pm 1\) each with probability 1/4, which we denote \(\mathcal P\).

First Experiment. For each distribution, we sampled fifty \(m \times n\) random matrices (with \(m = 6144\) and \(n = 512\)), measured their singular values, and modeled the extreme singular values as approximately

$$\begin{aligned}&s_1 \approx \sigma \left( \sqrt{m} + C_{\mathcal X}(s/\sigma )^2\sqrt{n}\right) \\&s_n \approx \sigma \left( \sqrt{m} - C_{\mathcal X}(s/\sigma )^2\sqrt{n}\right) \end{aligned}$$

where \(C_{\mathcal X}\) is a small constant dependent on the distribution \(\mathcal X\). The results are given in Fig. 1. We observed \(C_{\mathcal X}(s/\sigma )^2 \approx 1\) for each distribution.
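A reduced-scale numpy sketch of this experiment is below (our own code; smaller dimensions and fewer trials than the reported \(6144 \times 512\) runs, so the estimates are noisier). It compares the measured extreme singular values against \(\sigma (\sqrt{m} \pm \sqrt{n})\), i.e., the model above with the observed \(C_{\mathcal X}(s/\sigma )^2 \approx 1\).

```python
# Reduced-scale version of the first experiment (illustrative only).
import numpy as np

def extreme_singular_values(sampler, m, n, trials=10):
    s1s, sns = [], []
    for _ in range(trials):
        R = sampler((m, n))
        sv = np.linalg.svd(R, compute_uv=False)
        s1s.append(sv[0])   # largest singular value
        sns.append(sv[-1])  # smallest singular value
    return np.mean(s1s), np.mean(sns)

m, n = 2048, 256
dists = {
    "gaussian (sigma = 1)":         (lambda sh: np.random.standard_normal(sh), 1.0),
    "U{-1,1}  (sigma = 1)":         (lambda sh: np.random.choice([-1, 1], size=sh).astype(float), 1.0),
    "P        (sigma = 1/sqrt 2)":  (lambda sh: np.random.choice([-1, 0, 0, 1], size=sh).astype(float), 1 / np.sqrt(2)),
}
for name, (sampler, sigma) in dists.items():
    s1, sn = extreme_singular_values(sampler, m, n)
    print(f"{name}: s1 = {s1:6.1f} (model {sigma * (np.sqrt(m) + np.sqrt(n)):6.1f}), "
          f"sn = {sn:6.1f} (model {sigma * (np.sqrt(m) - np.sqrt(n)):6.1f})")
```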

Continuous and Discrete Gaussians. The continuous Gaussian \(\mathcal {D}_{\sigma }\) is subgaussian with parameter \(\sigma \) since \(\mathbb E[e^{2\pi t \mathcal X}] = e^{\pi t^2 \sigma ^2}\) where \(\mathcal X \sim \mathcal {D}_{\sigma }\). Further, the discrete Gaussian \(\mathcal D_{\mathbb {Z}, s}\) is subgaussian with parameter s [MP12, Lemma 2.8]. Assuming the discrete Gaussian is smooth, one can expect the standard deviation of \(\mathcal {D}_{\mathbb Z, s}\) to be close to that of the continuous Gaussian it approximates, namely \(s/\sqrt{2\pi }\). This implies that the ratio between the subgaussian parameter and the standard deviation of (discrete) gaussians is \(\sqrt{2\pi }\). Under this assumption on the discrete Gaussian's standard deviation, we observed \(C_{\text {Gaussian}} = 1/2\pi \).

Uniform over \(\{-1,1\}\). Here \(\sigma = 1\) and \(\mathbb E[e^{2\pi t X}] = \cosh (2\pi t) \le e^{2\pi ^2 t^2}\), so the subgaussian parameter is at most \(\sqrt{2 \pi }\). We observed \(C_{\mathcal U\{-1, 1\}} = {1}/{2\pi }\) in our experiment.

The Distribution \(\mathcal P\). By nearly the same steps as for the previous distribution, \(\mathcal P\) is subgaussian with parameter \(\sqrt{2\pi }\) and has \(\sigma ={1}/{\sqrt{2}}\). We observed \(C_{\mathcal P} = {1}/{4\pi }\).
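The following small check (ours, purely illustrative) verifies the moment-generating-function bounds behind these estimates on a grid of t, and prints \((s/\sigma )^2\) for \(s = \sqrt{2\pi }\), which should match \(1/C_{\mathcal X}\) given the observation \(C_{\mathcal X}(s/\sigma )^2 \approx 1\).

```python
# Numeric verification of E[exp(2*pi*t*X)] <= exp(pi*s^2*t^2) for s = sqrt(2*pi),
# and of the (s/sigma)^2 ratios quoted above (illustrative check, ours).
import numpy as np

t = np.linspace(-3, 3, 601)
s = np.sqrt(2 * np.pi)

checks = {
    "U{-1,1} (sigma = 1)":           (np.cosh(2 * np.pi * t), 1.0),
    "P (0 w.p. 1/2, +/-1 w.p. 1/4)": (0.5 + 0.5 * np.cosh(2 * np.pi * t), 1 / np.sqrt(2)),
}
for name, (mgf, sigma) in checks.items():
    holds = bool(np.all(mgf <= np.exp(np.pi * s ** 2 * t ** 2)))
    print(f"{name}: MGF bound holds: {holds}, (s/sigma)^2 = {(s / sigma) ** 2:.3f}")
```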

Fig. 2. Here we compare the measured largest singular value with the expectation under our heuristic, with entries from the distribution \(\mathcal U\{-1, 1\}\). For each \(n = 50, 100, 200, 500, 1000\), the experiment sampled \(N=50\) random 32n-by-n matrices and averaged their largest singular value. The measured sample variances were .099, .064, .050, .048, .031 for \(n = 50, 100, 200, 500, 1000\), respectively. Also of note, the measured constant \(C_{\mathcal X}\) approached \(1/2\pi \) from below as n increased (\(.92/2\pi , .96/2\pi , .97/2\pi , .99/2\pi , .99/2\pi \) for \(n = 50, 100, 200, 500, 1000\)).

Second Experiment. As a second experiment, we sampled \(\mathcal U\{-1,1\}^{32n \times n}\) and averaged its maximum singular value over 50 samples. We varied \(n = 50\), 100, 200, 500, 1000 and plotted the results in Fig. 2 (red squares), together with the expected largest singular value (dashed blue line). We remark that we saw the same behavior for all four distributions when we varied the dimension.

6.2 Applications

Here we show how the updated singular value estimates from the previous subsection impact concrete security of lattice trapdoor schemes. As an example, we use the [MP12] trapdoor scheme with entries drawn independently from \(\mathcal P\). That is, we consider the SIS trapdoor scheme based on \(\mathbf {A} = [\bar{\mathbf {A}}| \mathbf {G} - \bar{\mathbf {A}}\mathbf {R}] \in \mathbb Z_q^{n \times m}\) where \(\mathbf {R} \leftarrow \mathcal P^{(m - n\log q)\times n\log q}\) is a subgaussian matrix serving as the trapdoorFootnote 11, \(\mathbf {G} = [\mathbf {I}_n | 2\mathbf {I}_n| \dots | 2^{\log q -1}\mathbf {I}_n]\) is the gadget matrix, and \(\bar{\mathbf {A}}\) is a truly random matrix. Further, let \(s>0\) be the width of the discrete Gaussian that we are sampling over. This \(s>0\) scales linearly with \(s_1(\mathbf {R})\)Footnote 12. Since the singular values of \(\mathbf {R}\) scale with \(\sigma = 1/\sqrt{2}\), the concrete security of the underlying SIS problem increases compared to assuming the largest singular value of \(\mathbf {R}\) scales with the subgaussian parameter, \(s=1\). See Fig. 3 for the difference in a commonly-used parameter regime.

In order to estimate security, we followed [APS15, ACD+18] by estimating the time-complexity of the BKZ algorithm [SE94] using sieving as its SVP oracleFootnote 13. BKZ is expected to return a vector of length \(\delta ^{2n}\text {det}(\varLambda )^{1/2n}\) for a lattice, \(\varLambda \), of dimension 2n. Also, Minkowski’s theorem tells us a short enough lattice vector exists when we only use 2n columns of \(\mathbf {A}\). In other words, breaking the trapdoor corresponds to finding a short vector in \(\varLambda _q^\perp (\mathbf {A}_{2n}) = \{\mathbf {z} \in \mathbb {Z}^{2n} | \mathbf {A}_{2n}\mathbf {z} = \mathbf {0} \in \mathbb {Z}_q^n\}\) where \(\mathbf {A}_{2n}\) is the matrix formed by the first 2n columns of \(\mathbf {A}\).

Fig. 3. The change in concrete security of the underlying SIS problem in MP12 when the trapdoor is drawn from \(\mathcal P^{(m - n\log q )\times n\log q}\). We give the smallest BKZ block size k achieving the \(\delta \) needed to find a vector of length \(s\sqrt{m}\) in (a subspace of) the lattice \(\varLambda ^\perp _q(\mathbf {A})\).

We found the smallest block size k achieving the needed \(\delta \) satisfying \(s\sqrt{m} = \delta ^{2n}\text {det}(\varLambda _q^\perp (\mathbf {A}_{2n}))^{\frac{1}{2n}} = \delta ^{2n} \sqrt{q}\). Finally, we used the heuristic \(\delta \approx (\frac{k}{2\pi e}(\pi k)^{1/k})^{\frac{1}{2(k-1)}}\) to determine the relationship between k and \(\delta \), and we set the total time complexity of BKZ with block size k in dimension 2n to \(8\cdot (2n) \cdot \text {time}(\text {SVP}) = 8\cdot (2n)\cdot 2^{.292k+16.4}\) [Che13, APS15]Footnote 14.
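A short Python sketch of this block-size search is below (ours; the parameter values are hypothetical placeholders chosen only for illustration, not the ones used for Fig. 3).

```python
# Block-size search using the heuristics described above (illustrative sketch).
import math

def delta_from_k(k):
    """Root-Hermite factor heuristic for BKZ with block size k."""
    return (k / (2 * math.pi * math.e) * (math.pi * k) ** (1.0 / k)) ** (1.0 / (2 * (k - 1)))

def smallest_block_size(n, q, s, m, k_start=60):
    """Smallest k whose delta(k) satisfies delta^{2n} * sqrt(q) <= s * sqrt(m)."""
    delta_needed = (s * math.sqrt(m) / math.sqrt(q)) ** (1.0 / (2 * n))
    k = k_start
    while delta_from_k(k) > delta_needed:
        k += 1
    return k

# Hypothetical parameters, for illustration only.
n, logq, s = 512, 24, 6000.0
q, m = 2 ** logq, 512 * 24
k = smallest_block_size(n, q, s, m)
cost_log2 = math.log2(8 * (2 * n)) + 0.292 * k + 16.4
print(f"block size k = {k}, estimated BKZ cost ~ 2^{cost_log2:.1f}")
```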