1 Introduction

Let \(D_1\) and \(D_2\) be two non-vanishing probability distributions over a common measurable support X. Let \(a \in (1,+\infty )\). The Rényi divergence [33, 35] (RD for short) \(R_a(D_1 \Vert D_2)\) of order a between \(D_1\) and \(D_2\) is defined as the (\((a-1)\)th root of the) expected value of \((D_1(x)/D_2(x))^{a-1}\) over the randomness of x sampled from \(D_1\). For notational convenience, our definition of the RD is the exponential of the classical definition [35]. The RD is an alternative to the statistical distance (SD for short) \(\Delta (D_1,D_2) = \frac{1}{2}\sum _{x \in X} |D_1(x) - D_2(x)|\) as a measure of distribution closeness, in which the difference of the SD is replaced by a ratio. The RD enjoys several properties that are analogous to those of the SD, with the additive properties of the SD replaced by multiplicative ones (see Sect. 2.3).

SD is ubiquitous in cryptographic security proofs. One of its most useful properties is the so-called probability preservation property: For any measurable event \(E \subseteq X\), we have \(D_2(E) \ge D_1(E) - \Delta (D_1,D_2)\). RD enjoys the analogous property \(D_2(E) \ge D_1(E)^{\frac{a}{a-1}}/R_{a}(D_1 \Vert D_2)\). If the event E occurs with significant probability under \(D_1\), and if the SD (resp. RD) is small, then the event E also occurs with significant probability under \(D_2\). These properties are particularly handy when the success of an attacker against a given scheme can be described as an event whose probability should be non-negligible, e.g., the attacker outputs a new valid message-signature pair for a signature scheme. If the attacker succeeds with good probability in the real scheme based on distribution \(D_1\), then it also succeeds with good probability in the simulated scheme (of the security proof) based on distribution \(D_2\).

To make the SD probability preservation property useful, it must be ensured that the SD \(\Delta (D_1,D_2)\) is smaller than any \(D_1(E)\) that the security proof must handle. Typically, the quantity \(D_1(E)\) is assumed to be greater than some success probability lower bound \(\varepsilon \), which is of the order of \(1/\mathrm {poly}(\lambda )\) where \(\lambda \) refers to the security parameter, or even \(2^{-o(\lambda )}\) if the proof handles attackers whose success probabilities can be sub-exponentially small (which we believe better reflects practical objectives). As a result, the SD \(\Delta (D_1,D_2)\) must be \(<\varepsilon \) for the SD probability preservation property to be relevant. In contrast, the RD probability preservation property is non-vacuous when the RD \(R_{a}(D_1 \Vert D_2)\) is \(\le \mathrm {poly}(1/\varepsilon )\). In many cases, the latter seems less demanding than the former: in all our applications, the RD between \(D_1\) and \(D_2\) is small enough for the RD probability preservation property while their SD is too large for the SD probability preservation to be applicable (see Sect. 2.3). This explains the superiority of the RD in several of our applications.

Although RD seems more amenable than SD for search problems, it seems less so for distinguishing problems. A typical cryptographic example is the semantic security of an encryption scheme. To break semantic security, an adversary \(\mathcal {A}\) must distinguish between the encryption distributions of two plaintext messages of its choosing: the distinguishing advantage \(\text{ Adv }_{\mathcal {A}}(D_1,D_2)\), defined as the difference of the probabilities that \(\mathcal {A}\) outputs 1 using \(D_1\) or \(D_2\), should be sufficiently large. In security proofs, algorithm \(\mathcal {A}\) is often called on distributions \(D_1'\) and \(D_2'\) that are close to \(D_1\) and \(D_2\), respectively. If the SDs between \(D_1\) and \(D_1'\), and between \(D_2\) and \(D_2'\), are both bounded from above by \(\varepsilon \), then, by the SD probability preservation property (used twice), we have \(\text{ Adv }_{\mathcal {A}}(D_1',D_2') \ge \text{ Adv }_{\mathcal {A}}(D_1,D_2) - 2\varepsilon \). As a result, SD can be used for distinguishing problems in a similar fashion as for search problems. The multiplicativity of the RD probability preservation property seems to prevent RD from being applicable to distinguishing problems.

We replace the statistical distance by the Rényi divergence in several security proofs for lattice-based cryptographic primitives. Lattice-based cryptography is a relatively recent cryptographic paradigm in which cryptographic primitives are shown to be at least as hard to break as it is to solve standard problems over lattices (see the surveys [26, 29]). Security proofs in lattice-based cryptography involve different types of distributions, often over infinite sets, such as continuous Gaussian distributions and Gaussian distributions with lattice supports. The RD seems particularly well suited to quantify the closeness of Gaussian distributions. Consider for example two continuous Gaussian distributions over the reals, both with standard deviation 1, but one with center 0 and the other with center c. Their SD is linear in c, so c must remain extremely small for the SD probability preservation property to be useful. On the other hand, their RD of order \(a=2\) is bounded as \(\exp (O(c^2))\), so the RD preservation property remains useful even for slightly growing c.
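To illustrate this numerically, the following minimal Python sketch (an illustration only, not part of the analysis; it uses the standard \(N(\mu ,1)\) parametrization rather than the \(\rho _s\) convention of Sect. 2) compares the SD and the order-2 RD of two unit-variance Gaussians with centers 0 and c; the numerical integration matches the closed form \(R_2 = \exp (c^2)\).

```python
# Illustrative numerical check (not from the paper): for two unit-variance
# Gaussians centered at 0 and c, the SD grows linearly in c while the
# order-2 RD stays as small as exp(c^2).
import math

def phi(x):
    # Density of N(0,1).
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def sd(c):
    # Closed form: Delta(N(0,1), N(c,1)) = 2*Phi(c/2) - 1 = erf(c/(2*sqrt(2))).
    return math.erf(c / (2 * math.sqrt(2)))

def r2(c, step=1e-3, lim=20.0):
    # R_2(P || Q) = integral of P(x)^2 / Q(x) dx, by a plain Riemann sum.
    n = int(2 * lim / step)
    return sum(phi(i * step - lim) ** 2 / phi(i * step - lim - c) * step
               for i in range(n))

for c in (0.1, 0.5, 1.0):
    print(f"c={c}: SD={sd(c):.4f}, R2={r2(c):.4f}, exp(c^2)={math.exp(c*c):.4f}")
```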

RD was first used in lattice-based cryptography in Lyubashevsky et al. [19], in the decision to search reduction for the Ring Learning With Errors problem (which serves as a security foundation for many asymptotically fast primitives). It was then exploited in Langlois et al. [21] to decrease the parameters of the (approximation to) cryptographic multilinear maps of [14]. In the present work, we present a more extensive study of the power of RD in lattice-based cryptography, by showing several independent applications of RD. In some cases, RD leads to security proofs that allow smaller parameters in the cryptographic schemes, hence to efficiency improvements. In other cases, it leads to alternative security proofs that are conceptually simpler.

Our applications of RD also include distinguishing problems. To circumvent the aforementioned a priori limitation of the RD probability preservation property for distinguishing problems, we propose an alternative approach that handles a class of distinguishing problems enjoying a special property that we call public sampleability. Public sampleability makes it possible to estimate success probabilities via Hoeffding’s bound.

The applications we show in lattice-based cryptography are as follows:

  • Smaller storage requirement for the Fiat-Shamir BLISS signature scheme [11, 13, 27].

  • Smaller parameters in the dual-Regev encryption scheme from Gentry et al. [16].

  • Alternative proof that the Learning With Errors (LWE) problem with noise chosen uniformly in an interval is no easier than the Learning With Errors problem with Gaussian noise [12]. Our reduction does not require the latter problem to be hard, and it is hence marginally more general as it also applies to distributions with smaller noises. Further, our reduction preserves the LWE dimension n, and is hence tighter than the one from [12] (the latter degrades the LWE dimension by a constant factor).

  • Alternative proof that the Learning With Rounding (LWR) problem [7] is no easier than LWE. Our reduction is the first which preserves the dimension n without resorting to noise flooding (which significantly degrades the noise rate): the reductions from Refs. [3, 4] do not preserve the dimension, and the one from Banerjee et al. [7] preserves the dimension but makes use of noise flooding. In Alwen et al. [3], the authors come close to preserving the dimension, up to a constant factor, but at the price of a larger polynomial modulus. Denoting by \(\mathbb {Z}_p\) the ring in which we perform rounding, our new reduction also gains extra factors of \(p\sqrt{\log n}\) and \(p n\sqrt{\log n}\) in the number of LWR samples handled, compared with Bogdanov et al. [4] and Alwen et al. [3], respectively.

We think RD is likely to have further applications in lattice-based cryptography, for both search and distinguishing problems.

Related Work The framework for using RD in distinguishing problems was used in Ling et al. [20], in the context of the k-LWE problem (a variant of LWE in which the attacker is given extra information). Pöppelmann et al. [27] used the Kullback–Leibler divergence (which is the RD of order \(a=1\)) to lower the storage requirement of the BLISS scheme [11]. Asymptotically, using the Kullback–Leibler divergence rather than SD only leads to a constant factor improvement. Our approach allows bigger savings in the case where the number of signature queries is limited, as explained in Sect. 3.

Recently, Bogdanov et al. [4] adapted parts of (an earlier version of) our RD-based hardness proof for LWE with noise uniform in a small interval, to the LWR problem. In particular, they obtained a substantial improvement over the hardness results of Refs. [3, 7]. In this revised and extended version of our earlier conference paper [5], we show an alternative LWR hardness proof that improves on that of Bogdanov et al. [4], exploiting the equivalence of LWR to LWE with noise uniform in an interval; this equivalence was also established in Bogdanov et al. [4] but not used there to relate the hardness of LWE to that of LWR.

After the publication of earlier versions of this article, some of our results have been improved [34] and used in Libert et al. [18] in the context of dynamic group signatures and in Alkim et al. [1] to replace the LWE error distribution by a more efficiently samplable distribution.

Road-map In Sect. 2, we provide necessary background on lattice-based cryptography, and on the Rényi divergence. In Sect. 3, we use RD to improve lattice-based signature scheme parameters via more efficient Gaussian sampling. Section 4 contains the description of the framework in which we can use RD for distinguishing problems, which we apply to improve the parameters of the dual-Regev encryption scheme. In Sect. 5, we describe an alternative hardness proof for LWE with noise uniformly chosen in an interval. Section 6 shows an application of the previous section to give a new hardness proof for the LWR problem. Finally, Sect. 7 concludes with open problems.

Notation If x is a real number, we let \(\lfloor x \rceil \) denote a closest integer to x. The notation \(\ln \) refers to the natural logarithm and the notation \(\log \) refers to the base 2 logarithm. We define \({\mathbb {T}}=([0,1],+)\), where addition is performed modulo 1. For an integer q, we let \({\mathbb {Z}}_q\) denote the ring of integers modulo q. We let \({\mathbb {T}}_q\) denote the group \({\mathbb {T}}_q = \{i/q \bmod 1: i \in {\mathbb {Z}}\} \subseteq {\mathbb {T}}\). Vectors are denoted in bold. If \(\mathbf {b}\) is a vector in \({\mathbb {R}}^d\), we let \(\Vert \mathbf {b}\Vert \) denote its Euclidean norm. By default, all our vectors are column vectors.

If D is a probability distribution, we let \(\mathrm {Supp}(D) = \{x: D(x) \ne 0\}\) denote its support. For a set X of finite weight, we let U(X) denote the uniform distribution on X. To ease notation, we let \(U_{\beta }\) denote the distribution \(U([-\beta ,\beta ])\) for a positive real \(\beta \). The statistical distance between two distributions \(D_1\) and \(D_2\) over a countable support X is \(\Delta (D_1, D_2)= \frac{1}{2} \sum _{x \in X} |D_1(x)-D_2(x)|\). This definition is extended in the natural way to continuous distributions. If \(f:X \rightarrow {\mathbb {R}}\) takes non-negative values, then for all countable \(Y \subseteq X\), we define \(f(Y) = \sum _{y\in Y} f(y) \in [0,+\infty ]\). For any vector \(\mathbf {c} \in {\mathbb {R}}^n\) and any real \(s>0\), the (spherical) Gaussian function with standard deviation parameter s and center \(\mathbf {c}\) is defined as follows: \(\forall \mathbf {x} \in {\mathbb {R}}^n, \rho _{s,\mathbf {c}}(\mathbf {x})= \exp ( -\pi \Vert \mathbf {x}-\mathbf {c} \Vert ^2/s^2)\). The Gaussian distribution is \(D_{s,\mathbf {c}} = \rho _{s,\mathbf {c}} / s^n\). When \(\mathbf {c}=\mathbf {0}\), we may omit the subscript \(\mathbf {c}\).

We use the usual Landau notations. A function \(f(\lambda )\) is said to be negligible if it is \(\lambda ^{-\omega (1)}\). A probability \(p(\lambda )\) is said to be overwhelming if it is \(1-\lambda ^{-\omega (1)}\).

The distinguishing advantage of an algorithm \(\mathcal {A}\) between two distributions \(D_0\) and \(D_1\) is defined as \(\mathrm {Adv}_{\mathcal {A}}(D_0,D_1) = |\Pr _{x\hookleftarrow D_0}[\mathcal {A}(x) = 1] - \Pr _{x\hookleftarrow D_1}[\mathcal {A}(x) = 1]|\), where the probabilities are taken over the randomness of the input x and the internal randomness of \(\mathcal {A}\). Algorithm \(\mathcal {A}\) is said to be an \((\varepsilon ,T)\)-distinguisher if it runs in time \(\le T\) and if \(\mathrm {Adv}_{\mathcal {A}}(D_0,D_1) \ge \varepsilon \).

We say a distribution \(\chi \) is B-bounded, for some positive real B, if its support is contained in the interval \([-B,B]\). In the case where \(\chi \) is over \(\mathbb {Z}_q\), we assume that \(B\le (q-1)/2\). A B-bounded distribution \(\chi \) is said to be balanced if \(\mathrm {Pr}[\chi \le 0] \ge 1/2\) and \(\mathrm {Pr}[\chi \ge 0] \ge 1/2\).

2 Preliminaries

We assume the reader is familiar with standard cryptographic notions, as well as with lattices and lattice-based cryptography. We refer to Refs. [26, 31] for introductions on the latter topic.

2.1 Lattices

A (full-rank) n-dimensional Euclidean lattice \(\varLambda \subseteq \mathbb {R}^n\) is the set of all integer linear combinations \(\sum _{i=1}^{n} x_{i} \mathbf {b}_{i}\) of some \(\mathbb {R}\)-basis \((\mathbf {b}_{i})_{1 \le i \le n}\) of \(\mathbb {R}^n\). In this setup, the tuple \((\mathbf {b}_i)_i\) is said to form a \({\mathbb {Z}}\)-basis of \(\varLambda \). For a lattice \(\varLambda \) and any \(i\le n\), the ith successive minimum \(\lambda _i(\varLambda )\) is the smallest radius r such that \(\varLambda \) contains i linearly independent vectors of norm at most r. The dual \(\varLambda ^*\) of a lattice \(\varLambda \) is defined as \(\varLambda ^* = \{\mathbf {y} \in {\mathbb {R}}^n: \mathbf {y}^t \varLambda \subseteq {\mathbb {Z}}\}\).

The (spherical) discrete Gaussian distribution over a lattice \(\varLambda \subseteq {\mathbb {R}}^n\), with standard deviation parameter \(s>0\) and center \(\mathbf {c}\) is defined as:

$$\begin{aligned} \forall \mathbf {x} \in \varLambda , \ D_{\varLambda , s, \mathbf {c}}(\mathbf {x}) = \frac{\rho _{s,\mathbf {c}}(\mathbf {x})}{\rho _{s,\mathbf {c}}(\varLambda )}. \end{aligned}$$

When the center is \(\mathbf {0}\), we omit the subscript \(\mathbf {c}\).
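For illustration, here is a minimal (and deliberately inefficient) rejection sampler for \(D_{{\mathbb {Z}},s,c}\), given as a sketch only; the window half-width \(\tau s\) with \(\tau = 12\) is an arbitrary illustrative choice leaving only negligible mass outside the window. Practical samplers, such as the one recalled in Sect. 3, are far more efficient.

```python
import math, random

def rho(x, s, c=0.0):
    # Gaussian weight rho_{s,c}(x) = exp(-pi * (x - c)^2 / s^2).
    return math.exp(-math.pi * (x - c) ** 2 / s ** 2)

def sample_discrete_gaussian(s, c=0.0, tau=12):
    # Naive rejection sampling from D_{Z,s,c}: draw x uniformly in a window
    # of half-width tau*s around c, accept with probability rho_{s,c}(x).
    lo, hi = math.floor(c - tau * s), math.ceil(c + tau * s)
    while True:
        x = random.randint(lo, hi)
        if random.random() < rho(x, s, c):
            return x
```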

The smoothing parameter [25] of an n-dimensional lattice \(\varLambda \) with respect to \(\varepsilon > 0\), denoted by \(\eta _{\varepsilon }(\varLambda )\), is the smallest \(s>0\) such that \(\rho _{1/s}(\varLambda ^{*}\setminus \{0\}) \le \varepsilon \). We use the following properties.

Lemma 2.1

([25, Lemma 3.3]) Let \(\varLambda \) be an n-dimensional lattice and \(\varepsilon > 0\). Then

$$\begin{aligned} \eta _{\varepsilon }(\varLambda ) \le \sqrt{\frac{\ln (2n(1+1/\varepsilon ))}{\pi }} \cdot \lambda _n(\varLambda ). \end{aligned}$$

Lemma 2.2

(Adapted from [16, Lemma 5.3]) Let \(m, n \ge 1\) and let q be a prime integer, with \(m \ge 2n \ln q\). For \(\mathbf{A} \in {\mathbb {Z}}_q^{n \times m}\) we define \(\varLambda _\mathbf{A}^{\perp }\) as the lattice \(\{\mathbf {x} \in {\mathbb {Z}}^m: \mathbf{A} \mathbf {x} = \mathbf {0} \bmod q\}\). Then

$$\begin{aligned} \forall \varepsilon <1/2: \ \Pr _{\mathbf{A} \hookleftarrow U({\mathbb {Z}}_q^{n \times m})} \left[ \eta _{\varepsilon }(\varLambda _\mathbf{A}^\perp ) \ge 4\sqrt{\frac{\ln (4m / \varepsilon )}{\pi }} \right] \le q^{-n}. \end{aligned}$$

Lemma 2.3

(Adapted from [16, Cor. 2.8]) Let \(\varLambda ,\varLambda '\) be n-dimensional lattices with \(\varLambda ' \subseteq \varLambda \) and \(\varepsilon \in (0,1/2)\). Then for any \(\mathbf {c} \in {\mathbb {R}}^n\) and \(s \ge \eta _{\varepsilon }(\varLambda ')\) and any \(x \in \varLambda /\varLambda '\) we have

$$\begin{aligned} (D_{\varLambda ,s,\mathbf {c}} \bmod \varLambda ')(x) \in \left[ \frac{1-\varepsilon }{1+\varepsilon }, \frac{1+\varepsilon }{1-\varepsilon }\right] \cdot \frac{\det (\varLambda )}{\det (\varLambda ')}. \end{aligned}$$

2.2 The SIS, LWE, and LWR Problems

The Small Integer Solution (SIS) problem was introduced by Ajtai [2]. It serves as a security foundation for numerous cryptographic primitives, including, among many others, hash functions [2] and signatures [11, 16].

Definition 2.4

Let \(m \ge n \ge 1\) and \(q \ge 2\) be integers, and \(\beta \) a positive real. The \(\mathrm {SIS}_{n,m,q,\beta }\) problem is as follows: given \(\mathbf{A} \hookleftarrow U({\mathbb {Z}}_q^{n \times m})\), the goal is to find \(\mathbf {x} \in {\mathbb {Z}}^m\) such that \(\mathbf{A} \mathbf {x} = \mathbf {0} \bmod q\) and \(0 < \Vert \mathbf {x}\Vert \le \beta \).

The SIS problem was proven by Ajtai [2] to be at least as hard as some standard worst-case problems over Euclidean lattices, under specific parameter constraints. We refer to Gentry et al. [16] for an improved (and simplified) reduction.

The Learning With Errors (LWE) problem was introduced in 2005 by Regev [30, 32]. LWE is also extensively used as a security foundation, for encryption schemes [16, 32], fully homomorphic encryption schemes [8], and pseudorandom functions [3, 7], among many others. Its definition involves the following distribution. Let \(\chi \) be a distribution over \({\mathbb {T}}\), \(q \ge 2\), \(n\ge 1\), and \(\mathbf {s} \in {\mathbb {Z}}_q^n\). A sample from \(A_{\mathbf {s},\chi }\) is of the form \((\mathbf {a}, b) \in {\mathbb {Z}}_q^n \times {\mathbb {T}}\), with \(\mathbf {a} \hookleftarrow U({\mathbb {Z}}_q^n)\), \(b = \frac{1}{q} \langle \mathbf {a} , \mathbf {s} \rangle +e\), and \(e \hookleftarrow \chi \).

Definition 2.5

Let \(\chi \) be a distribution over \({\mathbb {T}}\), \(q \ge 2\), and \(m \ge n\ge 1\). The search variant \(\mathrm {sLWE}_{n,q,\chi ,m}\) of the \(\mathrm {LWE}\) problem is as follows: given m samples from \(A_{\mathbf {s},\chi }\) for some \(\mathbf {s} \in {\mathbb {Z}}_q^n\), the goal is to find \(\mathbf {s}\). The decision variant \(\mathrm {LWE}_{n,q,\chi ,m}\) consists in distinguishing between the distributions \((A_{\mathbf {s},\chi })^m\) and \(U({\mathbb {Z}}_q^n \times {\mathbb {T}})^m\), where \(\mathbf {s} \hookleftarrow U({\mathbb {Z}}_q^n)\).

Definition 2.6

For any error distribution \(\chi \), the \(\mathrm {sbinLWE}_{n,q,\chi ,m}\) (resp. \(\mathrm {binLWE}_{n,q,\chi ,m}\)) problem denotes the \(\mathrm {sLWE}_{n,q,\chi ,m}\) (resp. \(\mathrm {LWE}_{n,q,\chi ,m}\)) problem in which the vector \(\mathbf {s}\) is sampled uniformly in \(\{0,1\}^n\).

In Bogdanov et al. [4], the secret \(\mathbf {s}\) can be drawn from any distribution over \(\{0,1\}^n\), similarly to what we defined above. It would be more consistent with the definition of \(\mathrm {sLWE}\) to let the secret \(\mathbf {s}\) be arbitrary, but it does not seem possible to prove equivalence via the random self-reducibility property of \(\mathrm {LWE}\). A less direct reduction from worst-case \(\mathrm {sbinLWE}\) to uniform-secret \(\mathrm {sbinLWE}\) is as follows: worst-case \(\mathrm {sbinLWE}\) reduces to \(\mathrm {LWE}\), then Goldwasser et al. [15] and Brakerski et al. [6, Theorem 4.1] provide reductions from \(\mathrm {LWE}\) to \(\mathrm {binLWE}\), and finally [4] contains a reduction from \(\mathrm {binLWE}\) to uniform-secret \(\mathrm {sbinLWE}\). In any case, we will only use uniform-secret \(\mathrm {sbinLWE}\), so we stick to this variant in the present article. In some cases, it is convenient to use an error distribution \(\chi \) whose support is \({\mathbb {T}}_q\). In these cases, the definition of LWE is adapted such that \(U({\mathbb {Z}}_q^n \times {\mathbb {T}})\) is replaced by \(U({\mathbb {Z}}_q^n \times {\mathbb {T}}_q)\). Note also that for a fixed number of samples m, we can represent the LWE samples using matrices: the \(\mathbf {a}_i\)’s form the rows of a matrix \(\mathbf{A}\) uniform in \({\mathbb {Z}}_q^{m \times n}\), and the scalar products are represented by the product between \(\mathbf{A}\) and \(\mathbf {s}\).

Regev [32] gave a quantum reduction from standard worst-case problems over Euclidean lattices to \(\mathrm {sLWE}\) and LWE, under specific parameter constraints. Classical (but weaker) reductions have later been obtained (see [6, 28]). We will use the following sample-preserving search to decision reduction for LWE.

Theorem 2.7

(Adapted from [23, Proposition 4.10]) If \(q \le \mathrm {poly}(m,n)\) is prime and the error distribution \(\chi \) has support in \({\mathbb {T}}_q\), then there exists a reduction from \(\mathrm {sLWE}_{n,q,\chi ,m}\) to \(\mathrm {LWE}_{n,q,\chi ,m}\) that is polynomial in n and m.

For integers \(p,q\ge 2\), the rounding function from \(\mathbb {Z}_q\) to \(\mathbb {Z}_p\) is defined by

$$\begin{aligned} \lfloor x\rceil _p = \lfloor (p/q) \bar{x}\rfloor \pmod {p}, \end{aligned}$$

where \(\bar{x} \in \mathbb {Z}\) is any integer congruent to x modulo q. This can also be extended componentwise to vectors and matrices.

For a secret vector \(\mathbf {s}\in \mathbb {Z}_q^n\), a sample \((\mathbf {a},b)\) from the LWR distribution \(B_{\mathbf {s}}\) over \(\mathbb {Z}_q^n\times \mathbb {Z}_p\) is obtained by choosing a vector \(\mathbf {a}\hookleftarrow U\left( \mathbb {Z}_q^n\right) \) and setting \(b=\lfloor \langle \mathbf {a},\mathbf {s}\rangle \rceil _p\).
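For concreteness, the rounding function and the LWR sample distribution \(B_{\mathbf {s}}\) translate directly into the following Python sketch (illustrative only; the example parameters n, q, p below are arbitrary).

```python
import random

def round_to_p(x, q, p):
    # |x]_p = floor((p/q) * x) mod p, for a representative x of Z_q.
    return (p * (x % q)) // q % p

def lwr_sample(s, q, p):
    # One sample (a, b) from B_s: a uniform in Z_q^n, b = |<a, s>]_p.
    a = [random.randrange(q) for _ in range(len(s))]
    b = round_to_p(sum(ai * si for ai, si in zip(a, s)), q, p)
    return a, b

# Example: n = 4, q = 257, p = 16.
s = [random.randrange(257) for _ in range(4)]
print(lwr_sample(s, 257, 16))
```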

Definition 2.8

The decision variant \(\mathrm {LWR}_{n,q,p,m}\) of \(\mathrm {LWR}\) consists in distinguishing between the distributions \(\left( B_{\mathbf {s}}\right) ^m\) and \(U(\mathbb {Z}_q^n \times {\mathbb {Z}}_p)^m\), where \(\mathbf {s} \hookleftarrow U({\mathbb {Z}}_q^n)\).

The LWR problem was introduced in Banerjee et al. [7] and used there and in subsequent works to construct pseudorandom functions (\(\mathrm {PRF}\)s) based on the hardness of LWE.

2.3 The Rényi Divergence

For any two discrete probability distributions P and Q such that \(\mathrm {Supp}(P) \subseteq \mathrm {Supp}(Q)\) and \(a \in (1,+\infty )\), we define the Rényi divergence of order a by

$$\begin{aligned} R_{a}(P \Vert Q) = \left( \sum _{x \in \mathrm {Supp}(P)} \frac{P(x)^{a}}{Q(x)^{a-1}}\right) ^{\frac{1}{a-1}}. \end{aligned}$$

We omit the a subscript when \(a=2\). We define the Rényi divergences of orders 1 and \(+\infty \) by

$$\begin{aligned} R_1(P \Vert Q) = \exp \left( \sum _{x \in \mathrm {Supp}(P)} P(x) \ln \frac{P(x)}{Q(x)}\right) \ \text{ and } \ R_{\infty }(P \Vert Q) = \max _{x \in \mathrm {Supp}(P)} \frac{P(x)}{Q(x)}. \end{aligned}$$

The definitions are extended in the natural way to continuous distributions. The divergence \(R_1\) is the exponential of the Kullback–Leibler divergence.

For any fixed P and Q, the function \(a \mapsto R_{a}(P\Vert Q) \in (0, +\infty ]\) is non-decreasing, continuous over \((1, +\infty )\), tends to \(R_{\infty }(P \Vert Q)\) as a grows to infinity, and if \(R_{a}(P\Vert Q)\) is finite for some a, then \(R_{a}(P \Vert Q)\) tends to \(R_1(P \Vert Q)\) as a tends to 1 (we refer to Van Erven et al. [35] for proofs). A direct consequence is that if \(P(x)/Q(x) \le c\) for all \(x \in \mathrm {Supp}(P)\) and for some constant c, then \(R_{a}(P \Vert Q) \le R_{\infty }(P \Vert Q) \le c\). In the same setup, we have \(\Delta (P,Q) \le c/2\).
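These definitions translate directly into code. The following Python sketch (illustrative only) computes \(R_a\) for discrete distributions given as dictionaries, including the limiting orders \(a=1\) and \(a=\infty \); running it on a small example exhibits the monotonicity in a just stated.

```python
import math

def renyi(P, Q, a):
    # R_a(P || Q) for discrete distributions given as {outcome: prob} dicts
    # with Supp(P) contained in Supp(Q); handles a = 1 and a = infinity too.
    supp = [x for x in P if P[x] > 0]
    if a == 1:       # exponential of the Kullback-Leibler divergence
        return math.exp(sum(P[x] * math.log(P[x] / Q[x]) for x in supp))
    if a == math.inf:
        return max(P[x] / Q[x] for x in supp)
    return sum(P[x] ** a / Q[x] ** (a - 1) for x in supp) ** (1 / (a - 1))

P, Q = {0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4}
print([round(renyi(P, Q, a), 6) for a in (1, 2, 10, math.inf)])
# The printed values are non-decreasing in a, as stated above.
```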

The following properties can be considered the multiplicative analogues of those of the SD. We refer to Refs. [21, 35] for proofs.

Lemma 2.9

Let \(a \in [1, +\infty ]\). Let P and Q denote distributions with \(\mathrm {Supp}(P) \subseteq \mathrm {Supp}(Q)\). Then the following properties hold:

  • Log. Positivity \(R_{a}(P \Vert Q) \ge R_{a}(P \Vert P) = 1\).

  • Data Processing Inequality \(R_{a}(P^f \Vert Q^f) \le R_{a}(P \Vert Q)\) for any function f, where \(P^f\) (respectively, \(Q^f\)) denotes the distribution of f(y) induced by sampling \(y \hookleftarrow P\) (respectively, \(y \hookleftarrow Q\)).

  • Multiplicativity Assume P and Q are two distributions of a pair of random variables \((Y_1,Y_2)\). For \(i \in \{1,2\}\), let \(P_i\) (resp. \(Q_i\)) denote the marginal distribution of \(Y_i\) under P (resp. Q), and let \(P_{2|1}(\cdot |y_1)\) (resp. \(Q_{2|1}(\cdot |y_1)\)) denote the conditional distribution of \(Y_2\) given that \(Y_1=y_1\). Then we have:

    • \(R_{a}(P \Vert Q) = R_{a}(P_1 \Vert Q_1) \cdot R_{a}(P_2 \Vert Q_2)\) if \(Y_1\) and \(Y_2\) are independent for \(a\in [1,\infty ]\).

    •  \(R_{a}(P \Vert Q) \le R_{\infty }(P_1 \Vert Q_1) \cdot \max _{y_1 \in \mathrm {Supp}(Q_1)} R_{a}(P_{2|1}(\cdot |y_1) \Vert Q_{2|1}(\cdot |y_1))\).

  • Probability Preservation Let \(E \subseteq \mathrm {Supp}(Q)\) be an arbitrary event. If \(a \in (1, +\infty )\), then \(Q(E) \ge P(E)^{\frac{a}{a-1}}/R_{a}(P \Vert Q)\). Further, we have

    $$\begin{aligned} Q(E) \ge P(E)/R_{\infty }(P \Vert Q). \end{aligned}$$

Let \(P_1,P_2,P_3\) be three distributions with \(\mathrm {Supp}(P_1) \subseteq \mathrm {Supp}(P_2) \subseteq \mathrm {Supp}(P_3)\). Then we have:

  • Weak Triangle Inequality

    $$\begin{aligned} R_{a}(P_1 \Vert P_3) \le \left\{ \begin{array}{rl} R_{a}(P_1 \Vert P_2) \cdot R_{\infty }(P_2 \Vert P_3), \\ R_{\infty }(P_1 \Vert P_2)^{\frac{a}{a-1}} \cdot R_{a}(P_2 \Vert P_3) &{} \ \text{ if } a \in (1,+\infty ). \end{array} \right. \end{aligned}$$

Getting back to the setup in which \(P(x)/Q(x) \le c\) for all \(x \in \mathrm {Supp}(P)\) and for some constant c, the RD probability preservation property above is relevant even for large c, whereas the analogous SD probability preservation property starts making sense only when \(c < 2\).

Pinsker’s inequality is the analogue of the probability preservation property for \(a=1\): for an arbitrary event \(E \subseteq \mathrm {Supp}(Q)\), we have \(Q(E) \ge P(E) - \sqrt{\ln R_1(P \Vert Q)/2}\) (see [27, Lemma 1] for a proof). Analogously to the statistical distance, this probability preservation property is useful for unlikely events E only if \(\ln R_1 (P \Vert Q)\) is very small. We refer to Sect. 3 for additional comments on this property.

2.4 Some RD Bounds

As we have already seen, if two distributions are close in a uniform sense, then their RD is small. We observe the following immediate consequence of Lemma 2.3, which allows replacing the SD with the RD in the context of smoothing arguments, in order to save on the required parameter s. In applications of Lemma 2.3, it is customary to use \(s \ge \eta _{\varepsilon }(\varLambda ')\) with \(\varepsilon \le 2^{-\lambda }\), in order to make the distribution \(D_{\varLambda /\varLambda ',s,\mathbf {c}} = D_{\varLambda ,s,\mathbf {c}} \bmod \varLambda '\) within SD \(2^{-\lambda }\) of the uniform distribution \(U(\varLambda /\varLambda ')\). Via Lemma 2.1, this translates into using \(s =\varOmega (\sqrt{\lambda + \log n} \cdot \lambda _n(\varLambda '))\). When using an RD bound instead, having \(R_{\infty }(D_{\varLambda /\varLambda ',s,\mathbf {c}} \Vert U_{\varLambda /\varLambda '}) = O(1)\) suffices for the application: one can take \(\varepsilon = O(1)\) in the lemma below, which translates into just \(s =\varOmega (\sqrt{\log n} \cdot \lambda _n(\varLambda '))\), saving a factor \(\varTheta (\sqrt{\lambda })\).

Lemma 2.10

Let \(\varLambda ,\varLambda '\) be n-dimensional lattices with \(\varLambda ' \subseteq \varLambda \) and \(\varepsilon \in (0,1/2)\). Let \(D_{\varLambda /\varLambda ',s,\mathbf {c}}\) for any \(\mathbf {c} \in {\mathbb {R}}^n\) denote the distribution on \(\varLambda /\varLambda '\) induced by sampling from \(D_{\varLambda ,s,\mathbf {c}}\) and reducing modulo \(\varLambda '\), and let \(U_{\varLambda /\varLambda '}\) denote the uniform distribution on \(\varLambda /\varLambda '\). Then for \(s \ge \eta _{\varepsilon }(\varLambda ')\), we have

$$\begin{aligned} R_{\infty }(D_{\varLambda /\varLambda ',s,\mathbf {c}}\Vert U_{\varLambda /\varLambda '}) \le \frac{1+\varepsilon }{1-\varepsilon }. \end{aligned}$$

In our hardness analysis of the LWR problem, the following Gaussian tail-cut lemma is used. It bounds the RD of order \(\infty \) between a continuous Gaussian \(D_{\alpha }\) and the same Gaussian with its tail cut to be B-bounded, which we denote by \(D'_{\alpha ,B}\). Via an application of the RD probability preservation property, this allows us to conclude that any algorithm with success probability \(\varepsilon \) for m-sample search LWE with noise coordinates sampled from the tail-cut Gaussian \(D'_{\alpha ,B}\) is also an algorithm for LWE with noise coordinates sampled from the true Gaussian \(D_{\alpha }\), with success probability \(\ge \varepsilon /O(1)\), as long as \(B = \varOmega ( \alpha \cdot \sqrt{\log m})\). This improves upon the bound \(B = \varOmega (\alpha \cdot \sqrt{\log (m \cdot \varepsilon ^{-1})})\) that one obtains with an application of the SD to get the same conclusion.

Lemma 2.11

Let \(D'_{\alpha ,B}\) denote the continuous distribution on \({\mathbb {R}}\) obtained from \(D_{\alpha }\) by cutting its tail (by rejection sampling) to be B-bounded. Then we have

$$\begin{aligned} R_{\infty }(D'_{\alpha ,B} \Vert D_{\alpha }) \le \frac{1}{1-\exp (-\pi B^2/\alpha ^2)}. \end{aligned}$$

Furthermore, for m independent samples, we have \(R_{\infty }((D'_{\alpha ,B})^m \Vert (D_{\alpha })^m) \le \exp (1)\) if \(B \ge \alpha \cdot \sqrt{\ln (2m)/\pi }\).

Proof

For \(x \in {\mathbb {R}}\), we have \(D'_{\alpha ,B}(x) = c \cdot D_{\alpha }(x)\) for \(|x|<B\) and \(D'_{\alpha ,B}(x)=0\) otherwise, where c is a normalization constant such that \(\int ^{\infty }_{-\infty } D'_{\alpha ,B}(x) dx = 1\). It follows that \(c = \frac{1}{1-2Q_{\alpha }(B)}\), where \(Q_{\alpha }(B) = \int _{B}^{\infty } D_{\alpha }(x) dx\) is the tail probability \(\Pr _{z \hookleftarrow D_{\alpha }}[z \ge B]\). By a standard Gaussian tail bound, we have \(Q_{\alpha }(B) \le \frac{1}{2} \cdot \exp (-\pi B^2/\alpha ^2)\), and hence \(c \le \frac{1}{1-\exp (-\pi B^2/\alpha ^2)}\). The first part of the lemma now follows from the observation that \(R_{\infty }(D'_{\alpha ,B} \Vert D_{\alpha }) = \max _x \frac{D'_{\alpha ,B}(x)}{D_{\alpha }(x)} = c\). For the second part of the lemma, observe that \(c \le \exp (4Q_{\alpha }(B))\) if \(2Q_{\alpha }(B) \le 1/2\) using the inequality \(1-x \ge \exp (-2x)\) for \(0<x\le 1/2\). It follows by the multiplicativity property of RD that \(R_{\infty }((D'_{\alpha ,B})^m \Vert (D_{\alpha })^m) \le \exp (4m Q_{\alpha }(B)) \le \exp (1)\) if \(2Q_{\alpha }(B) \le \frac{1}{2m}\). The latter condition is satisfied by the above tail bound on \(Q_{\alpha }(B)\) if \(B \ge \alpha \cdot \sqrt{\ln (2m)/\pi }\). \(\square \)
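As a numerical sanity check of the lemma (not part of the proof), the following snippet evaluates the exact tail mass \(Q_{\alpha }(B) = \mathrm {erfc}(\sqrt{\pi }B/\alpha )/2\) and verifies that \(R_{\infty }((D'_{\alpha ,B})^m \Vert (D_{\alpha })^m) \le \exp (1)\) at the cut point \(B = \alpha \sqrt{\ln (2m)/\pi }\).

```python
import math

def r_inf_tailcut(alpha, B):
    # R_inf(D'_{alpha,B} || D_alpha) = 1 / (1 - 2*Q_alpha(B)), with the
    # exact tail mass Q_alpha(B) = erfc(sqrt(pi) * B / alpha) / 2.
    q = math.erfc(math.sqrt(math.pi) * B / alpha) / 2
    return 1 / (1 - 2 * q)

m, alpha = 2 ** 20, 1.0
B = alpha * math.sqrt(math.log(2 * m) / math.pi)   # cut point of the lemma
print(r_inf_tailcut(alpha, B) ** m <= math.e)      # True, by multiplicativity
```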

3 Application to Lattice-Based Signature Schemes

In this section, we use the RD to improve the security proofs of the BLISS signature scheme [11], which allows smaller parameters to be taken for any fixed security level.

More precisely, we show that the use of RD in place of SD leads to significant savings in the required precision of integers sampled according to a discrete Gaussian distribution in the security analysis of lattice-based signature schemes. These savings consequently lower the precomputed table storage for sampling discrete Gaussians with the method described in Refs. [11, 27]. In Tables 1 and 2, we provide a numerical comparison of RD and SD based on instantiations of BLISS-IV and BLISS-I.

Discrete Gaussian Sampling In the BLISS signature scheme [11] (and similarly in earlier variants [22]), each signature requires the signing algorithm to sample O(n) independent integers from the 1-dimensional discrete Gaussian distribution \(D_{{\mathbb {Z}},s}\), where \(s = O(m)\) is the deviation parameter (here the variable m denotes a parameter related to the underlying lattice dimension, and is typically on the order of several hundreds).

In Ducas et al. [11], a particularly efficient sampling algorithm for \(D_{{\mathbb {Z}}, s}\) is presented. To produce a sample from \(D_{{\mathbb {Z}}, s}\), this algorithm samples about \(\ell = \lfloor \log ((k-1) \cdot ( k-1 + 2k \cdot \tau \sigma _2))\rfloor + 1\) Bernoulli random variables of the form \(B_{\exp (-\pi 2^i/s^2)}\) for \(0\le i\le \ell -1\). Here, \(\sigma _2 = \frac{1}{\sqrt{2 \ln (2)}}\) is the standard deviation of a ‘small width’ Gaussian (sampled by Algorithm 10 in Ducas et al. [11]), \(k = \frac{s}{\sigma _2 \cdot \sqrt{2\pi }}\) is the standard deviation ‘amplification factor’ (in Algorithm 11 in Ducas et al. [11]), and \(\tau \) is the tail-cut factor for the ‘small width’ Gaussian samples (i.e., those samples are cut by rejection sampling to be less than \(\tau \cdot \sigma _2\)). To sample the required Bernoulli random variables \(B_{\exp (-\pi 2^i/s^2)}\), the authors of [11] use a precomputed table of the probabilities \(c_i = \exp (-\pi 2^i/s^2)\), for \(0\le i \le \ell -1\). Since these probabilities are real numbers, they must be truncated to some bit precision p in the precomputed table, so that truncated values \(\tilde{c}_i = c_i + \varepsilon _i\) are stored, where \(|\varepsilon _i| \le 2^{-p} c_i\) are the truncation errors.
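The following Python sketch illustrates this mechanism (it omits the rejection-sampling wrapper of [11], and the precision \(p=65\) below is an arbitrary example value): it builds the truncated table of the \(\tilde{c}_i\)’s with p significant bits, and samples \(B_{\exp (-\pi x/s^2)}\) by combining one Bernoulli trial per set bit of x, using \(\exp (-\pi x/s^2) = \prod _{i:\, x_i=1} c_i\).

```python
import math, random

def build_table(s, ell, p):
    # Truncated probabilities c~_i = c_i + eps_i with |eps_i| <= 2^-p * c_i,
    # obtained by keeping p significant bits of c_i = exp(-pi * 2^i / s^2).
    table = []
    for i in range(ell):
        c = math.exp(-math.pi * 2 ** i / s ** 2)
        e = p - math.floor(math.log2(c))        # keep p significant bits
        table.append(math.floor(c * 2 ** e) / 2 ** e)
    return table

def bernoulli_exp(x, table):
    # Sample B_{exp(-pi*x/s^2)} via exp(-pi*x/s^2) = prod_{i: x_i=1} c_i:
    # output 1 iff the Bernoulli trial for every set bit of x succeeds.
    for i, c in enumerate(table):
        if (x >> i) & 1 and random.random() >= c:
            return 0
    return 1

table = build_table(s=541, ell=20, p=65)        # BLISS-I-like parameters
print(bernoulli_exp(12345, table))
```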

In previous works, the precision was determined by an analysis based either on the statistical distance (SD) [11] or on the Kullback–Leibler divergence (KLD) [27]. In this section, we review and complete these methods, and we propose an RD-based analysis that in some cases leads to bigger savings, both asymptotically and in practice, in particular for larger security levels and/or smaller numbers of sign queries, namely when the number of attack sign queries is significantly less than \(2^{\lambda /2}\) for security level \(\lambda \) (see Tables 1, 2). More precisely, we give sufficient lower bounds on the precision p in terms of the number of signing queries \(q_s\) and the security parameter \(\lambda \) to ensure security level \(\lambda \) for the scheme implemented with truncated values against adversaries making \(\le q_s\) signing queries in time T, assuming that the scheme implemented with untruncated (exact) values has security level \(\lambda +1\) (i.e., our truncated scheme loses at most 1 bit of security with respect to the untruncated scheme).

Here, and in the following analysis, we say that a scheme has security level \(\lambda \) against \((T,q_s,\varepsilon )\) forging adversaries running in time T, making \(q_s\) sign queries (where each sign query involves \(\ell \cdot m\) Bernoulli samples), and succeeding with probability \(\varepsilon \), if \(T/\varepsilon \ge 2^{\lambda }\) for all adversaries with \(T \ge Q = q_s \cdot \ell \cdot m\) and \(q_s \ge 1\) (we count each Bernoulli sampling in signing queries as a unit time operation, so that \(T \ge Q\), where Q is the total number of Bernoulli samples over all signing queries).

For any adversary, the distributions \(\varPhi '\) and \(\varPhi \) denote the distributions of signatures in the view of the adversary in the untruncated and truncated cases, respectively.

SD-based Analysis [11] Any forging adversary \(\mathcal {A}\) with success probability \(\ge \varepsilon \) in time T on the scheme implemented with truncated Gaussian sampling has a success probability \(\varepsilon '\ge \varepsilon -\Delta (\varPhi ,\varPhi ')\) against the scheme implemented with perfect Gaussian sampling in time \(T'\). We wish to guarantee security level \(\lambda \) for the truncated scheme, i.e., to rule out adversaries with \(T/\varepsilon <2^\lambda \). If \(\varepsilon '\ge \varepsilon /2\), then the resulting adversary \(\mathcal {A}'\) against the untruncated scheme satisfies \(T'/\varepsilon '\le (2T)/\varepsilon \). Therefore, we select parameters to handle adversaries with success probabilities \(\ge \varepsilon /2\) against the untruncated scheme; it suffices to set the required precision p so that \(\Delta (\varPhi ,\varPhi ')\le \varepsilon /2\). Each signature requires \(\ell \cdot m\) samples from the Bernoulli random variables \((B_{\tilde{c}_i})_i\). To ensure security against \(q_s\) signing queries, each of the truncated Bernoulli random variables \(B_{\tilde{c}_i}\) should be within SD \(\Delta (\varPhi ,\varPhi ')/(\ell \cdot m \cdot q_s)\) of the desired \(B_{c_i}\) (by the union bound). Using \(\Delta (B_{\tilde{c}_i},B_{c_i})=|\varepsilon _i| \le 2^{-p}c_i \le 2^{-p-1}\) leads to a precision requirement

$$\begin{aligned} p \ge \log (\ell \cdot m \cdot q_s/\Delta (\varPhi ,\varPhi ')) \ge \log \left( \frac{\ell \cdot m \cdot q_s}{\varepsilon }\right) . \end{aligned}$$

Letting \(Q=\ell \cdot m \cdot q_s\), it is sufficient to take \(p\ge \log (\frac{Q}{\varepsilon })\). For each \(\ell \cdot m \le Q \le 2^{\lambda }\), the maximum value of \(\frac{Q}{\varepsilon }\) under the constraint \(\frac{T}{\varepsilon } \le 2^{\lambda }\) is \(\frac{Q}{T} \cdot 2^{\lambda }\), which in turn has maximum value \(2^{\lambda }\) using \(T \ge Q\). Therefore, the SD-based precision requirement for security level \(\lambda \) of the truncated scheme is

$$\begin{aligned} p \ge \lambda . \end{aligned}$$
(1)

The overall precomputed table is hence of bit size \(L_{\text{ SD }} = p \cdot \ell \ge \log (\ell \cdot m \cdot q_s/\varepsilon ) \cdot \ell \).

One may also set the precision \(p_i\) depending on i for \(0 \le i \le \ell -1\). It is sufficient to set

$$\begin{aligned} Q \cdot 2^{-p_i} c_i \le \varepsilon /2. \end{aligned}$$

Hence, since the maximum of \(Q/\varepsilon \) is \(T/\varepsilon \le 2^{\lambda }\), the precision \(p_i\) is

$$\begin{aligned} p_i \ge \lambda + 1 + \log \left( \min \left( c_i,1-c_i\right) \right) , \, 0 \le i \le \ell -1. \end{aligned}$$
(2)

The bit size of the overall precomputed table can be computed as a sum of the above \(p_i\)’s. The \(\min \) in the precision estimate above exploits the symmetry of the Bernoulli variable to decrease the bit size of the precomputed table (i.e., we may sample \(B_{1-\tilde{c}_i}\) and flip the sampled bit to get a bit with distribution \(B_{\tilde{c}_i}\)).

KLD-Based Analysis [27] Pöppelmann et al. [27] replace the SD-based analysis by a KLD-based analysis (i.e., using the RD of order \(a=1\)) to reduce the precision p needed in the precomputed table. They show that any forging adversary \(\mathcal {A}\) with success probability \(\varepsilon \) on the scheme implemented with truncated Gaussian sampling has a success probability \(\varepsilon ' \ge \varepsilon - \sqrt{\ln R_1(\varPhi \Vert \varPhi ')/2}\) on the scheme implemented with perfect Gaussian sampling (see the remark at the end of Sect. 2.3). By the multiplicative property of the RD over the \(Q = \ell \cdot m \cdot q_s\) independent Bernoulli samples needed for signing \(q_s\) times, we get that \(R_1(\varPhi \Vert \varPhi ')\le (\max _{0\le i\le \ell -1}R_1(B_{\tilde{c}_i}\Vert B_{c_i}))^{\ell \cdot m\cdot q_s}\). Now, we have:

$$\begin{aligned} \ln R_1(B_{\tilde{c}_i}\Vert B_{c_i}) &= (1-c_i-\varepsilon _i) \ln \frac{1-c_i-\varepsilon _i}{1-c_i} + (c_i + \varepsilon _i) \ln \frac{c_i+\varepsilon _i}{c_i} \\ &\le - (1-c_i-\varepsilon _i)\frac{\varepsilon _i}{1-c_i} + (c_i + \varepsilon _i) \frac{\varepsilon _i}{c_i} = \frac{\varepsilon _i^2}{(1-c_i)c_i}. \end{aligned}$$

Exploiting the symmetry of the distribution, i.e., \(|\varepsilon _i| \le 2^{-p} \min (c_i,1-c_i)\), we obtain \(\ln R_1(B_{\tilde{c}_i}\Vert B_{c_i}) \le 2^{-2p}\min (\frac{c_i}{1-c_i},\frac{1-c_i}{c_i}) \le 2^{-2p}\). Therefore, we get \(\varepsilon ' \ge \varepsilon - \sqrt{Q \cdot 2^{-2p-1}}\). We can select parameters such that \(\sqrt{Q \cdot 2^{-2p-1}}\le \varepsilon /2\). This leads to a precision requirement

$$\begin{aligned} p \ge \frac{1}{2} \log \left( \frac{Q}{\varepsilon ^2}\right) + \frac{1}{2}. \end{aligned}$$
(3)

To minimize the required precision, if the attacker has run-time \(T<2^{\lambda }\), makes \(Q \le T\) queries, and has success probability \(\varepsilon \ge T/2^{\lambda }\), we assume, as in Pöppelmann et al. [27], that the attacker is first converted, by re-running it \(\approx 2^{\lambda }/T\) times with independent public keys and random coins and returning the forgery from any successful run, to an attacker with run-time \(\widehat{T}=(2^{\lambda }/T) \cdot T = 2^{\lambda }\), making \(\widehat{Q}=2^{\lambda } \cdot (Q/T)\) queries, and having success probability \(\widehat{\varepsilon } \ge 1-(1-\varepsilon )^{2^{\lambda }/T} \ge 1-\exp \left( -2^{\lambda }/(T/\varepsilon )\right) \ge 1-\exp (-1) \ge 0.63\). We remark that this new attacker works in a multi-key model, in which an attacker gets as input \(2^{\lambda }/T\) keys, and outputs a forgery for any one of them. Then, since \(\frac{\widehat{Q}}{\widehat{\varepsilon }^2} \le (2^{\lambda } \cdot Q/T)/0.63^2 \le 2^{\lambda }/0.63^2\) using \(Q \le T\), the required precision is

$$\begin{aligned} p \ge \frac{1}{2} \log \left( \frac{2^{\lambda }}{0.63^2}\right) + \frac{1}{2} \approx \frac{\lambda }{2} + 1.2. \end{aligned}$$
(4)

The overall precomputed table is hence of bit size \(L_{\text{ KLD }} \ge (\lambda /2 + 1.2) \cdot \ell \).

One may also set the precision \(p_i\) depending on i. It is sufficient to set

$$\begin{aligned} \sqrt{\widehat{Q} \cdot \frac{\left( 2^{-p_i} \min (c_i,1-c_i)\right) ^2}{2(1-c_i)c_i}} \le \frac{\widehat{\varepsilon }}{2}. \end{aligned}$$

Hence, since as above the maximum of \(\frac{\widehat{Q}}{\widehat{\varepsilon }^2}\) is \(\le 2^{\lambda }/0.63^2\) using \(Q \le T\), the precision \(p_i\) is

$$\begin{aligned} p_i \ge \frac{\lambda }{2} + 1.2 + \frac{1}{2} \log \left( \min \left( \frac{c_i}{1-c_i},\frac{1-c_i}{c_i}\right) \right) , \, 0 \le i \le \ell -1. \end{aligned}$$
(5)

\(R_\infty \)-based analysis. The probability preservation property of the Rényi divergence from Lemma 2.9 is multiplicative for \(a>1\) (rather than additive, as for \(a=1\)). Here we use the order \(a = \infty \). This property gives that any forging adversary \(\mathcal {A}\) having success probability \(\varepsilon \) on the scheme implemented with truncated Gaussian sampling has a success probability \(\varepsilon '\ge \varepsilon / R_\infty (\varPhi \Vert \varPhi ')\) on the scheme implemented with perfect Gaussian sampling. If \(R=R_{\infty }(\varPhi \Vert \varPhi ') \le O(1)\), then \(\varepsilon '=\varOmega (\varepsilon )\). By the multiplicative property of the RD over the \(Q=\ell \cdot m \cdot q_s\) samples needed for signing \(q_s\) times, we have \(R_\infty (\varPhi \Vert \varPhi ')\le \prod _{i\le Q}R_{\infty }(B_{\tilde{c}_i}\Vert B_{c_i})\). Assuming, without loss of generality by the symmetry trick above, that \(c_i \le 1/2\), we have \(R_\infty (B_{\tilde{c}_i}\Vert B_{c_i}) \le 1 + |\varepsilon _i| / c_i \le 1+2^{-p}\). Therefore, we get \(R_\infty (\varPhi \Vert \varPhi ')\le (1+2^{-p})^{Q}\) and hence \(\varepsilon '\ge \varepsilon / (1 +2^{-p})^{Q}\). We select parameters so that adversaries retain success probability \(\ge \varepsilon /2\) against the untruncated scheme, and hence set the precision so that \((1 +2^{-p})^{Q}\le 2\). Using the inequality \(1+x \le \exp (x)\), this yields a sufficient precision requirement

$$\begin{aligned} p \ge \log (Q) + \log (1/\ln (2)) \approx \lambda _Q + 0.53, \end{aligned}$$
(6)

where \(\lambda _Q = \log Q\). Overall, we get a precomputed table of bit size \(L_{\text{ RD }} = \lambda _Q \cdot \ell \). In terms of the security parameter \(\lambda \), the precision requirement (6) for \(R_{\infty }\) is lower than the requirement (4) for \(R_1\) if the number of on-line queries Q is smaller than \(2^{\lambda /2}\). In practice this condition may be satisfied, especially for larger security parameters \(\lambda \) (see the numerical examples below).

\(R_a\)-based analysis. We may also consider an \(R_a\)-based analysis for general \(a > 1\). It should be noted that the reductions here are not tight: for the \(R_a\)-based analysis with \(a > 1\), the probability preservation property shows \(\varepsilon ' \ge \varepsilon ^{a/(a-1)} / R_a(\varPhi \Vert \varPhi ').\) The Rényi divergence can be computed as follows:

$$\begin{aligned} ({R_a(B_{\tilde{c}_i}\Vert B_{c_i})})^{a -1}&= \frac{(1-c_i - \varepsilon _i)^a}{(1-c_i)^{a-1}} + \frac{(c_i + \varepsilon _i )^a}{c_i^{a-1}}\\&= (1-c_i -\varepsilon _i) \left( 1 - \frac{\varepsilon _i}{1-c_i} \right) ^{a-1} + (c_i + \varepsilon _i) \left( 1 + \frac{\varepsilon _i}{c_i}\right) ^{a-1}. \end{aligned}$$

If a is much smaller than \(2^p\), we obtain

$$\begin{aligned} ({R_a(B_{\tilde{c}_i}\Vert B_{c_i})})^{a -1}&\approx (1-c_i -\varepsilon _i) \left( 1 - \frac{(a-1)\varepsilon _i}{1-c_i} + \frac{(a-1)(a-2)}{2} \cdot \frac{\varepsilon _i^2}{(1-c_i)^2} \right) \\&\quad + (c_i + \varepsilon _i) \left( 1 + \frac{(a-1)\varepsilon _i}{c_i} + \frac{(a-1)(a-2)}{2} \cdot \frac{\varepsilon _i^2}{c_i^2}\right) \\&\approx 1 + \frac{a(a-1)}{2} \cdot \frac{\varepsilon _i^2 }{c_i(1-c_i)} \le 1 + \frac{a(a-1)}{2} \cdot 2^{-2p}. \end{aligned}$$

For instance, if we take \(a = 2\), we have \({R_2(B_{\tilde{c}_i}\Vert B_{c_i})} \le 1 + 2^{-2p}\), and hence \(\varepsilon ' \ge \varepsilon ^{2} / R_2(\varPhi \Vert \varPhi ')\) with \(R_2(\varPhi \Vert \varPhi ') \le (1+2^{-2p})^Q\) by multiplicativity. On the other hand, if a is much larger than \(2^p\), then we have

$$\begin{aligned} ({R_a(B_{\tilde{c}_i}\Vert B_{c_i})})^{a -1}&= (1-c_i -\varepsilon _i) \left( 1 - \frac{\varepsilon _i}{1-c_i} \right) ^{a-1} + (c_i + \varepsilon _i) \left( 1 + \frac{\varepsilon _i}{c_i}\right) ^{a-1}\\&\approx (c_i + \varepsilon _i) \exp \left( \frac{(a-1)\varepsilon _i}{c_i}\right) . \end{aligned}$$

Hence the Rényi divergence satisfies

$$\begin{aligned} {R_a(B_{\tilde{c}_i}\Vert B_{c_i})} \approx (c_i + \varepsilon _i)^{1/(a - 1)} \exp \left( \frac{\varepsilon _i}{c_i}\right) \approx 1 + \frac{\varepsilon _i}{c_i}. \end{aligned}$$

As \(a \rightarrow \infty \), we have \({R_a(B_{\tilde{c}_i}\Vert B_{c_i})} \rightarrow 1 + 2^{-p}\).

Thus if the tightness of the reduction is not a concern, using \(R_a\) with small a reduces the precision requirement. Subsequent work [34] shows that by choosing an adequate a, tightness can be reached (\(\varepsilon ' \approx \varepsilon \)) for the same number of queries. This may however lead to a slightly larger precision (compared to the case of using a tiny Rényi order a).

Numerical Examples

In Tables 1 and 2, we consider a numerical example which gives the lower bound on the precision p and table bit size for Gaussian sampling in the schemes BLISS-IV (\(\lambda =192\)) and BLISS-I (\(\lambda =128\)), and three settings for the number of sign queries \(q_s = (2^{42},2^{50},2^{64})\) allowed for the adversary. In all cases, we assume that the ‘small deviation’ (positive) Gaussian samples of standard deviation \(\sigma _2 = \frac{1}{\sqrt{2\ln (2)}}\) (sampled in Algorithm 11 of Ducas et al. [11]) are tail cut to \(\tau \sigma _2\), where we set the tail-cut factor \(\tau = \sqrt{2 \ln (2 m q_s)}\) by applying Lemma 2.11, to make the \(R_{\infty }\) bound \(\le \exp (1)\) between the cut and uncut distributions, over all \(m q_s\) ‘small deviation’ Gaussian samples.

For the BLISS-IV parameters, we use \(\lambda =192\), \(m=1024\), \(k=\lceil 271 /\sigma _2 \rceil = 320\), \(\tau = \sqrt{2 \ln (2 m q_s)} \approx (7.3, 7.8, 8.7)\), \(\ell = \lfloor \log ((k-1) \cdot ( k-1 + 2k \cdot \tau \sigma _2))\rfloor + 1 = 21\), \(s= \lceil \sqrt{2\pi }\cdot k \cdot \sigma _2 \rceil = 682\), and \(Q=\ell \cdot m \cdot q_s \approx (2^{56}, 2^{64}, 2^{78})\). For the BLISS-I parameters, we use \(\lambda =128\), \(m=1024\), \(k=\lceil 215 /\sigma _2 \rceil = 254\), \(\tau = \sqrt{2 \ln (2 m q_s)} \approx (7.3, 7.8, 8.7)\), \(\ell = \lfloor \log ((k-1) \cdot ( k-1 + 2k \cdot \tau \sigma _2))\rfloor + 1 = 20\), \(s= \lceil \sqrt{2\pi }\cdot k \cdot \sigma _2 \rceil = 541\), and \(Q=\ell \cdot m \cdot q_s \approx (2^{56}, 2^{64}, 2^{78})\). In all cases, we assume that the underlying BLISS scheme with perfect (infinite precision) Bernoulli sampling has security level \(2^{\lambda +1}\).
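These derivations can be reproduced mechanically. The following sketch (the function name and output format are ours) recomputes k, \(\ell \), s and the precision bounds (1), (4), and (6) for the instantiations above; the tail-cut factors \(\tau \) are taken from the text, and bound (6) is evaluated here for the \(q_s = 2^{42}\) setting.

```python
import math

def precisions(lam, m, sigma_target, q_s, tau):
    sigma2 = 1 / math.sqrt(2 * math.log(2))
    k = math.ceil(sigma_target / sigma2)                 # amplification factor
    ell = math.floor(math.log2((k - 1) * (k - 1 + 2 * k * tau * sigma2))) + 1
    s = math.ceil(math.sqrt(2 * math.pi) * k * sigma2)
    Q = ell * m * q_s                                    # total Bernoulli samples
    return {"k": k, "ell": ell, "s": s,
            "p_SD": lam,                                 # bound (1)
            "p_KLD": lam / 2 + 1.2,                      # bound (4)
            "p_RD": math.log2(Q) + math.log2(1 / math.log(2))}  # bound (6)

print(precisions(128, 1024, 215, 2 ** 42, 7.3))  # BLISS-I,  q_s = 2^42
print(precisions(192, 1024, 271, 2 ** 42, 7.3))  # BLISS-IV, q_s = 2^42
```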

Note that we assume, as is common in practice, that the allowed ‘off-line’ attack run-time \(T = 2^\lambda \) is much bigger than the allowed ‘on-line’ number of sign queries \(q_s\). This assumption may be satisfied in practice since in many applications the number of issued signatures \(q_s\) is limited by computation, communication and/or policy restrictions of the attacked user’s application running the signing algorithm, whereas the ‘off-line’ run-time T depends only on the attacker’s resources and may be much larger. For example, even for the scenario with the smallest allowed number of signatures \(q_s=2^{42}\) considered in the Tables, if the attacked user’s signing algorithm runs on a single Intel Core i7 CPU at 3.4 GHz, it would take the attacker more than 17 years to collect all \(q_s\) signatures, even if the signer was continuously signing messages throughout this time.

Table 1 Comparison of the precision needed to obtain \(2^\lambda \) security for the finite precision BLISS-IV scheme against adversaries with off-line run-time \(T\le 2^\lambda \) and making less than \(q_s\) sign queries (resulting in \(Q= \ell \cdot m \cdot q_s\) Bernoulli samples over all sign queries), assuming \(\approx 2^{\lambda +1}\) security of the infinite precision scheme
Table 2 Comparison of the precision needed to obtain \(2^\lambda \) security for the finite precision BLISS-I scheme against adversaries with off-line run-time \(T\le 2^\lambda \) and making less than \(q_s\) sign queries (resulting in \(Q= \ell \cdot m \cdot q_s\) Bernoulli samples over all sign queries), assuming \(\approx 2^{\lambda +1}\) security of the infinite precision scheme

4 Rényi Divergence and Distinguishing Problems

In this section, we prove Theorem 4.2, which allows the use of the RD for distinguishing problems, and we show how to apply it to the dual-Regev encryption scheme.

4.1 Problems with Public Sampleability

A general setting one comes across in analyzing the security of cryptographic schemes has the following form. Let P denote a decision problem that asks to distinguish whether a given x was sampled from distribution \(X_0\) or \(X_1\), defined as follows:

$$\begin{aligned} X_0 = \{x : r \hookleftarrow \varPhi , x \hookleftarrow D_0(r) \}, \quad X_1 = \{x : r \hookleftarrow \varPhi , x \hookleftarrow D_1(r) \}. \end{aligned}$$

Here r is some parameter that is sampled from the same distribution \(\varPhi \) in both \(X_0\) and \(X_1\). The parameter r then determines the conditional distributions \(D_0(r)\) and \(D_1(r)\) from which x is sampled in \(X_0\) and \(X_1\), respectively, given r. Now, let \(P'\) denote another decision problem that is defined similarly to P, except that in \(P'\) the parameter r is sampled from a different distribution \(\varPhi '\) (rather than \(\varPhi \)). Given r, the conditional distributions \(D_0(r)\) and \(D_1(r)\) are the same in \(P'\) as in P. Let \(X'_0\) (resp. \(X'_1\)) denote the resulting marginal distributions of x in problem \(P'\). Now, in the applications we have in mind, the distributions \(\varPhi '\) and \(\varPhi \) are “close” in some sense, and we wish to show that this implies an efficient reduction between problems \(P'\) and P, in the usual sense that every distinguisher with efficient run-time T and non-negligible advantage \(\varepsilon \) against P implies a distinguisher for \(P'\) with efficient run-time \(T'\) and non-negligible advantage \(\varepsilon '\). In the classical situation, if the SD \(\Delta (\varPhi ,\varPhi ')\) between \(\varPhi '\) and \(\varPhi \) is negligible, then the reduction is immediate. Indeed, for \(b \in \{0,1\}\), if \(p_b^{}\) (resp. \(p'_b\)) denotes the probability that a distinguisher algorithm \(\mathcal {A}\) outputs 1 on input distribution \(X_b^{}\) (resp. \(X'_b\)), then we have, from the SD probability preservation property, that \(|p'_b - p_b| \le \Delta (\varPhi ,\varPhi ')\). As a result, the advantage \(\varepsilon '=|p'_1-p'_0|\) of \(\mathcal {A}\) against \(P'\) is bounded from below by \(\varepsilon -2\Delta (\varPhi ,\varPhi ')\) which is non-negligible (here \(\varepsilon = |p_1-p_0|\) is the assumed non-negligible advantage of \(\mathcal {A}\) against P).

Unfortunately, for general decision problems \(P,P'\) of the above form, it seems difficult to obtain an RD-based analogue of the above SD-based argument, in the weaker setting when the SD \(\Delta (\varPhi ,\varPhi ')\) is non-negligible, but the RD \(R=R(\varPhi \Vert \varPhi ')\) is small. Indeed, the probability preservation property of the RD in Lemma 2.9 does not seem immediately useful in the case of general decision problems \(P,P'\). With the above notations, it can be used to conclude that \(p'_b \ge p^2_b/R\) but this does not allow us to usefully relate the advantages \(|p'_1 - p'_0|\) and \(|p_1 - p_0|\).

Nevertheless, we now make explicit a special class of “publicly sampleable” problems \(P,P'\) for which such a reduction can be made. In such problems, it is possible to efficiently sample from both distributions \(D_0(r)\) and \(D_1(r)\), given a single sample x from the unknown \(D_b(r)\). This technique is implicit in the application of RD in the reductions of Lyubashevsky et al. [19]: we abstract it and make it explicit in the following.

Before going ahead to state one of the main results of this paper, we recall Hoeffding’s bound [17]:

Lemma 4.1

Let \(X_1, \ldots , X_N\) be independent random variables with \(a_i \le X_i \le b_i\) for all i, and let \({\overline{X}} = \frac{X_{1}+\cdots +X_{N}}{N}\). Then

$$\begin{aligned} {\mathbb {P}}\left( \left| \overline{X}-{\mathrm {E}}\left[ \overline{X}\right] \right| \ge t\right) \le 2\exp \left( -{\frac{2N^{2}t^{2}}{\sum _{{i=1}}^{N}(b_{i}-a_{i})^{2}}}\right) , \end{aligned}$$

holds for all positive t, where \(\mathrm {E}\) denotes the expected value.

Theorem 4.2

Let \(\varPhi ,\varPhi '\) denote two distributions with \(\mathrm {Supp}(\varPhi ) \subseteq \mathrm {Supp}(\varPhi ')\), and \(D_0(r)\) and \(D_1(r)\) denote two distributions determined by some parameter \(r \in \mathrm {Supp}(\varPhi ')\). Let \(P, P'\) be two decision problems defined as follows:

  • Problem P: distinguish whether input x is sampled from distribution \(X_0\) or \(X_1\), where

    $$\begin{aligned} X_0 = \{x : r \hookleftarrow \varPhi , x \hookleftarrow D_0(r) \}, \quad X_1 = \{x : r \hookleftarrow \varPhi , x \hookleftarrow D_1(r) \}. \end{aligned}$$
  • Problem \(P'\): distinguish whether input x is sampled from distribution \(X'_0\) or \(X'_1\), where

    $$\begin{aligned} X'_0 = \{x : r \hookleftarrow \varPhi ', x \hookleftarrow D_0(r) \}, \quad X'_1 = \{x : r \hookleftarrow \varPhi ', x \hookleftarrow D_1(r) \}. \end{aligned}$$

Assume that \(D_0(\cdot )\) and \(D_1(\cdot )\) satisfy the following public sampleability property: there exists a sampling algorithm \(\mathsf {S}\) with run-time \(T_S\) such that for all (rb), given any sample x from \(D_b(r)\):

  • \(\mathsf {S}(0,x)\) outputs a fresh sample distributed as \(D_0(r)\) over the randomness of \(\mathsf {S}\),

  • \(\mathsf {S}(1,x)\) outputs a fresh sample distributed as \(D_1(r)\) over the randomness of \(\mathsf {S}\).

Then, given a T-time distinguisher \(\mathcal {A}\) for problem P with advantage \(\varepsilon \), we can construct a distinguisher \(\mathcal {A}'\) for problem \(P'\) with run-time and distinguishing advantage, respectively, bounded from above and below by (for any \(a \in (1,+\infty ]\)):

$$\begin{aligned} {\frac{64}{\varepsilon ^2} \log \left( \frac{8R_{a}(\varPhi \Vert \varPhi ')}{\varepsilon ^{a/(a-1)+1}}\right) \cdot (T_S + T)} \text{ and } \frac{\varepsilon }{4 \cdot R_{a}(\varPhi \Vert \varPhi ')} \cdot \left( \frac{\varepsilon }{2}\right) ^{\frac{a}{a-1}}. \end{aligned}$$

Proof

For each \(\hat{r}\in \mathrm {Supp}(\varPhi )\) and \(b\in \{0,1\}\), we let \(p_b(\hat{r}) = \mathrm {Pr}_{x\hookleftarrow D_b(\hat{r})}(\mathcal {A}(x)=1)\) and \(p_b = \sum _{\hat{r}\in \mathrm {Supp}(\varPhi )}p_b(\hat{r})\varPhi (\hat{r})\). The advantage of \(\mathcal {A}\) is defined as \(|p_0-p_1|\), which we assume is bigger than \(\varepsilon \). Without loss of generality, we may assume that \(p_1>p_0\). Distinguisher \(\mathcal {A}'\) is given an input x sampled from \(D_b(r)\) for some r sampled from \(\varPhi '\) and some unknown \(b \in \{0,1\}\). For an \(\varepsilon '\) to be determined later, it runs distinguisher \(\mathcal {A}\) on \(N\ge 32\varepsilon ^{-2}\ln (4/\varepsilon ')\) independent inputs sampled from each of \(D_0(r)\) and \(D_1(r)\), obtained by calling algorithm \(\mathsf {S}\) on (0, x) and (1, x), to obtain estimates \(\hat{p}_0\) and \(\hat{p}_1\) for the acceptance probabilities \(p_0(r)\) and \(p_1(r)\) of \(\mathcal {A}\) given as inputs samples from \(D_0(r)\) and \(D_1(r)\) (with the r fixed to the value used to sample the input x of \(\mathcal {A}'\)). By setting \(t=\varepsilon /8\) and \(N=32\varepsilon ^{-2}\ln (4/\varepsilon ')\), with the \(X_i\)’s being Bernoulli with probability \(p_b(r)\) over \([a_i, b_i]=[0, 1]\) for \(1\le i\le N\), Hoeffding’s bound implies that each of the estimation errors \(|\hat{p}_0 - p_0(r)|\) and \(|\hat{p}_1 - p_1(r)|\) is \(<\varepsilon /8\), except with probability \(< 2\exp (-2Nt^2) = \varepsilon '/2\) over the randomness of \(\mathsf {S}\). Then, if \(\hat{p}_1 - \hat{p}_0 > \varepsilon /4\), distinguisher \(\mathcal {A}'\) runs \(\mathcal {A}\) on input x and returns whatever \(\mathcal {A}\) returns; else distinguisher \(\mathcal {A}'\) returns a uniformly random bit. This completes the description of distinguisher \(\mathcal {A}'\).

Let \(\mathcal {S}_1\) denote the set of r’s such that \(p_1(r)- p_0(r) \ge \varepsilon /2\), let \(\mathcal {S}_2\) denote the set of r’s that are not in \(\mathcal {S}_1\) and such that \(p_1(r)- p_0(r) \ge 0\), and let \(\mathcal {S}_3\) denote all the remaining r’s. Then:

  • If \(r \in \mathcal {S}_1\), then except with probability \(< \varepsilon '\) over the randomness of \(\mathsf {S}\), we will have \(\hat{p}_1 - \hat{p}_0 > \varepsilon /4\) and thus \(\mathcal {A}'\) will output \(\mathcal {A}(x)\). Thus, in the case \(b=1\), we have \(\Pr [\mathcal {A}'(x)=1 | r \in \mathcal {S}_1] \ge p_1(r) - \varepsilon '\) and in the case \(b=0\), we have \(\Pr [\mathcal {A}'(x)=1 | r \in \mathcal {S}_1] \le p_0(r) + \varepsilon '\).

  • Assume that \(r \in \mathcal {S}_2\). Let u(r) be the probability over the randomness of \(\mathsf {S}\) that \(\hat{p}_1 - \hat{p}_0 > \varepsilon /4\). Then \(\mathcal {A}'\) will output \(\mathcal {A}(x)\) with probability u(r) and a uniform bit with probability \(1-u(r)\). Thus, in the case \(b=1\), we have \(\Pr [\mathcal {A}'(x)=1| r \in \mathcal {S}_2] = u(r) \cdot p_1(r) + (1-u(r)) /2\), and in the case \(b=0\), we have \(\Pr [\mathcal {A}'(x)=1| r \in \mathcal {S}_2] = u(r) \cdot p_0(r) + (1-u(r))/2\).

  • If \(r \in \mathcal {S}_3\), except with probability \(<\varepsilon '\) over the randomness of \(\mathsf {S}\), we have \(\hat{p}_1 - \hat{p}_0 < \varepsilon /4\) and \(\mathcal {A}'\) will output a uniform bit. Thus, in the case \(b=1\), we have \(\Pr [\mathcal {A}'(x)=1| r \in \mathcal {S}_3] \ge 1/2 - \varepsilon '\), and in the case \(b=0\), we have \(\Pr [\mathcal {A}'(x)=1| r \in \mathcal {S}_3] \le 1/2 +\varepsilon '\).

Overall, the advantage of \(\mathcal {A}'\) is bounded from below by:

$$\begin{aligned}&\sum _{r \in \mathcal {S}_1} \varPhi '(r) \left( p_1(r) - p_0(r) - 2 \varepsilon ' \right) + \sum _{r \in \mathcal {S}_2} \varPhi '(r) u(r) \left( p_1(r) - p_0(r)\right) \\&\quad - \sum _{r \in \mathcal {S}_3} \varPhi '(r) 2 \varepsilon ' \ge \ \varPhi '(\mathcal {S}_1) \cdot \frac{\varepsilon }{2} - 2 \varepsilon '. \end{aligned}$$

By an averaging argument, the set \(\mathcal {S}_1\) has probability \(\varPhi (\mathcal {S}_1) \ge \varepsilon /2\) under distribution \(\varPhi \): otherwise, we would have \(p_1 - p_0 < \varPhi (\mathcal {S}_1) \cdot 1 + (1-\varPhi (\mathcal {S}_1)) \cdot \varepsilon /2 < \varepsilon \), a contradiction. Hence, by the RD probability preservation property (see Lemma 2.9), we have \(\varPhi '(\mathcal {S}_1) \ge (\varepsilon /2)^{\frac{a}{a-1}}/R_{a}(\varPhi \Vert \varPhi ')\). The proof may be completed by setting \(\varepsilon ' = (\varepsilon /8) \cdot (\varepsilon /2)^{\frac{a}{a-1}}/R_{a}(\varPhi \Vert \varPhi ')\), which makes the above lower bound at least \(\frac{\varepsilon }{4 \cdot R_{a}(\varPhi \Vert \varPhi ')} \cdot \left( \frac{\varepsilon }{2}\right) ^{\frac{a}{a-1}}\). \(\square \)

4.2 Application to Dual-Regev Encryption

Let \(m,n,q,\chi \) be as in Definition 2.5 and \(\varPhi \) denote a distribution over \({\mathbb {Z}}_q^{m \times n}\). We define the LWE variant \(\mathrm {LWE}_{n,q,\chi ,m}(\varPhi )\) as follows: Sample \(\mathbf{A} \hookleftarrow \varPhi \), \(\mathbf {s} \hookleftarrow U({\mathbb {Z}}_q^n)\), \(\mathbf {e} \hookleftarrow \chi ^m\) and \(\mathbf {u} \hookleftarrow U({\mathbb {T}}^m)\); the goal is to distinguish between the distributions \(\left( \mathbf{A},\frac{1}{q} \mathbf{A}\mathbf {s}+\mathbf {e}\right) \) and \((\mathbf{A},\mathbf {u})\) over \({\mathbb {Z}}_q^{m \times n} \times {\mathbb {T}}^m\). Note that standard LWE is obtained by taking \(\varPhi = U({\mathbb {Z}}_q^{m \times n})\).
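For concreteness, the following Python sketch samples from the two distributions of \(\mathrm {LWE}_{n,q,\chi ,m}(\varPhi )\); the callables Phi and chi are hypothetical stand-ins for the matrix and noise distributions.

```python
import numpy as np

def lwe_instance(n, q, m, Phi, chi, rng, real=True):
    """Sample (A, b) from LWE_{n,q,chi,m}(Phi): b = (1/q)As + e in T^m
    in the 'real' case, and b <- U(T^m) otherwise.  Phi(rng) returns
    A in Z_q^{m x n}; chi(rng, m) returns a noise vector over T."""
    A = Phi(rng)
    if real:
        s = rng.integers(0, q, size=n)        # s <- U(Z_q^n)
        b = ((A @ s) / q + chi(rng, m)) % 1.0  # (1/q) A s + e over the torus
    else:
        b = rng.random(m)                      # u <- U(T^m)
    return A, b
```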

As an application of Theorem 4.2, we show that LWE with non-uniform and possibly statistically correlated \(\mathbf {a}_i\)’s across the samples \((\mathbf {a}_i,b_i)\) (with \(b_i\) either independently sampled from \(U({\mathbb {T}})\) or close to \(\langle \mathbf {a}_i , \mathbf {s} \rangle \) for a secret vector \(\mathbf {s}\)) remains at least as hard as standard LWE, as long as the RD \(R(\varPhi \Vert U)\) remains small, where \(\varPhi \) is the joint distribution of the given \(\mathbf {a}_i\)’s and U denotes the uniform distribution.

To show this result, we first prove in Corollary 4.3 that there is a reduction from \(\mathrm {LWE}_{n,q,\chi ,m}(\varPhi ')\) to \(\mathrm {LWE}_{n,q,\chi ,m}(\varPhi )\) using Theorem 4.2 if \(R_{a}(\varPhi \Vert \varPhi ')\) is small enough. We then describe in Corollary 4.4 how to use this first reduction to obtain smaller parameters for the dual-Regev encryption. This allows us to save an \(\varOmega (\sqrt{\lambda /\log \lambda })\) factor in the Gaussian deviation parameter r used for secret key generation in the dual-Regev encryption scheme [16], where \(\lambda \) refers to the security parameter.

Corollary 4.3

Let \(\varPhi \) and \(\varPhi '\) be two distributions over \({\mathbb {Z}}_q^{m \times n}\) with \(\mathrm {Supp}(\varPhi ) \subseteq \mathrm {Supp}(\varPhi ')\). If there exists a distinguisher \(\mathcal {A}\) against the \(\mathrm {LWE}_{n,q,\chi ,m}(\varPhi )\) with run-time T and advantage \(\varepsilon = o(1)\), then there exists a distinguisher \(\mathcal {A}'\) against the \(\mathrm {LWE}_{n,q,\chi ,m}(\varPhi ')\) with run-time \(T' = O(\varepsilon ^{-2} \log \frac{R_{a}(\varPhi \Vert \varPhi ')}{\varepsilon ^{a/(a-1)}} \cdot (T + \mathrm {poly}(m,\log q)))\) and advantage

$$\begin{aligned} \varOmega \left( \frac{\varepsilon ^{1+a/(a-1)}}{R_{a} (\varPhi \Vert \varPhi ')}\right) , \end{aligned}$$

for any \(a \in (1,+\infty ]\).

Proof

Apply Theorem 4.2 with \(r = \mathbf{A}\in {\mathbb {Z}}_q^{m \times n}\), \(x = (\mathbf{A},\mathbf {b}) \in {\mathbb {Z}}_q^{m \times n} \times {\mathbb {T}}^m\), \(D_0(r) = (\mathbf{A},\frac{1}{q}\mathbf{A} \cdot \mathbf {s}+\mathbf {e})\) with \(\mathbf {s} \hookleftarrow U({\mathbb {Z}}_q^n)\) and \(\mathbf {e} \hookleftarrow \chi ^m\), and \(D_1(r) = (\mathbf{A},\mathbf {u})\) with \(\mathbf {u} \hookleftarrow U({\mathbb {T}}^m)\). The sampling algorithm \(\mathsf {S}\) is such that \(\mathsf {S}(0,x)\) outputs \((\mathbf{A}, \frac{1}{q}\mathbf{A} \cdot \mathbf {s}' + \mathbf {e}')\) for \(\mathbf {s}' \hookleftarrow U({\mathbb {Z}}_q^n)\) and \(\mathbf {e}' \hookleftarrow \chi ^m\), while \(\mathsf {S}(1,x)\) outputs \((\mathbf{A},\mathbf {u}')\) with \(\mathbf {u}' \hookleftarrow U({\mathbb {T}}^m)\). \(\square \)
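A sketch of the public sampler \(\mathsf {S}\) from this proof may help: note that it only reads \(\mathbf{A}\) from its input x, which is exactly what makes the two distributions publicly sampleable (chi is again a hypothetical noise sampler, and rng a numpy generator).

```python
def S(bit, x, q, chi, rng):
    """Resample a fresh LWE/uniform sample with the same matrix A."""
    A, _ = x
    m, n = A.shape
    if bit == 0:                               # fresh sample from D_0(A)
        s = rng.integers(0, q, size=n)
        return A, ((A @ s) / q + chi(rng, m)) % 1.0
    return A, rng.random(m)                    # fresh sample from D_1(A)
```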

We recall that the dual-Regev encryption scheme has a global public parameter \(\mathbf{A} \in {\mathbb {Z}}_q^{m \times n}\), a secret key of the form \(\hbox {sk} = \mathbf {x}\) with \(\mathbf {x} \hookleftarrow D_{{\mathbb {Z}}^m,r}\) and a public key of the form \(\mathbf {u}= \mathbf{A}^t \mathbf {x} \bmod q\). A ciphertext for a message \(M \in \{0,1\}\) is obtained as follows: Sample \(\mathbf {s} \hookleftarrow U({\mathbb {Z}}_q^n)\), \(\mathbf {e}_1 \hookleftarrow \chi ^m\) and \(e_2 \hookleftarrow \chi \); return ciphertext

$$\begin{aligned} (\mathbf {c}_1,c_2) = \left( \frac{1}{q} \mathbf{A}\mathbf {s} + \mathbf {e}_1, \frac{1}{q} \langle \mathbf {u} , \mathbf {s} \rangle + e_2 + \frac{M}{2}\right) \in {\mathbb {T}}^m \times {\mathbb {T}}. \end{aligned}$$
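For concreteness, a Python sketch of key generation and encryption as described above follows; the rounded continuous Gaussian below merely stands in for a proper discrete Gaussian sampler for \(D_{{\mathbb {Z}}^m,r}\), and chi is a hypothetical noise sampler returning arrays over \({\mathbb {T}}\).

```python
import numpy as np

def keygen(A, q, r, rng):
    """sk = x <- D_{Z^m, r} (approximated here), pk = u = A^t x mod q.
    Std. dev. r/sqrt(2*pi) matches the width-parameter-r convention."""
    m, _ = A.shape
    x = np.rint(rng.normal(0.0, r / np.sqrt(2 * np.pi), size=m)).astype(int)
    return x, (A.T @ x) % q

def encrypt(A, u, q, chi, M, rng):
    """Encrypt the bit M as (c1, c2) in T^m x T."""
    m, n = A.shape
    s = rng.integers(0, q, size=n)
    c1 = ((A @ s) / q + chi(rng, m)) % 1.0
    c2 = ((u @ s) / q + chi(rng, 1)[0] + M / 2.0) % 1.0
    return c1, c2
```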

Corollary 4.4

Suppose that q is prime, \(m \ge 2n \log q\) and \(r \ge 4 \sqrt{\log (12m)/\pi }\). If there exists an adversary against the IND-CPA security of the dual-Regev encryption scheme with run-time T and advantage \(\varepsilon \), then there exists a distinguishing algorithm for \(\mathrm {LWE}_{n,q,\chi , m+1}\) with run-time \(O((\varepsilon ')^{-2} \log (\varepsilon ')^{-1} \cdot (T + \mathrm {poly}(m)))\) and advantage \(\varOmega ((\varepsilon ')^2)\), where \(\varepsilon ' = \varepsilon - 2q^{-n}\).

Proof

Breaking the security of the dual-Regev encryption scheme as described above is at least as hard as \(\mathrm {LWE}_{n,q,\chi ,m+1}(\varPhi )\) where \(\varPhi \) is obtained by sampling \(\mathbf{A} \hookleftarrow U({\mathbb {Z}}_q^{m \times n})\), \(\mathbf {u} \hookleftarrow \mathbf{A}^t \cdot D_{{\mathbb {Z}}^m,r} \bmod q\) and returning the \((m+1) \times n\) matrix obtained by appending \(\mathbf {u}^t\) at the bottom of \(\mathbf{A}\). We apply Corollary 4.3 with \(\varPhi ' = U({\mathbb {Z}}_q^{(m+1) \times n})\).

Since q is prime, if \(\mathbf{A}\) is full rank, then the multiplication by \(\mathbf{A}^t\) induces an isomorphism between the quotient group \({\mathbb {Z}}^m/\varLambda _\mathbf{A}^{\perp }\) and \({\mathbb {Z}}_q^n\), where \(\varLambda _\mathbf{A}^{\perp } = \{ \mathbf {x} \in {\mathbb {Z}}^m: \mathbf{A}^t \cdot \mathbf {x} = \mathbf {0} \bmod {q}\}\). By Lemma 2.2, we have \(\eta _{1/3}\left( \varLambda _\mathbf{A}^{\perp }\right) \le 4 \sqrt{\log (12m)/\pi } \le r\), except for a fraction \(\le q^{-n}\) of the \(\mathbf{A}\)’s. Let \(\mathsf {Bad}\) denote the union of such bad \(\mathbf{A}\)’s and the \(\mathbf{A}\)’s that are not full rank. We have \(\Pr [\mathsf {Bad}] \le 2q^{-n}\).

By the multiplicativity property of Lemma 2.9, we have:

$$\begin{aligned} R_{\infty }(\varPhi \Vert \varPhi ') \le \max _{\mathbf{A} \notin \mathsf {Bad}} R_{\infty }\left( D_{{\mathbb {Z}}^m,r} \bmod \varLambda _\mathbf{A}^{\perp } \Vert U_{{\mathbb {Z}}^m/\varLambda _\mathbf{A}^{\perp }}\right) . \end{aligned}$$

Thanks to Lemma 2.10, we know that the latter is \(\le 2\). The result now follows from Corollary 4.3. \(\square \)

In all applications we are aware of, the parameters satisfy \(m \le \mathrm {poly}(\lambda )\) and \(q^{-n} \le 2^{-\lambda }\), where \(\lambda \) refers to the security parameter. The \(r = \varOmega (\sqrt{\log \lambda })\) bound of our Corollary 4.4, which results from using \(\delta =1/3\) in the condition \(r \ge \eta _{\delta }\left( \varLambda _\mathbf{A}^{\perp }\right) \) in the RD-based smoothing argument of the proof above, improves on the corresponding bound \(r = \varOmega (\sqrt{\lambda })\) that results from the requirement to use \(\delta =O(2^{-\lambda })\) in the condition \(r \ge \eta _{\delta }\left( \varLambda _\mathbf{A}^{\perp }\right) \) in the SD-based smoothing argument of the proof of [16, Theorem 7.1], in order to handle adversaries with advantage \(\varepsilon = 2^{-o(\lambda )}\) in both cases. Thus our RD-based analysis saves a factor \(\varOmega \left( \sqrt{\lambda /\log \lambda }\right) \) in the choice of r, and consequently in those of \(\alpha ^{-1}\) and q. (The authors of [16] specify a choice of \(r = \omega (\sqrt{\log \lambda })\) for their scheme because their analysis uses the classical “no polynomial attacks” security requirement, corresponding to assuming attacks with advantage \(\varepsilon =\lambda ^{-O(1)}\), rather than the stronger but more realistic setting \(\varepsilon = \omega (2^{-\lambda })\) that we take.)

5 Application to LWE with Uniform Noise

The LWE problem with noise uniform in a small interval was first introduced in Döttling et al. [12]. In that article, the authors exhibit a reduction from LWE with Gaussian noise, which relies on a new tool called lossy codes. The main proof ingredients are the construction of lossy codes for LWE (which are lossy for the uniform distribution in a small interval), and the fact that lossy codes are pseudorandom.

We note that the reduction from Döttling et al. [12] needs the number of LWE samples to be bounded by \(\mathrm {poly}(n)\) and that it degrades the LWE dimension by a constant factor. The parameter \(\beta \) (when the interval of the noise is \([- \beta , \beta ]\)) should be at least \(m n^{\sigma } \alpha \) where \(\alpha \) is the LWE Gaussian noise parameter and \(\sigma \in (0,1)\) is an arbitrarily small constant.

Another hardness result for LWE with uniform noise can be obtained by composing the hardness result for Learning With Rounding (LWR) from Bogdanov et al. [4] (based on RD techniques inspired by an earlier version of our paper; see Theorem 6.1 in Sect. 6 and the discussion there) with the reduction of Chow [10] (see Theorem 6 in Bogdanov et al. [4]) from LWR to LWE with uniform noise. The resulting reduction maps the \(\mathrm {LWE}_{n',q,D_{\alpha },m}\) problem to the \(\mathrm {LWE}_{n,q,U([- \beta , \beta ]),m}\) problem with \(n' = n/\log q\) and \(\beta = \varOmega (m \alpha /\sqrt{\log n})\), and hence, like the reduction of Döttling et al. [12], it also degrades the LWE dimension.

We now provide an alternative reduction from the \(\mathrm {LWE}_{n,q,D_{\alpha },m}\) distinguishing problem to the \(\mathrm {LWE}_{n,q,U([- \beta , \beta ]),m}\) distinguishing problem, and analyze it using RD. Our reduction preserves the LWE dimension n, and is hence tighter in terms of dimension than the reductions from Döttling et al. [12] and Bogdanov et al. [4] discussed above. In terms of noise, our reduction requires that \(\beta = \varOmega (m \alpha /\log n)\), so in this respect it is slightly less tight than the reduction of Bogdanov et al. [4], by a factor \(\sqrt{\log n}\).

We remark that the search-decision equivalence idea in the proof of Theorem 5.1 could be extended to show the hardness of the decision LWE problem with any noise distribution \(\psi \), with respect to the hardness of LWE with Gaussian noise \(D_{\alpha }\), if either \(\psi \) is ‘close’ to \(D_{\alpha }\) in the sense of RD (i.e., \(R(\psi \Vert D_{\alpha })\) is ‘small’), or (as below) \(\psi \) is sufficiently ‘wider’ than \(D_{\alpha }\) so that \(R(\psi \Vert \psi + D_{\alpha })\) is ‘small’. The first generalization could be applied to prove the IND-CPA security of LWE-based encryption schemes (such as the Regev [30] and dual-Regev [16] schemes) with low-precision Gaussian sampling, as used for signature schemes in Sect. 3.

Theorem 5.1

Let \(\alpha , \beta > 0\) be real numbers with \(\beta = \varOmega (m \alpha /\log n)\) for positive integers m and n. Let \(m > \frac{n\log q}{\log \left( \alpha +\beta \right) ^{-1}} \ge 1\) with \(q \le \mathrm {poly}(m,n)\) prime. Then there is a polynomial-time reduction from \(\mathrm {LWE}_{n,q,D_{\alpha },m}\) to \(\mathrm {LWE}_{n,q,\phi ,m}\), with \(\phi = \frac{1}{q} \lfloor q U_{\beta } \rceil \).

Proof

Our reduction relies on five steps:

  • A reduction from \(\mathrm {LWE}_{n,q,D_{\alpha },m}\) to \(\mathrm {LWE}_{n,q,\psi ,m}\) with \(\psi = D_{\alpha } + U_{\beta }\),

  • A reduction from \(\mathrm {LWE}_{n,q,\psi ,m}\) to \(\mathrm {sLWE}_{n,q,\psi ,m}\),

  • A reduction from \(\mathrm {sLWE}_{n,q,\psi ,m}\) to \(\mathrm {sLWE}_{n,q,U_{\beta },m}\),

  • A reduction from \(\mathrm {sLWE}_{n,q,U_{\beta },m}\) to \(\mathrm {sLWE}_{n,q,\phi ,m}\), with \(\phi = \frac{1}{q} \lfloor q U_{\beta } \rceil \),

  • A reduction from \(\mathrm {sLWE}_{n,q,\phi ,m}\) to \(\mathrm {LWE}_{n,q,\phi ,m}\).

First Step The reduction is given m elements \((\mathbf {a}_i,b_i) \in {\mathbb {Z}}_q^n \times {\mathbb {T}}\), all drawn from \(A_{\mathbf {s}, D_{\alpha }}\) (for some \(\mathbf {s}\)), or all drawn from \(U({\mathbb {Z}}_q^n \times {\mathbb {T}})\). The reduction consists in adding independent samples from \(U_{\beta }\) to each \(b_i\). The reduction maps the uniform distribution to itself, and \(A_{\mathbf {s}, D_{\alpha }}\) to \(A_{\mathbf {s}, \psi }\).
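A short Python sketch of this step (with samples given as pairs \((\mathbf {a}_i,b_i)\) and \(b_i \in {\mathbb {T}}\)):

```python
def add_uniform_noise(samples, beta, rng):
    """Map A_{s, D_alpha} to A_{s, psi} (and uniform to uniform) by
    adding independent U_beta noise to each b_i over the torus."""
    return [(a, (b + rng.uniform(-beta, beta)) % 1.0) for (a, b) in samples]
```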

Second Step Reducing the distinguishing variant of LWE to its search variant is direct. In particular, suppose that there exists a solver that finds the secret for \(\mathrm {sLWE}_{n,q,\psi ,m}\) with success probability \(\varepsilon \). We use this solver to construct a distinguisher for \(\mathrm {LWE}_{n,q,\psi ,m}\). Let \((\mathbf{A},\mathbf {b})\) be the input to the distinguisher, which comes either from the LWE distribution or from the uniform distribution. Let \(\mathbf {s}\) be the output of the solver on input \((\mathbf{A},\mathbf {b})\). Given \(\mathbf {s}\), \(\mathbf{A}\), and \(\mathbf {b}\), we compute \(\Vert \mathbf {b}-\frac{1}{q}\mathbf{A}\mathbf {s}\Vert _\infty \). If this quantity is smaller than

$$\begin{aligned} t_0= \beta + 2\pi \alpha \sqrt{\log \left( 4\varepsilon ^{-1}\right) }, \end{aligned}$$
(7)

then the distinguisher outputs 1; otherwise, it outputs 0. We now analyze the advantage \(\varepsilon _{\text{ adv }}\) of such a distinguisher. On the one hand, if the input to the distinguisher comes from the LWE distribution, the probability that the distinguisher outputs 1 is bounded from below by

$$\begin{aligned} \varepsilon -\mathrm {Pr}_{\mathbf {e}\hookleftarrow \psi }\left( \Vert \mathbf {e}\Vert _\infty \ge t_0\right) . \end{aligned}$$

On the other hand, when the input comes from the uniform distribution, the probability of having 1 as the output of the constructed distinguisher is bounded from above by

$$\begin{aligned} \mathrm {Pr}_{\mathbf {b}\hookleftarrow U({\mathbb {T}}^m)} \left[ \exists \mathbf {s}\in \mathbb {Z}_q^n: \left\| \mathbf {b}-\frac{1}{q}\mathbf{A}\mathbf {s}\right\| _\infty \le t_0\right] . \end{aligned}$$

Hence the overall distinguishing advantage satisfies

$$\begin{aligned} \varepsilon _{\text{ adv }} \ge \left( \varepsilon -\mathrm {Pr}_{\mathbf {e}\hookleftarrow \psi }\left[ \Vert \mathbf {e}\Vert _\infty \ge t_0\right] \right) -\mathrm {Pr}_{\mathbf {b}\hookleftarrow U({\mathbb {T}}^m)}\left[ \exists \mathbf {s}\in \mathbb {Z}_q^n:\left\| \mathbf {b}-\frac{1}{q}\mathbf{A}\mathbf {s}\right\| _\infty \le t_0\right] . \end{aligned}$$
(8)

Since

$$\begin{aligned} \mathrm {Pr}_{\mathbf {e}\hookleftarrow \psi }\left[ \Vert \mathbf {e}\Vert _\infty \ge t_0\right] \le \mathrm {Pr}_{\mathbf {e}\hookleftarrow D_\alpha }\left[ \Vert \mathbf {e}\Vert _\infty \ge t\right] , \end{aligned}$$

where t is defined to be \(2\pi \alpha \sqrt{\log \left( 4\varepsilon ^{-1}\right) }\), the lower bound on \(\varepsilon _{\text{ adv }}\) given in (8) can be re-written as

$$\begin{aligned} \varepsilon -\mathrm {Pr}_{\mathbf {e}\hookleftarrow D_\alpha }[\Vert \mathbf {e}\Vert _\infty \ge t]-\mathrm {Pr}_{\mathbf {b}\hookleftarrow U({\mathbb {T}}^m)}\left[ \exists \mathbf {s}\in \mathbb {Z}_q^n:\left\| \mathbf {b}-\frac{1}{q}\mathbf{A}\mathbf {s}\right\| _\infty \le t_0\right] . \end{aligned}$$
(9)

If both of the above probabilities are \(\le \varepsilon /4\), then \(\varepsilon _{\text{ adv }}\) is at least \(\varepsilon /2\). We now upper bound each probability and constrain the parameters so that these bounds hold. For the first probability, since \(\mathbf {e}\hookleftarrow D_\alpha \), a standard Gaussian tail bound gives

$$\begin{aligned} \mathrm {Pr}_{\mathbf {e}\hookleftarrow D_\alpha }[\Vert \mathbf {e}\Vert _\infty \ge t]\le \exp \left( -\left( \frac{t}{2\pi \alpha }\right) ^2\right) . \end{aligned}$$

To ensure that the latter is less than \(\varepsilon /4\), we need \(t\ge 2\pi \alpha \sqrt{\log \left( 4\varepsilon ^{-1}\right) }\), which holds with equality for our choice of t, so that \(t_0 = \beta + t\) defined in (7) is consistent with it. For the second probability, a union bound argument gives

$$\begin{aligned} \mathrm {Pr}_{\mathbf {b}\hookleftarrow U({\mathbb {T}}^m)}\left[ \exists \mathbf {s}\in \mathbb {Z}_q^n:\left\| \mathbf {b}-\frac{1}{q}\mathbf{A}\mathbf {s}\right\| _\infty \le t_0\right] \le q^n\left( \frac{2\lfloor t_0q \rfloor +1}{q}\right) ^m\le \left( q^{n/m}(4t_0)\right) ^m. \end{aligned}$$

To ensure that the right hand side of the above inequality is less than \(\varepsilon /4\), we impose two conditions. First, \(q^{n/m}(4t_0)<1/2\), which is equivalent to \((n\log q)/m < \log \left( \frac{1}{8t_0}\right) \). Second, given \(q^{n/m}(4t_0)<1/2\), we impose \((1/2)^m\le \varepsilon /4\), that is, \(m\ge \log \left( 4\varepsilon ^{-1}\right) \). Combining the above two conditions, it suffices to have

$$\begin{aligned} m\ge \max \left( \frac{n\log q}{\log \left( \frac{1}{8t_0}\right) },\log \left( 4\varepsilon ^{-1}\right) \right) . \end{aligned}$$

By replacing \(t_0\) from (7) and inserting \(\varepsilon ^{-1} = O(\mathrm {poly}(n))\), we get

$$\begin{aligned} m\ge \frac{n\log q}{\log \left( \alpha +\beta \right) ^{-1}}, \end{aligned}$$

if \((\alpha +\beta )^{-1} = 2^{o\left( \frac{n\log q}{\log n}\right) }\).
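A minimal Python sketch of this second-step distinguisher, assuming a hypothetical oracle solver(A, b) that returns a candidate secret:

```python
import numpy as np

def distinguish(A, b, solver, q, alpha, beta, eps):
    """Output 1 iff the residual is small, using t_0 from (7); the
    infinity norm is taken as distance to the nearest integer on T."""
    t0 = beta + 2 * np.pi * alpha * np.sqrt(np.log(4 / eps))
    res = (b - (A @ solver(A, b)) / q) % 1.0
    return 1 if np.max(np.minimum(res, 1.0 - res)) < t0 else 0
```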

Third Step The reduction from \(\mathrm {sLWE}_{n,q,\psi ,m}\) to \(\mathrm {sLWE}_{n,q,U_{\beta },m}\) is vacuous: by using the RD (and in particular the probability preservation property of Lemma 2.9), we show that an oracle solving \(\mathrm {sLWE}_{n,q,U_{\beta },m}\) also solves \(\mathrm {sLWE}_{n,q,\psi ,m}\).

Lemma 5.2

Let \(\alpha ,\beta \) be real numbers with \(\alpha \in (0, 1/e)\) and \(\beta \ge \alpha \). Let \(\psi = D_{\alpha }+U_{\beta }\). Then

$$\begin{aligned} R_2( U_{\beta } \Vert \psi ) \le 1 + \frac{1}{1 - e^{- \pi \beta ^2/\alpha ^2}} \frac{\alpha }{\beta } < 1 + 1.05 \cdot \frac{\alpha }{\beta }. \end{aligned}$$

Proof

The density function of \(\psi \) is the convolution of the density functions of \(D_{\alpha }\) and \(U_{\beta }\):

$$\begin{aligned} f_{\psi }(x) = \frac{1}{2 \alpha \beta } \int _{-\beta }^{\beta } e^{\frac{- \pi (x - y)^2}{\alpha ^2} } \mathrm{d}y. \end{aligned}$$

Using the RD of order 2, we have:

$$\begin{aligned} R_2( U_{\beta } \Vert \psi ) = \int _{- \beta }^{\beta } \frac{\frac{1}{(2 \beta )^2}}{\frac{1}{2 \alpha \beta } \int _{- \beta }^{\beta } e^{ \frac{- \pi (x - y)^2}{\alpha ^2} } \mathrm{d}y} \mathrm{d}x = \frac{\alpha }{\beta } \int _{0}^{\beta } \frac{1}{\int _{- \beta }^{\beta } e^{ \frac{- \pi (x - y)^2}{\alpha ^2} } \mathrm{d}y} \mathrm{d}x. \end{aligned}$$

For \(x \in [0, \beta ]\), the denominator in the integrand equals the function

$$\begin{aligned} \phi (x) = \alpha - \int _{\beta +x}^{\infty } \exp \left( \frac{-\pi y^2}{\alpha ^2}\right) \; \mathrm {d} y - \int _{\beta -x}^{\infty } \exp \left( \frac{-\pi y^2}{\alpha ^2}\right) \; \mathrm {d} y . \end{aligned}$$

For the standard Gaussian, we use the following tail bound [9]:

$$\begin{aligned} \frac{1}{\sqrt{2 \pi }} \int _{z}^\infty \mathrm{e}^{-x^2/2} \mathrm{d}x \le \frac{1}{2} \mathrm{e}^{-z^2/2}. \end{aligned}$$

Then we have

$$\begin{aligned} \phi (x) \ge \alpha \left( 1 - \frac{1}{2} \exp \left( \frac{- \pi (\beta +x)^2}{\alpha ^2}\right) - \frac{1}{2} \exp \left( \frac{- \pi (\beta -x)^2}{\alpha ^2}\right) \right) . \end{aligned}$$

To bound the reciprocal of the above from above, we use a first-order expansion. Define

$$\begin{aligned} ~ t(x) = \frac{1}{2} \exp \left( \frac{- \pi (\beta +x)^2}{\alpha ^2}\right) + \frac{1}{2} \exp \left( \frac{- \pi (\beta -x)^2}{\alpha ^2}\right) . \end{aligned}$$
(10)

We want to bound the function t(x) away from 1. As t(x) is not monotonic, we bound each of the two terms of t(x) in (10) by its maximum over \([0, \beta ]\). Let \(\sigma _{\alpha ,\beta }\) denote \(\frac{1}{2}e^{- \pi \beta ^2/\alpha ^2}\); then, using \(\beta \ge \alpha \), an upper bound is:

$$\begin{aligned} t(x) \le \frac{1}{2}e^{- \pi \beta ^2/\alpha ^2} + \frac{1}{2} = \sigma _{\alpha , \beta } + \frac{1}{2} < 1. \end{aligned}$$

We then use the facts that \(1-t(x) \ge \frac{1}{2} - \sigma _{\alpha , \beta }\) and \(\frac{1}{1-t(x)}= 1 + \frac{t(x)}{1-t(x)} \le 1 + \frac{2}{1-2\sigma _{\alpha , \beta }} t(x)\) to bound the Rényi divergence of order 2.

$$\begin{aligned} \begin{aligned} R_2( U_{\beta } \Vert \psi )&= \frac{\alpha }{\beta } \int _{0}^\beta \frac{1}{ \phi (x)} \mathrm {d} x \\&\le \frac{1}{\beta } \int _{0}^\beta \frac{1}{ 1 - \frac{1}{2} \exp \left( \frac{- \pi (\beta +x)^2}{\alpha ^2}\right) - \frac{1}{2} \exp \left( \frac{- \pi (\beta -x)^2}{\alpha ^2}\right) } \mathrm {d} x \\&\le \frac{1}{\beta } \int _{0}^\beta \left( 1 + \frac{1}{1-2\sigma _{\alpha , \beta }} \exp \left( \frac{- \pi (\beta +x)^2}{\alpha ^2}\right) \right. \\&\ \ + \left. \frac{1}{1-2\sigma _{\alpha , \beta }} \exp \left( \frac{- \pi (\beta -x)^2}{\alpha ^2}\right) \right) \mathrm {d} x \\&= 1 + \frac{1}{(1-2\sigma _{\alpha , \beta })\beta } \int _{0}^{2\beta } \exp \left( \frac{- \pi x^2}{\alpha ^2}\right) \mathrm {d} x \\&= 1 + \frac{1}{2(1-2\sigma _{\alpha , \beta })\beta } \int _{-2\beta }^{2\beta } \exp \left( \frac{- \pi x^2}{\alpha ^2}\right) \mathrm {d} x \\&= 1 + \frac{\alpha }{(1-2\sigma _{\alpha , \beta })\beta } (1 - 2 D_\alpha (2\beta )) \le 1 + \frac{1}{1-2\sigma _{\alpha , \beta }}\frac{\alpha }{\beta }.\\ \end{aligned} \end{aligned}$$

Hence we have the bound

$$\begin{aligned} R_2( U_{\beta } \Vert \psi ) \le 1 + \frac{1}{1 - e^{- \pi \beta ^2/\alpha ^2}} \frac{\alpha }{\beta }. \end{aligned}$$

The second bound in the lemma statement follows from the fact that

$$\begin{aligned} \frac{1}{1 - \mathrm{e}^{- \pi \beta ^2/\alpha ^2}} < 1.05, \end{aligned}$$

for \(\beta \ge \alpha \). \(\square \)
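The bound of Lemma 5.2 can also be checked numerically. The following sketch evaluates \(R_2(U_{\beta }\Vert \psi )\) by quadrature for sample values of \(\alpha \) and \(\beta \) and compares it with \(1 + 1.05\cdot \alpha /\beta \):

```python
import numpy as np
from scipy.integrate import quad

def renyi2(alpha, beta):
    """R_2(U_beta || psi) with psi = D_alpha + U_beta, by numerical
    integration of U_beta(x)^2 / f_psi(x) over [-beta, beta]."""
    def f_psi(x):  # convolution density of D_alpha and U_beta
        return quad(lambda y: np.exp(-np.pi * (x - y) ** 2 / alpha ** 2),
                    -beta, beta, limit=200)[0] / (2 * alpha * beta)
    return quad(lambda x: (2 * beta) ** -2 / f_psi(x), -beta, beta)[0]

alpha, beta = 0.01, 0.05
print(renyi2(alpha, beta), "<=", 1 + 1.05 * alpha / beta)
```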

The RD multiplicativity property (see Lemma 2.9) implies that for m independent samples, we have \(R_2( U_{\beta }^m \Vert \psi ^m ) \le R_2( U_{\beta } \Vert \psi )^m\). To ensure that the latter mth power is polynomial in n, we use Lemma 5.2 with \(\beta = \varOmega ( m \alpha /\log n )\); with this choice, we have \(R_2( U_{\beta } \Vert \psi ) = 1+ O(\frac{\alpha }{\beta }) \le \exp (O(\frac{\alpha }{\beta }))\), and hence \(R_2( U_{\beta } \Vert \psi )^m \le \exp (O(\frac{m \alpha }{\beta })) = \exp (O(\log n)) = n^{O(1)}\). The RD probability preservation and data processing properties (see Lemma 2.9) now imply that if an oracle solves \(\mathrm {sLWE}_{n,q,U_{\beta },m}\) with probability \(\varepsilon \), then it also solves \(\mathrm {sLWE}_{n,q,\psi ,m}\) with probability \(\varepsilon ' \ge \varepsilon ^2/R_2 (U_\beta \Vert \psi )^m \ge \varepsilon ^2/n^{O(1)}\).

Fourth Step We reduce \(\mathrm {sLWE}_{n,q,U_{\beta },m}\) with continuous noise \(U_{\beta }\) to \(\mathrm {sLWE}_{n,q,\phi ,m}\) with discrete noise \(\phi = \frac{1}{q} \lfloor q U_{\beta } \rceil \) with support contained in \({\mathbb {T}}_q\), by rounding each given \(b_i\) (for \(i \le m\)) to the nearest multiple of \(\frac{1}{q}\).

Fifth Step We reduce \(\mathrm {sLWE}_{n,q,\phi ,m}\) to \(\mathrm {LWE}_{n,q,\phi ,m}\) by invoking Theorem 2.7. \(\square \)

6 Application to Learning with Rounding (LWR)

In this section, we first review the recent hardness result of Bogdanov et al. [4] for the Learning With Rounding (LWR) problem introduced in Banerjee et al. [7], which is based on the hardness of the standard LWE problem (Theorem 6.1), and combine it with other results (Theorem 6.2). This result of Bogdanov et al. [4] makes use of RD (inspired by an earlier version of our work) within a proof that can be seen as a variant of the Micciancio-Mol search to decision reduction for LWE [23]. Then, we show (in Theorem 6.4) a new dimension-preserving hardness result for LWR, obtained by composing our RD-based hardness result for LWE with uniform noise from the previous section with another reduction from Bogdanov et al. [4] (which we rephrase in Theorem 6.3) that reduces LWE with uniform noise to LWR. Interestingly, our new reduction for LWR also makes use of the Micciancio-Mol reduction [23], but unlike the LWR reduction in Theorem 6.1, ours uses [23] as a black box within the reduction of Theorem 5.1.

6.1 Adapted Results from [4]

We first recall the main hardness result on LWR from Bogdanov et al. [4].

Theorem 6.1

([4, Theorem 3])  For every \(\varepsilon >0\), n, m, \(q>2pB\), and algorithm \(\mathrm {Dist}\) such that

$$\begin{aligned} \left| \mathrm {Pr}_{\mathbf{A},\mathbf {s}}\left[ \mathrm {Dist}\left( \mathbf{A},\lfloor \mathbf{A}\mathbf {s} \rceil _p \right) =1\right] -\mathrm {Pr}_{\mathbf {u}}\left[ \mathrm {Dist}\left( \mathbf{A},\lfloor \mathbf {u}\rceil _p\right) =1\right] \right| \ge \varepsilon \end{aligned}$$

where \(\mathbf{A}\hookleftarrow U\left( \mathbb {Z}^{m\times n}_q\right) \), \(\mathbf {s}\hookleftarrow U\left( \{0,1\}^n\right) \), and \(\mathbf {u}\hookleftarrow U\left( \mathbb {Z}^m_q\right) \), there exists an algorithm \(\mathrm {Learn}\) that runs in time polynomial in n, m, the number of divisors of q, and the running time of \(\mathrm {Dist}\), such that

$$\begin{aligned} ~ \mathrm {Pr}_{\mathbf{A},\mathbf {s}}\left[ \mathrm {Learn}\left( \mathbf{A},\mathbf{A}\mathbf {s}+\mathbf {e}\right) =\mathbf {s}\right] \ge \left( \frac{\varepsilon }{4qm}-\frac{2^n}{p^m}\right) ^2\cdot \frac{1}{\left( 1+\frac{2Bp}{q}\right) ^m}, \end{aligned}$$
(11)

for any noise distribution \(\mathbf {e}\) that is B-bounded and B-balanced in each coordinate.

We now combine Theorem 6.1 with other results to restate it as a reduction from the standard LWE problem, so that it can be compared with our alternative reduction.

Theorem 6.2

  Let \(qm = O\left( \mathrm {poly}(n)\right) \), and \(n \le m \le O\left( \frac{\sqrt{\log n}}{p\alpha }\right) \). Then there is a polynomial-time reduction from \(\mathrm {LWE}_{n/\log q,q,D_\alpha ,m}\) to \(\mathrm {LWR}_{n,q,p,m}\).

Proof

The reduction can be obtained in the following five steps:

  • A reduction from \(\mathrm {LWE}_{n/\log q,q,D_\alpha ,m}\) to \(\mathrm {binLWE}_{n,q,D_\alpha ,m}\),

  • A trivial reduction from \(\mathrm {binLWE}_{n,q,D_\alpha ,m}\) to \(\mathrm {sbinLWE}_{n,q,D_\alpha ,m}\),

  • A reduction from \(\mathrm {sbinLWE}_{n,q,D_\alpha ,m}\) to \(\mathrm {sbinLWE}_{n,q,D'_{\alpha ,B'},m}\), with \(D'_{\alpha ,B'}\) the distribution \(D_{\alpha }\) truncated (by rejection) to the interval \([-B',B']\),

  • A reduction from \(\mathrm {sbinLWE}_{n,q,D'_{\alpha ,B'},m}\) to \(\mathrm {sbinLWE}_{n,q,\phi ,m}\), with \(\phi =\frac{1}{q} \lfloor q D'_{\alpha ,B'} \rceil \),

  • A reduction from \(\mathrm {sbinLWE}_{n,q,\phi ,m}\) to \(\mathrm {LWR}_{n,q,p,m}\) via Theorem 6.1.

The first reduction is taken from Brakerski et al. [6]. The second one is just the trivial decision-to-search reduction for binary secret LWE. Note that we provided such a reduction (see the second step of the proof of Theorem 5.1) for a more general setting: there we had (non-binary secret) LWE with \(\psi =D_\alpha +U_\beta \) as the error distribution, whereas here we have binary secret LWE with Gaussian noise \(D_{\alpha }\). If we simplify the constraints that appeared there, we get \(m\ge n/\log \left( \alpha ^{-1}\right) \), which can be further relaxed to \(m\ge n\). The third reduction is vacuous and consists in applying the \(R_{\infty }\) probability preservation property from Lemma 2.9 and the m-sample Gaussian tail-cut Lemma 2.11, which ensures that this reduction preserves the success probability up to a constant factor when setting \(B' = \alpha q \sqrt{\ln (2m)/\pi }\). The fourth reduction consists in applying \(\frac{1}{q} \lfloor q (\cdot ) \rceil \) to all samples. With this, we have only changed the noise distribution from the Gaussian \(D_\alpha \) with standard deviation \(\alpha q\) to its quantized version \(\phi \), which only adds a rounding error of magnitude \(\le 1/2\). The last step is exactly Theorem 6.1 recalled above. This last reduction holds if (i) the distribution \(\phi \) is B-bounded and, to ensure the reduction is probabilistic polynomial-time, (ii) the right hand side of (11) is at least \(\varepsilon ^{O(1)}/n^{O(1)}\). For (i), the distribution \(\phi \) is both B-bounded and B-balanced with \(B = \alpha q \sqrt{\ln (2m)/\pi } +1/2\). For condition (ii), we note that there are two terms in the right hand side of (11). We first claim that

$$\begin{aligned} \frac{\varepsilon }{8qm}>\frac{2^n}{p^m}, \end{aligned}$$

for \(q=\mathrm {poly}(n)\) and \(\varepsilon ^{-1} = 2^{o(n)}\). To prove this claim, first note that

$$\begin{aligned} \frac{\varepsilon }{8qm}>\frac{2^n}{p^m}\Leftrightarrow & {} n-m\log p<\log \left( \frac{\varepsilon }{8qm}\right) \\\Leftrightarrow & {} m>\frac{n+\log \left( 8qm\varepsilon ^{-1}\right) }{\log p}. \end{aligned}$$

Now, \(qm=\mathrm {poly}(n)\), \(\varepsilon ^{-1} = 2^{o(n)}\), and the assumption \(m \ge n \ge 2n/\log (p)\) imply the above condition for sufficiently large n. Hence, for the first term we get

$$\begin{aligned} \left( \frac{\varepsilon }{4qm}-\frac{2^n}{p^m}\right) ^2>\left( \frac{\varepsilon }{8qm}\right) ^2, \end{aligned}$$

which is \(\ge \varepsilon ^{O(1)}/n^{O(1)}\) using \(qm = O\left( \mathrm {poly}(n)\right) \). For the second term, we get

$$\begin{aligned} ~ \left( 1+\frac{2Bp}{q}\right) ^m\le \exp \left( \frac{2Bpm}{q}\right) , \end{aligned}$$
(12)

since for positive x and y, we have \((1+x)^y\le \exp (xy)\). The right hand side of (12) is less than \(n^{O(1)}\) if \(2Bpm/q\le O\left( \log n\right) \). Replacing B by the value derived from condition (i), and using \(m \le \mathrm {poly}(n)\), we get that \(m=O\left( \sqrt{\log n}/(p\alpha )\right) \) suffices. \(\square \)
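As a quick numeric sanity check of the claim \(\frac{\varepsilon }{8qm}>\frac{2^n}{p^m}\), with hypothetical parameters satisfying the stated constraints:

```python
import math

n, p = 512, 8                    # log2(p) = 3 >= 2, so n >= 2n/log2(p)
q, m = n ** 2, 2 * n             # qm = poly(n), m >= n
eps = n ** -2.0
lhs = math.log2(eps) - math.log2(8 * q * m)   # log2 of eps / (8qm)
rhs = n - m * math.log2(p)                    # log2 of 2^n / p^m
print(lhs > rhs)                 # True: eps/(8qm) > 2^n/p^m
```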

Below, we will give a tighter reduction than above from LWE to LWR. We will make use of the theorem below.

Theorem 6.3

(Adapted from [4, Theorem 13])  Let p and q be two integers such that p divides q and let \(\beta = q/(2p)\). If we have a T-time distinguisher for \(\mathrm {LWR}_{n,q,p,m}\) with advantage \(\varepsilon \), then we can construct a \(T'= O\left( T + m' n \cdot \mathrm {poly}(\log q) \right) \) time distinguisher for \(\mathrm {LWE}_{n,q,U_\beta ,m'}\) with \(m'=m \cdot q/p\) and advantage \(\varepsilon '\ge \varepsilon /2\).

Proof

The proof follows the steps of the proof of Theorem 13 in Bogdanov et al. [4]. Suppose that we have access to a T-time distinguisher for \(\mathrm {LWR}_{n,q,p,m}\), and that the LWE oracle provides samples \((\mathbf {a},b)=\left( \mathbf {a}, \langle \mathbf {a},\mathbf {s}\rangle +e\right) \) for \(\mathbf {a}\hookleftarrow U(\mathbb {Z}_q^n)\), and

$$\begin{aligned} e\hookleftarrow \left[ -\frac{q}{2p},\ldots ,\frac{q}{2p}\right) \subseteq \mathbb {Z}_q. \end{aligned}$$

The authors of [4] run the LWE oracle until they hit a ‘good’ sample \((\mathbf {a},b)\) with \(b \in (q/p)\mathbb {Z}_p\) and output the LWR sample \((\mathbf {a},(p/q)b)\in \mathbb {Z}_q^n\times \mathbb {Z}_p\). Since the LWE error e is distributed uniformly in \(U_{\beta }\), each sample output by the LWE oracle is ‘good’ with probability p / q, and the expected number of LWE samples needed by this reduction to produce m LWR samples is therefore \(m' = m \cdot q/p\). Instead, here we modify the reduction to work with a fixed number \(m' = m \cdot q/p\) of LWE samples. Namely, if the \(m' = m \cdot q/p\) given LWE samples contain at least m ‘good’ samples (which we call event \(\mathsf {Good}\)), the modified reduction uses them to compute m LWR samples and runs the LWR distinguisher on them, outputting whatever it outputs, as in Bogdanov et al. [4]. Else, if the \(m'\) given LWE samples contain \(<m\) ‘good’ samples, the LWE distinguisher outputs 0. The proof of Theorem 13 in [4] shows that conditioned on event \(\mathsf {Good}\), the input samples to the LWR distinguisher come from the LWR distribution (resp. uniform distribution) if the LWE oracle generates samples from the LWE distribution (resp. uniform distribution). It follows that the advantage of our LWE distinguisher is \(\ge \Pr [\mathsf {Good}] \cdot \varepsilon \ge \varepsilon /2\), where we have used the fact that \(\Pr [\mathsf {Good}] \ge 1/2\), since the number of ‘good’ samples is binomially distributed with parameters \((m',p/q)\) and has median \(m' \cdot p/q = m\). \(\square \)
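A Python sketch of the sample conversion at the heart of this modified reduction (input samples \((\mathbf {a},b)\) with integer \(b \in \mathbb {Z}_q\), and p dividing q):

```python
def lwe_to_lwr(samples, q, p, m):
    """Keep the 'good' samples (b in (q/p)Z_p) and rescale b by p/q;
    if fewer than m are good, the distinguisher outputs 0."""
    good = [(a, b * p // q) for (a, b) in samples if b % (q // p) == 0]
    return good[:m] if len(good) >= m else None
```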

6.2 New Results

One can compose the reduction in Theorem 6.3 with our reduction from LWE with Gaussian noise to LWE with uniform noise (Theorem 5.1) to get a new reduction from LWE to LWR. This combination can be summarized as follows:

Theorem 6.4

Let p divide q and \(m'=m \cdot q/p\), with \(m=O\left( \log n/\alpha \right) \) and \(m'\ge m\ge n\ge 1\). There is a polynomial-time reduction from \(\mathrm {LWE}_{n,q,D_\alpha ,m'}\) to \(\mathrm {LWR}_{n,q,p,m}\).

Proof

Let \(\beta = q/(2p)\). The reduction has two steps:

  • A reduction from \(\mathrm {LWE}_{n,q,D_\alpha ,m'}\) to \(\mathrm {LWE}_{n,q,U_\beta ,m'}\),

  • A reduction from \(\mathrm {LWE}_{n,q,U_\beta ,m'}\) to \(\mathrm {LWR}_{n,q,p,m}\).

On the one hand, \(\mathrm {LWE}_{n,q,U_\beta ,m'}\) is at least as hard as \(\mathrm {LWE}_{n,q,D_\alpha ,m'}\) where \(\beta = \varOmega \left( m'\alpha /\log n\right) \) (see Theorem 5.1). On the other hand, the second phase of the reduction follows from Theorem 6.3; namely we have a reduction from \(\mathrm {LWE}_{n,q,U_\beta ,m'}\) to \(\mathrm {LWR}_{n,q,p,m}\) subject to the condition that p divides q and \(m'= m \cdot q/p\). Combining these two reductions completes the proof. Note that, by putting all the conditions together, it turns out that

$$\begin{aligned} \beta =\varOmega \left( \frac{m'\alpha }{\log n}\right) \Leftrightarrow \frac{q}{2p}\ge \frac{\frac{mq}{p}\alpha }{\log n} \Leftrightarrow m=O\left( \frac{\log n}{\alpha }\right) , \end{aligned}$$

where the first equivalence is derived by replacing \(\beta \) and \(m'\), by q / (2p) and mq / p. \(\square \)

Table 3 compares the parameters of Theorems 6.2 and 6.4, and of a reduction from [3]. The reduction in Theorem 6.2 loses a \(\log q\) factor in dimension, while our uniform noise reduction preserves the dimension, making it the first of its kind without resorting to the noise-flooding technique (as in [7]). On the downside, our reduction does not preserve the number of samples.

Table 3 Comparing the main parameters of different reductions from \(\mathrm {LWE}_{n',q,D_\alpha ,m'}\) to \(\mathrm {LWR}_{n,q,p,m}\) for a fixed  n and another flexible parameter \(\gamma \ge 1\)

Note that setting \(\gamma =1\) gives \(n'\) equal to that of Theorem 6.2, while it loses an extra factor n in the denominator of m. On the other hand, setting \(\gamma =q\) allows for approximately \(n=n'\), though at the expense of a much smaller m. The reduction in Theorem 6.2 also restricts the number of LWR samples m by a further \(O\left( p\sqrt{\log n}\right) \) factor in comparison with our results. This factor becomes \(O\left( \gamma pn\sqrt{\log n}\right) \) if we compare our result with that of Theorem 4.1 from [3].

7 Open Problems

Our results show the utility of the Rényi divergence in several areas of lattice-based cryptography. A natural question is to find further applications of RD that improve the efficiency of cryptosystems. Our results suggest some natural open problems, whose resolution could open up further applications. In particular, can we extend the applicability of RD to distinguishing problems more general than those satisfying our ‘public sampleability’ requirement? For instance, can we use RD-based arguments to prove the hardness of LWE with uniform noise without relying on the search to decision reduction of Micciancio and Mol [23]? This may allow the proof to also apply to Ring-LWE with uniform noise and Ring-LWR. Another open problem is discussed in the following subsection.

7.1 GPV Signature Scheme

The RD can also be used to reduce the parameters obtained via the SD-based analysis of the GPV signature scheme in Gentry et al. [16].

In summary, the signature scheme and the security proof from Gentry et al. [16] work as follows. The signature public key is a matrix \(\mathbf{A} \in {\mathbb {Z}}_q^{n \times m}\) with n linear in the security parameter \(\lambda \), \(q = \mathrm {poly}(n)\), and \(m=O(n \log q)\). The private signing key is a short basis matrix \(\mathbf{T}\) for the lattice \(\varLambda _\mathbf{A}^{\perp } = \{\mathbf {x} \in {\mathbb {Z}}^m: \mathbf{A}\cdot \mathbf {x} = \mathbf {0} \bmod q\}\), whose last successive minimum satisfies \(\lambda _m\left( \varLambda _\mathbf{A}^{\perp }\right) \le O(1)\) when \(m = \varOmega (n\log q)\) (see [16]). A signature \((\mathbf {\sigma }, s)\) on a message M is a short vector \(\mathbf {\sigma }\in {\mathbb {Z}}^m\) and a random salt \(s \in \{0,1\}^{\lambda }\), such that \(\mathbf{A} \cdot \mathbf {\sigma }= H(M,s) \bmod q\), where H is a random oracle hashing into \({\mathbb {Z}}_q^n\). The short vector \(\mathbf {\sigma }\) is sampled by computing an arbitrary vector \(\mathbf {t}\) satisfying \(\mathbf{A} \cdot \mathbf {t} = H(M,s) \bmod q\) and using \(\mathbf{T}\) along with a Gaussian sampling algorithm (see [6, 16]) to produce a sample from \(\mathbf {t}+D_{\varLambda _\mathbf{A}^{\perp },r, -\mathbf {t}}\).

The main idea in the security proof from the SIS problem [16] is based on simulating signatures without \(\mathbf{T}\), by sampling \(\mathbf {\sigma }\) from \(D_{{\mathbb {Z}}^m,r}\) and then programming the random oracle H at (M, s) according to \(H(M,s) = \mathbf{A} \cdot \mathbf {\sigma }\bmod q\). As shown in Gentry et al. [16, Lemma 5.2], the conditional distribution of \(\mathbf {\sigma }\) given \(\mathbf{A} \cdot \mathbf {\sigma }\bmod q\) is exactly the same in the simulation and in the real scheme. Therefore, the SD between the simulated signatures and the real signatures is bounded by the SD between the marginal distribution \(D_1\) of \(\mathbf{A} \cdot \mathbf {\sigma }\bmod q\) for \(\mathbf {\sigma }\hookleftarrow D_{{\mathbb {Z}}^m,r}\) and \(U({\mathbb {Z}}_q^n)\). This SD for one signature is bounded by \(\varepsilon \) if \(r \ge \eta _{\varepsilon }\left( \varLambda _\mathbf{A}^{\perp }\right) \). In the SD-based analysis of Gentry et al. [16], this leads, over the \(q_s\) sign queries of the attacker, to taking \(\varepsilon = O(2^{-\lambda }q^{-1}_s)\) and thus \(r = \varOmega (\sqrt{\lambda + \log q_s})\) (using Lemma 2.2), in order to handle attackers with success probability \(2^{-o(\lambda )}\).

Now, by Lemma 2.10, we have that the RD \(R_{\infty }(D_1\Vert U)\) is bounded by \(1+c \cdot \varepsilon \) for one signature, for some constant c. By the multiplicativity property of Lemma 2.9, over \(q_s\) queries, it is bounded by \((1+c\varepsilon )^{q_s}\). By taking \(\varepsilon = O(q^{-1}_s)\), we obtain overall an RD bounded as \((1+c\varepsilon )^{q_s} \le \exp (c \varepsilon q_s) = O(1)\) between the view of the attacker in the real attack and in the simulation, leading to a security proof with respect to SIS but with a smaller \(r = \varOmega (\sqrt{\log (nq_s)}) =\varOmega (\sqrt{\log \lambda +\log q_s})\). When the number of sign queries \(q_s\) allowed to the adversary is much smaller than \(2^{\lambda }\), this leads to significant parameter savings, because SIS’s parameter \(\beta \) is reduced, and hence n, m, q may be set smaller for the same security parameter \(\lambda \).
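The key arithmetic fact here, that the RD stays O(1) over \(q_s\) queries when \(\varepsilon = O(q_s^{-1})\), is easy to check numerically with hypothetical values of c and \(q_s\):

```python
import math

c, q_s = 1.0, 2 ** 30
eps = 1.0 / q_s                   # choose eps = O(1/q_s)
print((1 + c * eps) ** q_s)       # ~ 2.718..., an O(1) quantity
print(math.exp(c * eps * q_s))    # the upper bound exp(c*eps*q_s) = e
```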

The above analysis indeed reduces the smoothing condition in the security proof from \(r = \varOmega (\sqrt{\lambda })\) to \(r= \varOmega (\sqrt{\log \lambda })\). But to make Gaussian sampling on \(\varLambda _\mathbf{A}^{\perp }\) efficient in signature generation, we also need r lower bounded by the Euclidean norm of the trapdoor basis for \(\varLambda _\mathbf{A}^{\perp }\). The latter is lower bounded by \(\lambda _1\left( \varLambda _\mathbf{A}^{\perp }\right) \), which is \(\varOmega (\sqrt{m}) \ge \varOmega (\sqrt{\lambda })\) with high probability. That is actually similar to (or even larger than) the old SD-based smoothing condition. Overall, we relaxed the smoothing condition while the sampling condition remained unchanged. Hence, relaxing both conditions together is left as an open problem.