Efficient Reductions in Cyclotomic Rings - Application to Ring-LWE Based FHE Schemes

Bajard, Jean-Claude; Eynard, Julien; Hasan, Anwar; Martins, Paulo; Sousa, Leonel; Zucca, Vincent

doi:10.1007/978-3-319-72565-9_8

Jean-Claude Bajard¹⁵,
Julien Eynard¹⁶,
Anwar Hasan¹⁶,
Paulo Martins¹⁷,
Leonel Sousa¹⁷ &
…
Vincent Zucca¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 10719))

Included in the following conference series:

International Conference on Selected Areas in Cryptography

1301 Accesses
1 Citations

Abstract

With Fully Homomorphic Encryption (FHE), it is possible to process encrypted data without having an access to the private-key. This has a wide range of applications, most notably the offloading of sensitive data processing. Most research on FHE has focused on the improvement of its efficiency, namely by introducing schemes based on Ring-Learning With Errors (RLWE), and techniques such as batching, which allows for the encryption of multiple messages in the same ciphertext. Much of the related research has focused on RLWE relying on power-of-two cyclotomic polynomials. While it is possible to achieve efficient arithmetic with such polynomials, one cannot exploit batching. Herein, the efficiency of ring arithmetic underpinned by non-power-of-two cyclomotic polynomials is analyzed and improved. Two methods for polynomial reduction are proposed, one based on the Barrett reduction and the other on a Montgomery representation. Speed-ups up to 2.66 are obtained for the reduction operation using an i7-5960X processor when compared with a straightforward implementation of the Barrett reduction. Moreover, the proposed methods are exploited to enhance homomorphic multiplication of Fan-Vercauteren (FV) and Brakerski-Gentry-Vaikuntantahan (BGV) encryption schemes, producing experimental speed-ups up to 1.37.

You have full access to this open access chapter, Download conference paper PDF

Batched Fully Dynamic Multi-key FHE from FHEW-Like Cryptosystems

Multi-Key Fully Homomorphic Encryption Scheme Over the Integers

F-FHEW: High-Precision Approximate Homomorphic Encryption with Batch Bootstrapping

Keywords

1 Introduction

There is an increasing discord between the convenience provided by cloud services and privacy concerns. Privacy can be achieved by encrypting data before uploading it to the cloud. However, with traditional cryptosystems, one is not able to process encrypted data [14], nullifying the benefits of cloud computing. A solution to this problem is the application of FHE, which allows for the creation of malleable cryptograms [14]. Homomorphic operations can afterwards be applied to these cryptograms. Despite its wide range of applicability, FHE has seldom been applied in practice due to its low computational performance.

Most recent research on FHE has focused on improving its efficiency [23]. LWE [27] and its ring-variant RLWE [21] have been suggested in this context as a framework for improving the complexity of FHE. The benefits brought forth by RLWE are twofold. First, operations are executed in a cyclotomic ring, and therefore benefit from its algebraic structure. Second, the plaintext space is isomorphic to the Cartesian product of multiple smaller spaces, designated batching slots, allowing for multiple messages to be encrypted and processed in a single ciphertext in parallel. Most of the related research has focused on cyclotomic rings of the form $\mathbb {Z}[X]/(X^N + 1)$, for N a power of two, due to the simple arithmetic associated with the resulting ring. However, they only allow for a single batching slot. While RLWE modulo other cyclotomic polynomials, providing for a large number of batching slots, has been previously considered [12, 28], it remains very much unused much because of its less efficient arithmetic. In [22], Lyubashevsky et al. proposed algorithms using a multivariate/tensored representation of polynomials, natively supporting operations on any cyclotomic rings.

In this work we analyse and improve the arithmetic associated with the univariate representation of polynomials in general cyclotomic rings. Efficient reduction algorithms are proposed, which have the same asymptotic complexities as those of [21]. The Barrett reduction, which is a generic technique to perform polynomial reduction, had been considered in [9] to perform such reductions. However, the algorithms used in [9] do not take into account the characteristics of cyclotomic polynomials, and hence are sub-optimal. Herein, the degree of the polynomials is reduced using cyclotomic properties before applying Barrett’s algorithm, decreasing the complexity of the reduction and leading to theoretical speed-ups up to 2.06. We also show that in this context a Montgomery representation leads to more efficient reductions than generic Barrett algorithms, leading to theoretical speed-ups up to 2.63.

These gains have been confirmed in practice, with experimental speed-ups of up to 1.95 and 2.55 for the improved Barrett and Montgomery reductions, respectively, when compared with a straightforward Barrett algorithm in an i7-5960X processor. Moreover we have tested the applicability of our algorithms to two of the most currently used homomorphic encryption schemes, namely BGV [8] developed in HELib [16] and FV [10] developped in SEAL 2.0 [19] resulting in speed-ups up to 1.37 for homomorphic multiplication.

2 Background

Throughout the paper, $\varvec{\phi }_m(X)\in \mathbb {Z}[X]$ will denote the m-th cyclotomic polynomial of degree $n = \varphi (m)$, where $\varphi $ is Euler’s totient function. The ring $\mathcal {R}=\mathbb {Z}[X]/(\varvec{\phi }_m(X))$ is the main structure of R-LWE-based schemes such as FV and BGV. An element of $\mathcal {R}$ can be thought of as a polynomial with integer coefficients and a degree strictly smaller than n. Unless mentioned otherwise, polynomials are represented in the power-basis $\{1,X,\ldots X^{n-1}\}$. For $\varvec{a} = \sum _{i=0}^{n-1}a_iX^i\in \mathbb {Z}[X]$, we denote $\Vert \varvec{a}\Vert _\infty =\text {max}\{|a_i|, 0\leqslant i< n\}$. The underlying space for ciphertexts is $\mathcal {R}_q = \mathcal {R}/q\mathcal {R}=\mathbb {Z}/q\mathbb {Z}[X]/(\varvec{\phi }_m(X))$, which is composed of elements of $\mathcal {R}$ with coefficients reduced modulo q. The notation $|\cdot |_q$ denotes the classical residue modulo q in [0, q), while the centered residue in $[-q/2,q/2)$ is denoted by $[\cdot ]_q$. Moreover $\lfloor \cdot \rfloor $ denotes flooring while $\lfloor \cdot \rceil $ denotes rounding to the nearest integer.

2.1 Residue Number System (RNS)

In practice, the value of q underpinning $\mathcal {R}_q$ is chosen to be the product of small different prime numbers $q=q_1\cdots q_k$. Therefore, thanks to the Chinese Remainder Theorem (CRT), $\mathcal {R}_q$ splits into a cartesian product of smaller rings through the following isomorphism:

$$\begin{aligned} \text {RNS}_{q=q_1\ldots q_k}:\left| \begin{array}{rcl} \mathcal {R}_q&{}\rightarrow &{}\mathcal {R}_{q_1}\times \ldots \times \mathcal {R}_{q_k}\\ \varvec{a} &{} \mapsto &{} (\varvec{a}\bmod q_1,\ldots ,\varvec{a}\bmod q_k) \end{array}\right. \end{aligned}$$

(1)

The RNS exploits (1) to transfer the arithmetic modulo q of the coefficients to k smaller arithmetics modulo each $q_i$. Thus, better performance is achieved due to smaller arithmetics and parallel computations over each $\mathcal {R}_{q_i}$.

2.2 Product of Elements in $\mathcal {R}_q$

Since the degree n of the polynomials is large in practice, the polynomial product is a major bottleneck for the efficiency of R-LWE based schemes. However, when n is close or equal to a power of two, it can be performed efficiently thanks to the Number Theoretic Transform (NTT). Let ${\mathcal {N}_2}$ be the function defined over $\mathbb {N}$ such that for any $n\in \mathbb {N} \ {\mathcal {N}_2}(n)$ is the smallest power of two greater than or equal to n. In our context the products have a degree strictly smaller than 2n, therefore we denote $N={\mathcal {N}_2}(2n)$. For a prime q, a primitive $N^{\text {th}}$-root of unity $\omega \in \mathbb {F}_q$ exists if and only if $q\equiv 1\bmod N$. For a ring $\mathbb {Z}/q\mathbb {Z}$ equipped with $\omega $, the following ring morphism is a bijection:

$$\begin{aligned} \text {NTT}_{q,N,\omega }:\left| \begin{array}{rcl} \mathbb {Z}/q\mathbb {Z}[X]/(X^N-1)&{}\rightarrow &{} \mathbb {Z}/q\mathbb {Z}[X]/(X-1)\times \cdots \times \mathbb {Z}/q\mathbb {Z}[X]/(X-\omega ^{N-1}) \\ \varvec{a} &{} \mapsto &{} \varvec{\widehat{a}} = \left( \varvec{a}(1),\varvec{a}(\omega ),\ldots ,\varvec{a}(\omega ^{N-1})\right) \end{array}\right. \end{aligned}$$

(2)

Once (2) is applied to obtain the $\text {NTT}$ representation of polynomials, their product can be performed coordinate-wise. The time-complexity of arithmetic becomes linear in N and the bottleneck changes to the evaluation of (2), as well as of its inverse, whose evaluations require $O(N\log (N))$ multiplications in $\mathbb {Z}/q\mathbb {Z}$ by state-of-the-art algorithms [17, 20]. When the context is clear, we will denote an NTT transformation of degree N by $\text {NTT}_{N}$ instead of $\text {NTT}_{q,N,\omega }$.

In order to efficiently compute the product of elements $\varvec{a}$ and $\varvec{b}$ of $\mathcal {R}_q$, seen as polynomials over $\mathbb {Z}/q\mathbb {Z}[X]$ of degree at most $n-1$, one has to compute the NTT representation of $\varvec{c}=\varvec{a}\times \varvec{b}$ of degree $2n-2$ through the following formula:

$$\begin{aligned} \text {NTT}_{N}(\varvec{c}) =\text {NTT}_{N}(\varvec{a}) \odot \text {NTT}_{N}(\varvec{b}) \end{aligned}$$

(3)

where $\odot $ denotes the component-wise multiplication in $\mathbb {Z}/q\mathbb {Z}$. To obtain $\varvec{c} = \varvec{a} \times \varvec{b} \in \mathcal {R}_q$, a second step is needed, which consists in reducing the result of (3) modulo $\varvec{\phi }_m(X)$. Notice that when m is a power of two $(\varvec{\phi }_m(X)=X^{m/2}+1)$, one can use a negatively-wrapped convolution for the evaluation of the $\text {NTT}$ and its inverse [20]. This allows us to use an $\text {NTT}$ of size $\mathcal {N}_2(n)$ instead of $\mathcal {N}_2(2n)$ for the evaluation of (3) and also to recover the polynomial reduced modulo $X^{m/2}+1$ at the end of (3) just with an inverse $\text {NTT}$ of size $\mathcal {N}_2(n)$. However, for other values of m this method cannot be applied and a more complex reduction has to be carried out after applying (3).

Barrett’s strategy for modular reduction over the integers [5] can be adapted to polynomial modular arithmetic to reduce a polynomial $\varvec{c}$ of degree $n+\alpha $ by $\varvec{\phi }_m$ of degree n. The quotient of the Euclidean division $\lfloor \varvec{c} / \varvec{\phi }_m \rfloor $ is computed through multiplications by precomputed constants and shifts. We present this strategy in Algorithm 1. The performance of the algorithm is directly related to the size of the polynomial to be reduced: the algorithm is more efficient when $\alpha $ is small. By denoting $\tilde{n}=\mathcal {N}_2(n)$ and $A=\mathcal {N}_2(2\alpha +1)$ the cost of the algorithm is:

$$\begin{aligned} \mathcal {C}_{\text {NTT}}(N)+2\mathcal {C}_{\text {NTT}}(\tilde{n}) + 2\mathcal {C}_{\text {NTT}}(A) + (\tilde{n}+A)\texttt {MMult}_q \end{aligned}$$

where $\mathcal {C}_{\text {NTT}}(x)$ denotes the cost for evaluating (2) (or its inverse) of size x and $\texttt {MMult}_q$ the cost of a modular multiplication modulo q. One may also notice that Barrett’s reduction used in [9] uses NTT of size $N=\mathcal {N}_2(2n)$ to perform the second product while in fact it only requires NTT of size $\mathcal {N}_2(n)$. For the sake of completeness a proof of the correctness of Algorithm 1 is given in Appendix A.1.

2.3 RNS Variant of the FV and BGV Encryption Schemes

Fan and Vercauteren [10] have adapted Brakerski’s scale invariant FHE scheme [7] to the RLWE framework. More recently, Bajard et al. have provided a full RNS variant of FV [3]. We briefly recall how this RNS variant works.

We first need to introduce the two functions $\xi _q: \mathcal {R}_q\rightarrow \mathcal {R}_{q_1}\times \cdots \times \mathcal {R}_{q_k}$ and $\mathcal {P}_{\texttt {RNS},q}:\mathcal {R}_q\rightarrow \mathcal {R}_q^k$ such that for any $\varvec{a}\in \mathcal {R}_q$, $\xi _q(\varvec{a}) = (|\varvec{a}\cdot q_i/q|_{q_i})_{i\in [1,k]}$ and $\mathcal {P}_{\texttt {RNS},q}(\varvec{a}) = (|\varvec{a}\cdot q/q_i|_q)_{i\in [1,k]}$. It is straightforward to notice that for any $(\varvec{a},\varvec{b})\in \mathcal {R}_q^2$, $\left\langle \xi _q(\varvec{a}),\mathcal {P}_{\texttt {RNS},q}(\varvec{b})\right\rangle \equiv \varvec{ab} \bmod q$. This scalar product occurs in $\mathcal {R}_q$, and so it is composed of polynomial products.

In the context of the FV scheme, the secret-key $\varvec{s} \in \mathcal {R}$ is defined as a “small” polynomial drawn from a distribution $\chi _{key}$. An encryption of $\varvec{m}\in \mathcal {R}_t$ corresponds to a pair of polynomials ${{{\mathbf {\mathtt{{ct}}}}}}= (\varvec{c_0},\varvec{c_1}) \in \mathcal {R}_q^2$ satisfying:

$$\begin{aligned}{}[\varvec{c_0} + \varvec{c_1} \varvec{s}]_q = [\lfloor q/t\rfloor \cdot [\varvec{m}]_t+\varvec{v}]_q \end{aligned}$$

(4)

where $\varvec{v}$ is a noise term that is originally introduced during encryption (which is related to a distribution $\chi _{err}$) and that grows as homomorphic operations are applied. Decryption works correctly so long as this noise is below a certain bound, which limits the amount of operations one can perform.

The homomorphic addition of two ciphertexts corresponds to the pairwise addition of the ciphertexts’ polynomials. Regarding homomorphic multiplication, it is useful to see ciphertexts as polynomials of degree 1 with coefficients in $\mathcal {R}$. In this context, homomorphic multiplication takes place in two steps. First, ${{{\mathbf {\mathtt{{ct}}}}}}_{mult}\leftarrow \left( \left[ \left\lfloor \tfrac{t}{q} \varvec{c_0}^1 \varvec{c_0}^2 \right\rceil \right] _q,\left[ \left\lfloor \tfrac{t}{q} \left( \varvec{c_0}^1 \varvec{c_1}^2+\varvec{c_1}^1\varvec{c_0}^2\right) \right\rceil \right] _q,\left[ \left\lfloor \tfrac{t}{q} \varvec{c_1}^1\varvec{c_1}^2 \right\rceil \right] _q \right) $ is computed with a Karatsuba like algorithm. During this procedure, the RNS representations of the input polynomials are extended to bases with larger dynamic ranges so as to compute the products over $\mathcal {R}$ instead of over $\mathcal {R}_q$. Moreover, the division operation is achievable using [3, Sect. 4.4]. Finally, one has to convert the three-element ciphertext back to a two-element ciphertext, through a process called relinearisation. This is done by multiplying $\xi _q({{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},2})$ by pseudo-encryptions of $\mathcal {P}_{\texttt {RNS},q}(\varvec{s}^2)$ (designated $\overrightarrow{{{{\mathbf {\mathtt{{rlk}}}}}}}$), and adding the result to the other two elements:

$$\begin{aligned} \small {{{{\mathbf {\mathtt{{ct}}}}}}_{relin} \leftarrow \left( \left[ {{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},0}+\left\langle \xi _{q}({{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},2}), \overrightarrow{{{{\mathbf {\mathtt{{rlk}}}}}}}_0\right\rangle \right] _q, \left[ {{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},1}+\langle \xi _{q}({{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},2}), \overrightarrow{{{{\mathbf {\mathtt{{rlk}}}}}}}_1\rangle \right] _q\right) } \end{aligned}$$

(5)

The scheme introduced by Brakerski, Gentry and Vainkuntanathan [8] shares many features of FV, and can be similarly adapted to the techniques in [3]. A secret-key is also defined to be a “small” polynomial $\varvec{s} \in \mathcal {R}_q$, and ciphertexts correspond to pairs $(\varvec{c}_0, \varvec{c}_1) \in \mathcal {R}_q^2$, but messages are encrypted in the Least Significant Bits of (6):

$$\begin{aligned}{}[\varvec{c_0} + \varvec{c_1} \varvec{s}]_q = [[\varvec{m}]_t+t\varvec{v}]_q \end{aligned}$$

(6)

The change in the positioning of the message bits leads to simpler homomorphic multiplications. First, we compute the degree 2 ciphertext ${{{\mathbf {\mathtt{{ct}}}}}}_{mult}\leftarrow \left( \left[ \varvec{c_0}^1 \varvec{c_0}^2 \right] _q,\left[ \left( \varvec{c_0}^1 \varvec{c_1}^2+\varvec{c_1}^1\varvec{c_0}^2\right) \right] _q,\left[ \varvec{c_1}^1\varvec{c_1}^2 \right] _q \right) $. Since operations are performed modulo q, no RNS base extension is required. Afterwards, an operation similar to (5) is applied, so as to convert the three-element ciphertext to a classical two-element ciphertext. Finally, a noise management technique is applied to reduce the growth rate of the norm of $\varvec{v}$ in (6). This technique consists of scaling the ciphertext to a smaller ring $R_{q'}$ with an appropriate rounding, and is performed in two steps:

$$\begin{aligned} \begin{array}{rcl} \varvec{\delta }_i &{} \leftarrow &{} t\cdot [-{{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},i}/t]_{q/q'} \text { for } i=0,1\\ \texttt {ct} &{} \leftarrow &{} \left( \left[ q'/q\cdot ({{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},0}+\varvec{\delta }_0)\right] _{q'}, \left[ q'/q\cdot ({{{\mathbf {\mathtt{{ct}}}}}}_{\texttt {mult},1}+\varvec{\delta }_1)\right] _{q'}\right) \end{array} \end{aligned}$$

In certain steps of the aforementioned schemes, one needs to add the result of multiple polynomial products. In this case, it is possible to sum the NTT forms of all the products, so that one has to apply only a single polynomial reduction in the end.

2.4 Batching

A common way to improve the efficiency of RLWE schemes is to encrypt several plaintexts in a single ciphertext, through a technique called batching [28]. Under some conditions, $\varvec{\phi }_m$ splits modulo t into $\ell $ distinct irreducible polynomials $f_1,\ldots ,f_\ell $ of degree $n/\ell $. This leads to the following ring isomorphism of the plaintext space $R_t$: $\mathcal {R}_t\cong \mathbb {Z}/t\mathbb {Z}[X]/(\varvec{f_1})\times \cdots \times \mathbb {Z}/t\mathbb {Z}[X]/(\varvec{f_\ell })$. In this manner, $\ell $ plaintexts $\varvec{m}_1, \ldots , \varvec{m}_\ell $ can be compactly represented as a single polynomial $\varvec{m}\in \mathcal {R}_t$. Afterwards $\varvec{m}$ is encrypted and homomorphic operations applied to this ciphertext operate on each slot individually. This technique is mostly used when evaluating Boolean circuits, i.e. with $t=2$, to pack $\ell $-bits in a single ciphertext. However, since $X^{m/2}+1 \equiv (X+1)^{m/2}\text { mod }2$, this technique cannot be used when m is a power of two. Thus the efficient arithmetic associated with power-of-two cyclotomic polynomials has limited applicability.

3 Improving Polynomial Reduction Modulo ${\phi }_m$

In this section, we propose two efficient methods to compute polynomial reductions. The first method takes advantage of the properties of the cyclotomic polynomials to improve the efficiency of the Barrett algorithm. The second reduction rests on an adaptation of the Montgomery modular reduction [25].

3.1 Improving Barrett’s Reduction for Cyclotomic Polynomials

As explained in Sect. 2, Barrett’s algorithm is sensitive to the difference between the degree of the polynomial to be reduced and that of the polynomial we want to reduce by. The smaller the difference, the more efficient the algorithm will be. Herein we propose an efficient method to reduce this difference.

Polynomials to be reduced modulo $\varvec{\phi }_m$ have a degree of at most $2n-2$. Let $\varvec{c}$ be such a polynomial. If $\varvec{c}$ were reduced by a polynomial $\varvec{Q}_{sp}$ of degree $n+\alpha +1$, the difference between the degree of the polynomial and the degree of $\varvec{\phi }_m$ would be reduced to $\alpha $. However, in order to obtain the correct value of $\varvec{c}\bmod \varvec{\phi }_m$ in the end, $\varvec{\phi }_m$ has to divide $\varvec{Q}_{sp}$ and for the efficiency of the reduction $\varvec{Q}_{sp}$ should be sparse enough so that its reduction can be handled through few operations in $\mathbb {Z}/q\mathbb {Z}$. Thanks to the cyclotomic property $\prod _{d/m}\varvec{\phi }_d(X) = X^m-1$, $\varvec{Q}_{sp}$ can be taken as the product of $\varvec{\phi }_m$ and some $\varvec{\phi }_d$ for d dividing m. Good candidates can be found by recursively using the fact that if p is a prime not dividing $m'$ then $\varvec{\phi }_{m'\cdot p}(X)\cdot \varvec{\phi }_{m'}(X)=\varvec{\phi }_{m'}(X^p)$. If $\varvec{Q}_{sp}$ is found in this way in less than 2 recursions then, since it will correspond to a cyclotomic with at most two distinct odd prime factors, it will have coefficients in $\{-1,0,1\}$. In this case, the reduction modulo $\varvec{Q}_{sp}$ only requires additions in $\mathbb {Z}/q\mathbb {Z}$ and can be done very efficiently.

In addition, when $m < 2n-2$, $\varvec{c}$ can initially be reduced by $X^m-1$ with $2n-m-1$ additions in $\mathbb {Z}/q\mathbb {Z}$. Since $\varvec{\phi }_m(X) \vert \varvec{Q}_{sp}(X) \vert X^m-1$ the strategy remains correct, and the complexity of the reduction by $\varvec{Q}_{sp}(X)$ is further reduced. Let $\texttt {HW}(\varvec{Q}_{\text {sp}})$ be the Hamming weight of $\varvec{Q}_{sp}$. The cost of the reduction of $\varvec{c}$ by $\varvec{Q}_{sp}$ is $(\texttt {HW}(\varvec{Q}_{sp})-1)(m - \deg (\varvec{Q}_{sp}))$ additions in $\mathbb {Z}/q\mathbb {Z}$. At this point, we obtain $\varvec{c'}=\varvec{c}\bmod \varvec{Q}_{sp}$ (with $\deg (\varvec{c'})\le n+\alpha $) and $\varvec{c'}\equiv \varvec{c}\bmod \varvec{\phi }_m$.

The final algorithm is depicted in Algorithm 2. It starts by recovering $\varvec{c}$ in power-basis from the NTT representation output by (3). Then it consecutively reduces $\varvec{c}$ of degree $2n-2$ by $X^m-1$ and by the sparse polynomial $\varvec{Q}_{sp}$. This allows to recover $\varvec{c'}=\varvec{c}\bmod \varvec{Q}_{sp}$ of degree $n+\alpha $ very efficiently. Afterwards, steps 2 to 7 of Algorithm 1 are applied to $\varvec{c'}$ to get $\varvec{c''}=\varvec{c}\bmod \varvec{\phi }_m$.

The impact of this sparse reduction is illustrated in Table 1, where polynomials $\varvec{Q}_{sp}$ are presented for different cyclotomic polynomials. Cyclotomics have been chosen with a degree $n=\varphi (m)$ close or equal to a power of two. The number of batching slots $\ell $ associated with each cyclotomic is also presented. The degree of $\varvec{Q}_{sp}$ is $n+\alpha +1$ thus NTTs of size $A=\mathcal {N}_2(2\alpha +1)$ are required to compute the first polynomial product in Algorithm 2. This is in contrast with $N=\mathcal {N}_2(2n)$ which would have been the size required for the Barrett algorithm without using the sparse reduction. In order to highlight the sparsity of $\varvec{Q}_{sp}$ we give $\texttt {HW}(\varvec{Q}_{sp})$ which is the number of non-zero coefficients of $\varvec{Q}_{sp}$.

Complexity. Since the complexity of computing multiplications in $\mathbb {Z}/{q}\mathbb {Z}$ is much higher than additions, the cost of the reduction by the sparse polynomial can be neglected. Moreover, with the RNS, each multiplication in $\mathcal {R}_q$, with $q=q_1\ldots q_k$ can be decomposed into k independent and smaller multiplications. Therefore the cost, in terms of modular multiplications, to reduce the polynomial $\varvec{c}$ output by (3) is essentially k times the cost of Algorithm 2:

$$\begin{aligned} k\cdot \left( \mathcal {C}_{\text {NTT}}(N) + 2\cdot \mathcal {C}_{\text {NTT}}(A) + 2\cdot \mathcal {C}_{\text {NTT}}(\tilde{n}) + A+\tilde{n} \right) . \end{aligned}$$

While the cost of the method by using directly Barrett’s algorithm, i.e. without performing the reduction by the sparse polynomial, is:

$$\begin{aligned} k\cdot \left( 3\cdot \mathcal {C}_{\text {NTT}}(N)+2\cdot \mathcal {C}_{\text {NTT}}(\tilde{n}) + N + \tilde{n} \right) . \end{aligned}$$

Based on this analysis, we also provide in Table 1 the theoretical speed-up obtained with the use of the sparse reduction.

Table 1. Sparse polynomials used for partial reduction with their related parameters.

Full size table

3.2 NTT-based Montgomery’s Reduction

We propose a Montgomery reduction of a polynomial given in NTT representation, inspired by [4]. The bottleneck of the previous optimized Barrett algorithm is the computation of the inverse NTT of size N of (3). Our Montgomery reduction takes advantage of the presence of the NTT basis of size N / 2 (seen as an RNS basis in [4]) in the basis of size N allowing to perform all the computations, in particular the inverse NTT evaluation, in the basis of size N / 2 instead of N.

The NTT representation of a polynomial of size N was defined in (2) as the set $\{\varvec{c} \bmod (X-\omega ^j) | 0 \le j < N\}$. This representation can be seen as a polynomial-RNS representation of $\varvec{c} \bmod (X^N-1)$ since $X^N-1 = \prod _{0 \le j < N} (X-\omega ^j) \bmod q$, with respect to the following NTT-basis:

$$\begin{aligned} \mathcal {B}_{\omega ,N}=\{|X-1|_{q},|X-\omega |_{q},\ldots ,|X-\omega ^{N-1}|_{q}\} \end{aligned}$$

As $X^N-1$ splits in $(X^{N/2}-1)(X^{N/2}+1)$ when N is even, half of the $\text {NTT}_N$ representation of $\varvec{c}$ corresponds to its $\text {NTT}_{N/2}$ representation. Hence, the basis $\mathcal {B}_{\omega ,N}$ is split along even and odd powers of $\omega $. We can then define two sub-bases defining two polynomials:

$$\begin{aligned} \left\{ \begin{array}{rcllrcl} \mathcal {B}_{\omega ,N}^{(e)} &{}= &{}\{[X-\omega ^{2j}|_{q}, 0\le j \le \frac{N}{2}-1\} &{} \text {and} &{}\varvec{\Psi }^{(e)}&{}=&{}\left| \mathop {\prod }\nolimits _{j=0}^{\frac{N}{2}-1}(X-\omega ^{2j})\right| _{q}\\ \mathcal {B}_{\omega ,N}^{(o)} &{}= &{}\{|X-\omega ^{2j+1}|_{q},0\le j\le \frac{N}{2}-1\} &{} \text {and} &{}\varvec{\Psi }^{(o)}&{}=&{}\left| \mathop {\prod }\nolimits _{j=0}^{\frac{N}{2}-1}(X-\omega ^{2j+1})\right| _{q} \end{array}\right. \end{aligned}$$

(7)

It is straightforward to notice that $\varvec{\Psi }^{(e)}\equiv |X^{N/2}-1|_q$ and $\varvec{\Psi }^{(o)}\equiv |X^{N/2}+1|_q$. We also note that since N is a power of two, one has $\varvec{\Psi }^{(o)}\equiv \varvec{\phi }_N\bmod q$. Thanks to Lemma 1, whose first point is a direct consequence of Lemma 2 in [11], we can choose $X^{N/2}+1$ as the Montgomery factor.

Lemma 1

Let $\varvec{\phi }_m$ be the m-th cyclotomic polynomial of degree n and N be the smallest power of two greater than or equal to 2n. If m is not a power of two then:

there exists $(\varvec{U},\varvec{V})\in \mathbb {Z}[X]^2$ such that $\varvec{U}(X)\cdot \varvec{\phi }_m(X)+\varvec{V}(X)\cdot \varvec{\phi }_N(X)=1$;
for any prime p, $\varvec{\phi }_m$ and $(X^N-1)$ are coprime in $\mathbb {Z}/p\mathbb {Z}$. In particular $\varvec{\phi }_m$ is a unit in $\mathbb {Z}/p\mathbb {Z}[X]/(\varvec{\phi }_N)$.

One can extract from the coordinates of $\varvec{c}$ in $\mathcal {B}_{\omega ,N}$ the representation $\varvec{\widehat{c}}^{(e)}$ of $\varvec{c}$ in $\mathcal {B}_{\omega ,N}^{(e)}$ (resp. $\varvec{\widehat{c}}^{(o)}$ in $\mathcal {B}_{\omega ,N}^{(o)}$). So, given $\varvec{\widehat{c}}^{(o)}$ and $\varvec{\widehat{c}}^{(e)}$, we can use the NTT operator to get:

$$\begin{aligned} \texttt {NTT}_{N/2}^{-1}(\varvec{\widehat{c}}^{(e)})=\varvec{c}\bmod (q,X^{N/2}-1). \end{aligned}$$

Definition 1

We define the following function, which takes in as input the residues of the polynomial $\varvec{c}$ ($\deg (\varvec{c})<N$) modulo a prime p:

$$\begin{aligned} \texttt { modMg}_{\varvec{\phi }_m,\varvec{\Psi }^{(o)},p}(\varvec{c})=\frac{\varvec{c}+\varvec{\phi }_m\times |-\varvec{c}/\varvec{\phi }_m|_{\varvec{\Psi }^{(o)}}}{\varvec{\Psi }^{(o)}}\bmod p. \end{aligned}$$

(8)

The $\texttt {modMg}_{\varvec{\phi }_m,\varvec{\Psi }^{(o)},p}$ function defined in (8) is a classical Montgomery reduction with factor $\varvec{\Psi }^{(o)}$ and consisting in an exact polynomial division. It always outputs a polynomial congruent to $|\varvec{c}/\varvec{\Psi }^{(o)}|_{\varvec{\phi }_m}$ but when $\deg (\varvec{c})\leqslant N/2+n-1$ the output is exactly $|\varvec{c}/\varvec{\Psi }^{(o)}|_{\varvec{\phi }_m}$.

Lemma 2

If $\deg (\varvec{c})\leqslant \frac{N}{2}+n-1$, then $\texttt {modMg}_{\varvec{\phi }_m,\varvec{\Psi }^{(o)},p}(\varvec{c})=\varvec{c}/\varvec{\Psi }^{(o)}\bmod (p,\varvec{\phi }_m)$.

First the degree of the numerator in (8) is bounded by $\max (\deg (\varvec{c}), \deg (\varvec{\phi }_m)+\deg (\varvec{\Psi }^{(o)})-1)\leqslant n+N/2-1$. Thus, the degree of the resulting quotient is bounded by $n-1 < N/2$. Therefore, the output is $|\varvec{c}/\varvec{\Psi }^{(o)}|_{\varvec{\phi }_m}$ and the computation of (8) can be made modulo $X^{N/2}-1$, i.e. in an NTT representation of size N / 2 when using primes $q_i\equiv 1\bmod N$. Algorithm 3 details the computation of (8). The following precomputations are used therein:

$$\begin{aligned} \left\{ \begin{array}{rcl} \varvec{\widehat{W}}^{(o)} &{} : &{} -1/\varvec{\phi }_m\bmod (q,\varvec{\Psi }^{(o)})\text { in base }\mathcal {B}_{\omega ,N}^{(o)}\\ \varvec{\widehat{Y}}^{(e)} &{} : &{} 1/\varvec{\Psi }^{(o)}\equiv 1/2 \bmod (q,\varvec{\Psi }^{(e)})\text { in base }\mathcal {B}_{\omega ,N}^{(e)}\\ \varvec{\widehat{Z}}^{(e)} &{} : &{} \varvec{\phi }_m/\varvec{\Psi }^{(o)}\equiv \varvec{\phi }_m/2 \bmod (q,\varvec{\Psi }^{(e)})\text { in base }\mathcal {B}_{\omega ,N}^{(e)}\\ \end{array}\right. \end{aligned}$$

In line 3 of Algorithm 3, we require an operator which takes in as input a vector of coefficients in base $\mathcal {B}_{\omega ,N}^{(o)}$. This vector defines a unique polynomial $\varvec{Q}$ with $\deg (\varvec{Q})<N/2$. Then this operator must output the vector of coefficients of $\varvec{Q}$ in base $\mathcal {B}_{\omega ,N}^{(e)}$. The function $\texttt {BaseConv}$ is defined for any $\varvec{Q}$ with $\deg (\varvec{Q})<N/2$ by:

$$\begin{aligned} \texttt {BaseConv} : (\varvec{Q}(\omega ),\varvec{Q}(\omega ^3),\ldots ,\varvec{Q}(\omega ^{N-1})) \mapsto (\varvec{Q}(1),\varvec{Q}(\omega ^2),\ldots ,\varvec{Q}(\omega ^{N-2}))\ [q]. \end{aligned}$$

(9)

In [4], (9) is computed with a classical Lagrange interpolation. Our context is more specific, because the points in which polynomials are evaluated are powers of a $N^{\text {th}}$ root of unity $\omega $ in $\mathbb {Z}/q\mathbb {Z}$. With this purpose, Algorithm 4 implements such base conversion by only using NTTs of degree N / 2, with $\omega ^{2}$ as a primitive $N/2^{\text {th}}$ root of unity. The proof of correctness of Algorithm 4 is provided in Appendix A.3.

Complexity. The total cost of Algorithm 3 in terms of modular multiplications is:

$$\begin{aligned} k\cdot \left( 3\cdot \mathcal {C}_{\text {NTT}}(\mathcal {N}_2(n)) + 4\cdot \mathcal {N}_2(n) \right) . \end{aligned}$$

One can find in Table 2 the predicted speed-up of the proposed Montgomery reduction. Despite its lower complexity, the Montgomery algorithm suffers from one main drawback which is the presence of the Montgomery factor in the output.

Table 2. Theoretical speed-up of Algorithm 3 when compared with Algorithm 1

Full size table

4 Adaptation of FV and BGV to the Montgomery Representation

In this section, we show how the Montgomery representation impacts the BGV and FV schemes and suggest modifications to handle these changes. For simplicity, we denote by $\varvec{M}$ the Montgomery factor $(X^{N/2}+1)\bmod \varvec{\phi }_m$. Thanks to Lemma 1 we also know that $\varvec{M}^{-1}$ exists in $\mathcal {R}$. We assume that ciphertexts $\widetilde{\texttt {ct}}$ are given in Montgomery representation, i.e. such that $\widetilde{\texttt {ct}}=(\varvec{\widetilde{c}_0},\varvec{\widetilde{c}_1})=(\varvec{c}_0\varvec{M},\varvec{c}_1\varvec{M})$. The conversion to the Montgomery domain can be integrated in the encryption procedure for increased efficiency, and the $\varvec{M}$ factor can be removed during decryption by applying a Montgomery reduction to $[\varvec{\widetilde{c}_0} + \varvec{\widetilde{c}_1} \varvec{s}]_q$. The Montgomery reduction only impacts procedures involving multiplications in $\mathcal {R}_q$ by multiplying the product by $\varvec{M}^{-1}$. Therefore the Montgomery representation is stable with respect to multiplication. Homomorphic additions are not affected by this different representation. Thus the only impact one has to consider is on homomorphic multiplication. For further discussions, we recall that the expansion factor of the ring $\mathcal {R}$ is the quantity defined by $\delta _\mathcal {R}=\sup \{\Vert \varvec{ab}\Vert _\infty /\Vert \varvec{a}\Vert _\infty \Vert \varvec{b}\Vert _\infty \ (\varvec{a},\varvec{b})\in \mathcal {R}-\{0\} \}$.

4.1 Impact of the Montgomery Representation in FV

We note that the first step of the FV homomorphic multiplication corresponds to the extension of polynomials of ciphertexts to a larger RNS basis, so that multiplications are computed over $\mathcal {R}$ instead of $\mathcal {R}_q$. In order to improve efficiency, an approximate extension is used [3] and thus the norm of the polynomials is bounded by $\tfrac{q}{2}(1+\rho )$ for a parameter $\rho >0$ [3]. A bound on the noise associated to $\widetilde{{{{\mathbf {\mathtt{{ct}}}}}}}_{mult}\leftarrow \left( \left[ \left\lfloor \tfrac{t}{q} M\varvec{c_0}^1 \varvec{c_0}^2 \right\rceil \right] _q,\left[ \left\lfloor \tfrac{t}{q} M(\varvec{c_0}^1 \varvec{c_1}^2+\varvec{c_1}^1\varvec{c_0}^2) \right\rceil \right] _q,\left[ \left\lfloor \tfrac{t}{q} M\varvec{c_1}^1\varvec{c_1}^2 \right\rceil \right] _q \right) $ is given in Proposition 1 whose proof can directly be derived from the one of [3].

Proposition 1

Let $(\widetilde{\varvec{c}}_0,\widetilde{\varvec{c}}_1,\widetilde{\varvec{c}}_2) = \widetilde{{{{\mathbf {\mathtt{{ct}}}}}}}_{mult}$, $r_\infty =\tfrac{1+\rho }{2}(1+\delta _\mathcal {R}\Vert \varvec{s}\Vert _\infty )+\delta _R\Vert M\Vert _\infty $ and $\varvec{v}_i$ be the inherent noise of $\texttt {ct}_i=(\varvec{c_0}^i,\varvec{c_1}^i)$. Then

$$\begin{aligned} (\widetilde{\varvec{c}}_0+\widetilde{\varvec{c}}_1\varvec{s}+\widetilde{\varvec{c}}_2\varvec{s}^2)\varvec{M^{-1}}\equiv \Delta \left[ \varvec{m}_1\varvec{m}_2\right] _t+\varvec{\widetilde{v}}_{mult}\bmod q \end{aligned}$$

with the following bound on the noise:

$$\begin{aligned} \Vert \varvec{\widetilde{v}}_{mult}\Vert _\infty&<\delta _\mathcal {R}t \left( \delta _\mathcal {R}\left\| \varvec{M}^{-1}\right\| _\infty r_\infty +\tfrac{1}{2}\right) \left( \Vert \varvec{v}_1\Vert _\infty +\Vert \varvec{v}_2\Vert _\infty \right) +\tfrac{\delta }{2}\min \Vert \varvec{v}_i\Vert _\infty \\&+\delta t |q|_t\left( r_\infty +1\right) +\tfrac{1}{2}\delta _\mathcal {R}\left\| \varvec{M}^{-1}\right\| _\infty \left( 1+\delta _\mathcal {R}\Vert \varvec{s}\Vert _\infty \left( 1+\delta _\mathcal {R}\Vert \varvec{s}\Vert _\infty \right) \right) +\tfrac{|q|_t}{2}+1\nonumber . \end{aligned}$$

(10)

Now, we assume that we need to relinearize the ciphertext $\widetilde{\texttt {ct}}_{mult}=(\widetilde{\varvec{c}}_0,\widetilde{\varvec{c}}_1,\widetilde{\varvec{c}}_2)$. Within the original variant, the following dot products were computed over $\mathcal {R}_{q}$: $\langle \xi _{q}(\varvec{c}_2), \texttt {\text {rlk}}_{i} \rangle $, where $\texttt {\text {evk}}_{0}=\left[ \mathcal {P}_{RNS,q}(\varvec{s}^2)+\overrightarrow{\varvec{a}}\varvec{s}+\overrightarrow{\varvec{e}}\right] _q$ and $\texttt {\text {evk}}_{1}=\left[ -\overrightarrow{\varvec{a}}\right] _q$. The goal of relinearisation is to obtain $\left\langle \xi _{q}(\varvec{c}_2),\mathcal {P}_{RNS,q}(\varvec{s}^2) \right\rangle \equiv \varvec{c}_2\varvec{s}^2\bmod q$ with a limited increase of the noise. Indeed, one can write:

$$\begin{aligned} \left\{ \begin{array}{rcl} \left\langle \xi _{q}(\varvec{c}_2), \texttt {evk}_0 \right\rangle &{}\equiv &{} \varvec{c}_2\varvec{s}^2+ \left\langle \xi _{q}(\varvec{c}_2), \overrightarrow{\varvec{a}}\right\rangle \varvec{s}+ \left\langle \xi _{q}(\varvec{c}_2), \overrightarrow{\varvec{e}}\right\rangle \equiv \varvec{c}_2\varvec{s}^2+ \varvec{a'}\varvec{s}+\varvec{e'} \bmod q \\ \left\langle \xi _{q}(\varvec{c}_2), \texttt {evk}_1 \right\rangle &{}\equiv &{} -\left\langle \xi _{q}(\varvec{c}_2), \overrightarrow{\varvec{a}}\right\rangle \equiv -\varvec{a'} \bmod q \end{array}\right. \end{aligned}$$

(11)

Now, we need to obtain the Montgomery representation of the output of this relinearisation, i.e. a cryptogram like $\left( \left( \varvec{c}_2\varvec{s}^2+ \varvec{a'}\varvec{s}+\varvec{e'} \right) \varvec{M},-\varvec{Ma'}\right) $.

When the Montgomery representation is used, $\varvec{\widetilde{c}}_2$ replaces $\varvec{c}_2$ in (11). Hence, the relinearisation key has to be modified as follows:

$$\begin{aligned} \texttt {evkM}_{0}=\left[ \left( \mathcal {P}_{RNS,q}\left( \varvec{s}^2/\varvec{M}\right) +\overrightarrow{\varvec{a}}\varvec{s}+\overrightarrow{\varvec{e}}\right) \varvec{M}^2\right] _q,\ \texttt {evkM}_{1}=\left[ -\overrightarrow{\varvec{a}}\varvec{M}^2\right] _q \end{aligned}$$

In the following equations, we simulate the effect of the Montgomery reduction by introducing a factor $\varvec{M}^{-1}$ ($\bmod \varvec{\phi }_m$):

$$\begin{aligned} \begin{array}{rcl} \left\langle \xi _{q}\left( \varvec{\widetilde{c}}_2\right) , \texttt {evkM}_0 \right\rangle \varvec{M}^{-1} &{}=&{} \left( \varvec{\widetilde{c}}_2\varvec{s}^2\varvec{M}+ \left\langle \xi _{q}\left( \varvec{\widetilde{c}}_2\right) , \overrightarrow{\varvec{a}}\right\rangle \varvec{s}\varvec{M}^2+ \left\langle \xi _{q}\left( \varvec{\widetilde{c}}_2\right) , \overrightarrow{\varvec{e}}\right\rangle \varvec{M}^2 \right) \varvec{M}^{-1}\\ &{}=&{} \varvec{\widetilde{c}}_2\varvec{s}^2+ \left( \varvec{a''}\varvec{s}+\varvec{e''} \right) \varvec{M}\\ &{}=&{} \left( \varvec{c}_2\varvec{s}^2+ \varvec{a''}\varvec{s}+\varvec{e''} \right) \varvec{M} \end{array} \end{aligned}$$

Similarly, we get $\left\langle \xi _{q}\left( \varvec{\widetilde{c}}_2\right) , \texttt {evkM}_1 \right\rangle \varvec{M}^{-1}=-\varvec{a''}\varvec{M}$. Hence, we have obtained the Montgomery representation of the output of relinearisation step at no extra cost - both computationally and in terms of noise growth.

4.2 Impact of the Montgomery Representation in BGV

For the first step of the BGV homomorphic multiplication, no scaling operation is required. Thus, the noise is not affected by a change in representation. Next, relinearisation is applied. An analysis similar to the one in Sect. 4.1 can be performed, with minor adaptations for the relinearisation key. Similarly, one concludes that the Montgomery reduction introduces no cost neither in terms of computation nor in noise growth.

Finally, one needs to apply scaling so as to manage noise growth. We consider the ciphertext $(\varvec{\widetilde{c}}_0,\varvec{\widetilde{c}}_1)$ encrypting $\varvec{m}$ and given in Montgomery representation. Let $q'\mid q$ and $\varvec{\delta }_i=[-\varvec{\widetilde{c}}_i/t]_{q/q'}\times t$. Then the BGV-scaling function applied to $\varvec{\widetilde{c}}_i$ outputs $\varvec{\widehat{c}}_i=(\varvec{\widetilde{c}}_i+\varvec{\delta }_i)\times \frac{q'}{q}$.

Lemma 3

If $\left\| \left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _q \right\| _\infty < \frac{q}{2}-\delta _\mathcal {R}\left\| \varvec{M}^{-1}\right\| _\infty \frac{qt}{2q'}\left( 1+\delta _\mathcal {R}\cdot \Vert \varvec{s}\Vert _\infty \right) $ and $q=q'\bmod t$, then

$$\begin{aligned} \left[ \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right] _{q'} = \left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _{q}\bmod t \end{aligned}$$

(12)

and

$$\begin{aligned} \left\| \left[ \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right] _{q'}\right\| _\infty \leqslant \frac{q'}{q} \left\| \left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _q \right\| _\infty + \delta _\mathcal {R}\left\| \varvec{M}^{-1}\right\| _\infty \frac{t}{2}\left( 1+\delta _\mathcal {R}\Vert \varvec{s}\Vert _\infty \right) \end{aligned}$$

(13)

Proof

It is similar to the proof of lemma 4 in [8]. By definition of $\varvec{\widetilde{c}}_i$, we have:

$$\begin{aligned} \left[ \left( \varvec{\widetilde{c}}_0+\varvec{\widetilde{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right] _q=\left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _q=\varvec{c}_0+\varvec{c}_1\cdot \varvec{s}-q\varvec{u}. \end{aligned}$$

By definition of $\varvec{\widehat{c}}_i$, we can write:

$$\begin{aligned} \begin{array}{rcl} \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}&{}=&{}\frac{q'}{q}\left( \varvec{\widetilde{c}}_0+\varvec{\widetilde{c}}_1\cdot \varvec{s}+\varvec{\delta }_0+\varvec{\delta }_1\cdot \varvec{s}\right) \varvec{M}^{-1}\\ &{}=&{}\frac{q'}{q}\left( \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right) +\frac{q'}{q}\left( \varvec{\delta }_0+\varvec{\delta }_1\cdot \varvec{s}\right) \varvec{M}^{-1}\\ &{}=&{}\frac{q'}{q}\left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _q+q'\varvec{u}+\frac{q'}{q}\left( \varvec{\delta }_0+\varvec{\delta }_1\cdot \varvec{s}\right) \varvec{M}^{-1}. \end{array} \end{aligned}$$

(14)

Moreover, since $\left\| \varvec{\delta }_i\right\| _\infty \leqslant \frac{qt}{2q'}$, we get the following bound $\left\| \left( \varvec{\delta }_0+\varvec{\delta }_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right\| _\infty \leqslant \delta _\mathcal {R}\frac{qt}{2q'}\left\| \varvec{M}^{-1}\right\| _\infty \left( 1+\delta _\mathcal {R}\Vert \varvec{s}\Vert _\infty \right) $. Thus, from the above and by considering the hypothesis on the norm of $\left[ \left( \varvec{\widetilde{c}}_0+\varvec{\widetilde{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right] _q =\left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _q$, we deduce that:

$$\begin{aligned} \left\| \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}-q'\varvec{u} \right\| _\infty <q'/2 \end{aligned}$$

and then that

$$\begin{aligned} \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}-q'\varvec{u}=\left[ \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right] _{q'}. \end{aligned}$$

Hence, from this previous equality and by bounding the norm of last member of (14), we obtain (13). Finally, we get (12) by:

$$\begin{aligned} \begin{array}{rcl} \left[ \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}\right] _{q'}&{}=&{} \left( \varvec{\widehat{c}}_0+\varvec{\widehat{c}}_1\cdot \varvec{s}\right) \varvec{M}^{-1}-q'\varvec{u}\\ &{}=&{} \left( \varvec{\widetilde{c}}_0+\varvec{\widetilde{c}}_1\cdot \varvec{s}\right) /\varvec{M}-q\varvec{u}\bmod t\ (\varvec{\widehat{c}}_i=\varvec{\widetilde{c}}_i\bmod t; q=q'\bmod t)\\ &{}=&{}\varvec{c}_0+\varvec{c}_1\cdot \varvec{s}-q\varvec{u}\bmod t\ (\text {def. of }\varvec{\widetilde{c}})\\ &{}=&{}\left[ \varvec{c}_0+\varvec{c}_1\cdot \varvec{s}\right] _q\bmod t. \end{array} \end{aligned}$$

$\square $

From this lemma, we can see that the Montgomery representation of the ciphertext impacts the modulus switching procedure by the addition of an extra factor $\delta _\mathcal {R}\Vert \varvec{M}^{-1}\Vert _\infty $ to the last term on the bound of the hypothesis and of (13).

4.3 Overall Impact on Noise Growth

For both BGV and FV, the norm $\Vert \varvec{M}^{-1}\Vert _\infty $ and the expansion factor $\delta _\mathcal {R}$ are involved in the noise growth due to the scaling steps performed with the Montgomery reduction. We report some observations concerning the size of the coefficients of $\varvec{M}^{-1}$ in Appendix B, but they seem to remain small for most of the cases. When m is a power of two, $\delta _\mathcal {R}$ is equal to n, but for other m it can be larger than that. Let us consider $\mathcal {F}_m:\mathbb {Q}_{2n-2}[X]\rightarrow \mathbb {Q}[X]/(\varvec{\phi }_m(X))$, so that $\mathcal {F}_m(\varvec{a}) = \varvec{a}\bmod \varvec{\phi }_m$ for every $\varvec{a}\in \mathbb {Q}[X]$ of degree lesser than or equal to $2n-2$.

Lemma 4

Let m be a positive integer and let $\mathcal {R}= \mathbb {Z}[X]/(\varvec{\phi }_m(X))$, with $\deg (\varvec{\phi }_m)=n$. If $\delta _\mathcal {R}$ denotes the expansion factor of the ring $\mathcal {R}$, then $\delta _\mathcal {R}\le n \cdot \Vert \mathcal {F}_m \Vert _\infty .$

These three parameters are given in Table 3 for the different cyclotomic polynomials considered in this paper. Assuming that distributions $\chi _{key}$ and $\chi _{err}$ output elements whose infinite norms are bounded by $B_{key}$ and $B_{err}=6\sigma _{err}$, we analyse which depth can be reached in a multiplicative tree.

FV: The initial noise of a ciphertext is at most $V_{init}=B_{err}(1+2\delta B_{key})$ [10]. We recall that $r_\infty =\tfrac{1+\rho }{2}(1+\delta _\mathcal {R}B_{key})+\delta _R\Vert \varvec{M}\Vert _\infty $. The output of a tree of depth L has a noise bounded by $C_1^L V + L C_1^{L-1}C_2$ (cf. [6], Lemma 9) with:

$$\begin{aligned} \left\{ \begin{array}{rcl} C_{1}&{}=&{}2\delta _\mathcal {R}t \left( \delta _\mathcal {R}\left\| \varvec{M}^{-1}\right\| _\infty r_\infty +\tfrac{1}{2}\right) +\tfrac{\delta _\mathcal {R}}{2}\\ C_{2}&{}=&{}\delta _\mathcal {R}t |q|_t (r_\infty +1)+\tfrac{1}{2}\delta _\mathcal {R}\left\| \varvec{M}^{-1}\right\| _\infty \left( 1+\delta _\mathcal {R}B_{key} (1 + \delta B_{key})\right) + \tfrac{|q|_t}{2}+1\\ &{} &{}+k(1+\delta _\mathcal {R}B_{key} (1+\delta _\mathcal {R}B_{key})) + \delta _\mathcal {R}k B_{err} 2^{\nu +1} \end{array}\right. \end{aligned}$$

We denote by $L_{max}=\max \{L\in \mathbb {N}\mid C_{1}^LV+LC_{1}^{L-1}C_{2}\leqslant B_{dec}\}$ the depth allowed by the homomorphic multiplication where $B_{dec}$ corresponds to the decryption bound given by the full RNS version of FV [3].

BGV: As long as a ciphertext satisfies the condition on Lemma 3, one can perform a scaling operation and thus an homomorphic multiplication. Initially, one has $\Vert \varvec{c}_0+\varvec{c}_1\varvec{s}\Vert _\infty \leqslant V_{init} = t/2+tB_{err}(2\delta _\mathcal {R}B_{key}+1)$. Let $\ell _{\omega ,q} = \lceil \log _{\omega } q \rceil $, by assuming that after each scaling, the size of q is reduced by $\omega $ bits, the growth of the size V of $\varvec{c}_0+\varvec{c}_1\varvec{s}$, after one multiplication can be expressed by:

$$\begin{aligned} \tfrac{q'}{q}(\delta _\mathcal {R}V^2+B_{relin}) + \tfrac{1}{2}\delta _R t \left\| M^{-1}\right\| _\infty \left( 1+\delta _\mathcal {R}B_{key}\right) \end{aligned}$$

with $B_{relin}=\tfrac{\delta _\mathcal {R}}{2} \ell _{\omega ,q}\omega B_{err}B_{key}$ the bound on the noise due to the relinearisation.

In Table 3 we present the maximal theoretical depths for BGV and FV with and without the use of the Montgomery reduction. For these computations we have taken parameters $B_{key}=1$, $\sigma _{err}=2\sqrt{n}$ and a number k of 62-bits moduli to get the largest size for q ensuring at least 80-bits of security according to [2].

Table 3. Theoretical depths with and without Montgomery reduction. Values in parenthesis are the depths observed in practice.

Full size table

We notice that the depths of BGV are almost unchanged with the usage of the Montgomery reduction. However for FV the depths are far smaller with a Montgomery representation. These behaviours have been confirmed in practice.

4.4 Mixing Optimized Barrett and Montgomery Reductions

Considering the non-negligible impact of the Montgomery representation on the multiplicative depth of FV, a more robust strategy for this cryptosystem corresponds to a mixed Barrett/Montgomery approach. Algorithm 2 is used during the first stage of homomorphic multiplication, with ciphertexts not exploiting a Montgomery representation. This avoids the noise growth caused by the Montgomery factor. Nonetheless, the Montgomery reduction can still be used during the relinearisation stage, since we have seen that this does not cause a larger noise growth. To obtain a valid result, the relinearisation key needs to be modified, by replacing the factor $\varvec{M}^2$ of the Montgomery approach by $\varvec{M}$.

5 Experimental Results

The proposed methods for polynomial reduction have been implemented using C++, and compiled with gcc using the optimization flag -O3. All the experimental results presented herein were measured on an i7-5960X, running at 3.0 GHz with 32 GB of main memory. No parallelism was exploited.

In Fig. 1, one can find the execution timings of polynomial reduction, using NFL for power-of-two cyclotomics [1]; Mayer’s implementation of $CRT^{-1}$ operator exploiting a tensored representation [24]; the unoptimized and optimized Barrett reductions and the Montgomery reduction for non-power-of-two cyclotomics. In order to highlight the gain brought by our algorithms compared to generic ones we also compare with NTL’s reduction using preconditioning [15]. All timings were normalized based on the number of batching slots $\ell $, and executed for a single modulus of 62-bits. One finds the straightforward application of Barrett reduction, to be more efficient than the preconditioned methods employed in the NTL library. Moreover, speed-ups up to 1.95 and 2.55 were achieved for the optimized Barrett and Montgomery algorithms when compared with the unoptimized Barrett reduction. The figure suggests that using power-of-two cyclotomics is not scalable with respect to the throughput. In contrast, the remaining approaches present very little variation when considering the execution timing per batching slot for the different m given in Table 3. It should be noted that using larger values of m enables FHE parameters with a larger multiplicative depth. Finally, while tensored representations natively support operations on cyclotomic rings, research on its algorithmic efficiency is still in its infancy, and hence a fair comparison is not possible.

The aforementioned reduction methods were used to implement the homomorphic multiplication operations of the FV and BGV schemes. One can find in Figs. 2 and 3 the execution times of the homomorphic multiplication of two freshly encrypted ciphertexts for FV and BGV, respectively with the parameters given in Table 3. The experimental results for NFL are omitted due to its low performance with respect to the timing per batching slot.

Unlike with Fig. 1, the execution time of homomorphic multiplication increases significantly with increasing m. This trend is explained by the relinearisation procedure, which requires a number of $\text {NTT}$s that increases quadratically with $\log _2 q$. Nevertheless, the employed reduction procedure plays a preponderant role in the efficiency of the homomorphic multiplication. Indeed, one achieves speed-ups up to 1.37 and 1.24 when comparing the homomorphic multiplication exploiting the optimized Barrett reduction with the one exploiting the unoptimized Barrett method for the FV and BGV schemes, respectively. Since with the mixed Barrett/Montgomery approach, required by the FV scheme, one is not able to fully take advantage of the gains brought forth by the Montgomery representation, one achieves speed-ups similar to those of the optimized Barrett reduction. In contrast, for BGV, one can exploit the Montgomery representation throughout the whole procedure, leading to speed-ups up to 1.32.

The speed-up of the proposed methods decreases as the degree n of the cyclotomic, and thus $\log _2 q$ gets larger due to the increasing complexity of the relinearisation procedure. This suggests that they are most beneficial when one needs to homomorphically evaluate small circuits. Since most of practical applications of FHE [13, 18, 26] have circuits with small depth, the proposed methods can potentially have a wide range of applicability.

6 Conclusion

In this paper, the arithmetic of non-power-of-two cyclotomics has been considered and improved. Two methods for polynomial reduction have been proposed: one based on the Barrett reduction and the other on a Montgomery representation. The optimized Barrett algorithm does not offer a better computational complexity than the Montgomery reduction. However, since it does not require changes in the representation of ciphertexts, it provides for a slower noise growth, making it more amenable to application in the FV homomorphic scheme than the Montgomery approach. In contrast, the Montgomery approach is more suitable for the BGV scheme. Speed-ups up to 1.95 and 2.55 have been obtained on an i7-5960X when comparing the optimized Barrett and Montgomery reductions with the unoptimized Barrett reduction, respectively. Finally, the polynomial reductions have been incorporated into the homomorphic multiplication procedures of FV and BGV, producing speed-ups up to 1.37.

References

Aguilar-Melchor, C., Barrier, J., Guelton, S., Guinet, A., Killijian, M.-O., Lepoint, T.: NFLlib: NTT-based fast lattice library. In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 341–356. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_20
Chapter Google Scholar
Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. 9, 169–203 (2015)
Article MathSciNet MATH Google Scholar
Bajard, J.-C., Eynard, J., Hasan, M.A., Zucca, V.: A full RNS variant of FV like somewhat homomorphic encryption schemes. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 423–442. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_23
Chapter Google Scholar
Bajard, J.-C., Imbert, L., Negre, C.: Arithmetic operations in finite fields of medium prime characteristic using the lagrange representation. IEEE Trans. Comput. 55, 1167–1177 (2006)
Article Google Scholar
Barrett, P.: Implementing the rivest shamir and adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987). https://doi.org/10.1007/3-540-47721-7_24
Chapter Google Scholar
Bos, J.W., Lauter, K., Loftus, J., Naehrig, M.: Improved security for a ring-based fully homomorphic encryption scheme. In: Stam, M. (ed.) IMACC 2013. LNCS, vol. 8308, pp. 45–64. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45239-0_4
Chapter Google Scholar
Brakerski, Z.: Fully homomorphic encryption without modulus switching from classical GapSVP. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 868–886. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_50
Chapter Google Scholar
Brakerski, Z., Vaikuntanathan, V., Gentry, C.: Fully homomorphic encryption without bootstrapping. In: In Innovations in Theoretical Computer Science (2012)
Google Scholar
Dai, W., Sunar, B.: cuHE: a homomorphic encryption accelerator library. In: Pasalic, E., Knudsen, L.R. (eds.) BalkanCryptSec 2015. LNCS, vol. 9540, pp. 169–186. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29172-7_11
Chapter Google Scholar
Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive (2012)
Google Scholar
Filaseta, M.: On coverings of the integers associated with an irreducibility theorem of A. Schinzel. In: Number Theory for the Millennium, II (Urbana, IL, 2000), pp. 1–24. A K Peters, Natick (2002)
Google Scholar
Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the AES circuit. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 850–867. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_49
Chapter Google Scholar
Gilad-Bachrach, R., Dowlin, N., Laine, K., Lauter, K.E., Naehrig, M., Wernsing, J.: Cryptonets: applying neural networks to encrypted data with high throughput and accuracy. In: ICML, JMLR Workshop and Conference Proceedings, vol. 48, pp. 201–210. JMLR.org (2016)
Google Scholar
Goluch, S.: The development of homomorphic cryptography. Master’s thesis, Vienna University of Technology, Austria (2011)
Google Scholar
Halevi, S., Halevi, T., Shoup, V., Stephens-Davidowitz, N.: Implementing BP-obfuscation using graph-induced encoding. Cryptology ePrint Archive, Report 2017/104 (2017). http://eprint.iacr.org/2017/104
Halevi, S., Shoup, V.: Algorithms in HElib. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8616, pp. 554–571. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44371-2_31
Chapter Google Scholar
Harvey, D.: Faster arithmetic for number-theoretic transforms. CoRR, abs/1205.2926 (2012)
Google Scholar
Khedr, A., Gulak, G., Vaikuntanathan, V.: SHIELD: scalable homomorphic implementation of encrypted data-classifiers. IACR Cryptology ePrint Archive, 2014:838 (2014)
Google Scholar
Laine, K., Player, R.: Simple encrypted arithmetic library - seal (v2.0). Technical report, September 2016
Google Scholar
Longa, P., Naehrig, M.: Speeding up the number theoretic transform for faster ideal lattice-based cryptography. In: Foresti, S., Persiano, G. (eds.) CANS 2016. LNCS, vol. 10052, pp. 124–139. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48965-0_8
Chapter Google Scholar
Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 1–23. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_1
Chapter Google Scholar
Lyubashevsky, V., Peikert, C., Regev, O.: A toolkit for ring-LWE cryptography. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 35–54. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_3
Chapter Google Scholar
Martins, P., Sousa, L.: Enhancing data parallelism of fully homomorphic encryption. In: Hong, S., Park, J.H. (eds.) ICISC 2016. LNCS, vol. 10157, pp. 194–207. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53177-9_10
Chapter Google Scholar
Mayer, C.M.: Implementing a toolkit for Ring-LWE based cryptography in arbitrary cyclotomic number fields. Cryptology ePrint Archive, Report 2016/049 (2016). http://eprint.iacr.org/2016/049
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985)
Article MathSciNet MATH Google Scholar
Naehrig, M., Lauter, K., Vaikuntanathan, V.: Can homomorphic encryption be practical? In: Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop, CCSW 2011, pp. 113–124. New York (2011)
Google Scholar
Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, STOC 2005, pp. 84–93, ACM, New York (2005)
Google Scholar
Smart, N.P., Vercauteren, F.: Fully homomorphic simd operations. Des. Codes Cryptogr. 71(1), 57–81 (2014)
Article MATH Google Scholar

Download references

Acknowledgments

This work was partially supported by the European Union’s H2020 Programme under grant agreement $ICT-644209$, ANR ARRAND $15-CE39-0002-01$ and the Natural Science Engineering Research Council of Canada. This work was also supported in part by Portuguese funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 and by the Ph.D. grant with reference SFRH/BD/103791/2014.

Author information

Authors and Affiliations

Sorbonne Universités, UPMC, CNRS, LIP6, Paris, France
Jean-Claude Bajard & Vincent Zucca
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
Julien Eynard & Anwar Hasan
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
Paulo Martins & Leonel Sousa

Authors

Jean-Claude Bajard
View author publications
You can also search for this author in PubMed Google Scholar
Julien Eynard
View author publications
You can also search for this author in PubMed Google Scholar
Anwar Hasan
View author publications
You can also search for this author in PubMed Google Scholar
Paulo Martins
View author publications
You can also search for this author in PubMed Google Scholar
Leonel Sousa
View author publications
You can also search for this author in PubMed Google Scholar
Vincent Zucca
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Vincent Zucca .

Editor information

Editors and Affiliations

School of Electrical Engineering and Computer Science (SITE), University of Ottawa, Ottawa, Ontario, Canada
Carlisle Adams
IBM Research - Zurich, Rueschlikon, Switzerland
Jan Camenisch

Appendices

A Proofs

1.1 A.1 Correctness of Algorithm 1

Since $\mathbb {F}_q[X]$ is an Euclidean ring, we can write the Euclidean division of $\varvec{c}$ by $\varvec{\Phi }_m$: $\varvec{c} = \lfloor \varvec{c}/\varvec{\Phi }_m \rfloor \varvec{\Phi }_m+ \varvec{r}$ with $\deg \varvec{r} \le n-1$. Let $a,b\ge 0$ be two integers, then we have the following equations over the field of fractions of $\mathbb {F}_q[X]$:

$$\begin{aligned} \begin{array}{lccl} &{} \tfrac{X^{n+a}}{\varvec{\Phi }_m}\cdot \tfrac{\varvec{c}}{X^{n-b}} &{} =&{} \left( \left\lfloor \tfrac{\varvec{c}}{\varvec{\Phi }_m} \right\rfloor + \tfrac{\varvec{r}}{\varvec{\Phi }_m}\right) \cdot X^{a+b} \\ \Leftrightarrow &{} (\left\lfloor \tfrac{X^{n+a}}{\varvec{\Phi }_m} \right\rfloor + \tfrac{\varvec{r_1}}{\varvec{\Phi }_m}) \cdot ( \left\lfloor \tfrac{\varvec{c}}{X^{n-b}} \right\rfloor + \tfrac{\varvec{r_2}}{X^{n-b}})&{} =&{} ( \left\lfloor \tfrac{\varvec{c}}{\varvec{\Phi }_m} \right\rfloor + \tfrac{\varvec{r}}{\varvec{\Phi }_m})\cdot X^{a+b} \\ \Leftrightarrow &{}\left\lfloor \tfrac{X^{n+a}}{\varvec{\Phi }_m} \right\rfloor \cdot \left\lfloor \tfrac{\varvec{c}}{X^{n-b}} \right\rfloor + \varvec{r_a} + \varvec{r_{\alpha +b}} + \varvec{r'}&{} =&{} ( \left\lfloor \tfrac{\varvec{c}}{\varvec{\Phi }_m} \right\rfloor + \tfrac{\varvec{r}}{\varvec{\Phi }_m} ) \cdot X^{a+b} \\ \Leftrightarrow &{}\left\lfloor \tfrac{ \left\lfloor X^{n+a}/\varvec{\Phi }_m \right\rfloor \cdot \left\lfloor \varvec{c}/X^{n-b} \right\rfloor }{X^{a+b}} \right\rfloor X^{a+b} + \varvec{r''}+\varvec{r_a} + \varvec{r_{\alpha +b}} + \varvec{r'} &{} = &{} (\left\lfloor \tfrac{\varvec{c}}{\varvec{\Phi }_m} \right\rfloor + \tfrac{\varvec{r}}{\varvec{\Phi }_m})\cdot X^{a+b}\\ \Leftrightarrow &{}\left\lfloor \tfrac{ \left\lfloor X^{n+a}/\varvec{\Phi }_m \right\rfloor \cdot \left\lfloor \varvec{c}/X^{n-b} \right\rfloor }{X^{a+b}} \right\rfloor + \tfrac{\varvec{r''}+\varvec{r_a} + \varvec{r_{\alpha +b}} + \varvec{r'}}{X^{a+b}} &{} =&{} \left\lfloor \tfrac{\varvec{c}}{\varvec{\Phi }_m} \right\rfloor + \tfrac{\varvec{r}}{\varvec{\Phi }_m} . \end{array} \end{aligned}$$

Furthermore, $\deg (\varvec{r_1}),\deg (\varvec{r_2})< n$, $\deg (\varvec{r_a})< a$, $\deg (\varvec{r_{\alpha +b}})< \alpha +b$, $\deg (\varvec{r'})< 0$, and $\deg (\varvec{r''})< a+b$. By choosing $b\geqslant 0$ and $a \geqslant \alpha $, the right term of left member of last equation above have a degree smaller than 0 and we obtain an equality between the two floored polynomials. Hence, by taking $b = 0$ and $a = \alpha $, we get that $\lfloor \tfrac{\varvec{c}}{\varvec{\Phi }_m} \rfloor $ is equal to the flooring of the left part of last equation, which is what Algorithm 1 computes. Since $\lfloor X^{n+\alpha }/\varvec{\Phi }_m \rfloor \cdot \lfloor \varvec{c}/X^{n} \rfloor $ is of degree strictly smaller than $2\alpha +1$, the computation can be done with an NTT of size $A=\mathcal {N}_2(2\alpha +1)$.

Finally, we notice that the result of the computation of $\varvec{r} = \varvec{c}-\lfloor \varvec{c}/\varvec{\Phi }_m \rfloor \times \varvec{\Phi }_m$ has a degree strictly smaller than n. Moreover, the polynomial $\varvec{c'}$ at line 5 is nothing but $\lfloor \varvec{c}/\varvec{\Phi }_m \rfloor \times \varvec{\Phi }_m\bmod X^{\tilde{n}}-1$. Indeed, the reduction modulo $X^{\tilde{n}}-1$ is a consequence of the NTT based polynomial product in dimension $\tilde{n}$. Thus, at the end we have that $\varvec{c'}-\varvec{d}=(\varvec{c}-\lfloor \varvec{c}/\varvec{\Phi }_m \rfloor \times \varvec{\Phi }_m)\bmod (X^{\tilde{n}}-1)=\varvec{c}\bmod \varvec{\Phi }_m$. Since the degree of $\varvec{c}\bmod \varvec{\Phi }_m$ is strictly smaller than $\tilde{n}$, the last equality holds.

1.2 A.2 Proof of Lemma 1

The first point is a direct consequence from Lemma 2 in [11]. Since m is not a power of two, m cannot divide N. By denoting $m=2^r m'$ with $m'>1$ an odd integer we have $n=2^{r-1} \varphi (m')$, thus $2n = 2^r\varphi (m')$ and then if N divides m, $\mathcal {N}_2(\varphi (m')) = 1$ which is not possible since $m'\geqslant 3$. Therefore N and m do not divide each others and we can apply Lemma 2 from [11].

Let $\alpha $ be a root of $\varvec{\phi }_m$ in the algebraic closure of $\mathbb {Z}_p$. If $\alpha $ is also a root of $X^N-1$ then $\alpha ^N=1$, since $\alpha $ is of order m by definition of $\varvec{\phi }_m$ it implies that m divide N which is impossible since N is a power of two and m is not. So, $\varvec{\phi }_m$ and $X^N-1$ are coprime on the algebraic closure of $\mathbb {Z}_p$ thus in $\mathbb {Z}_p$. The second point comes from Bezout equality in $\mathbb {Z}_p$ and from the fact that $X^N-1 \equiv (X^{N/2}-1)(X^{N/2}+1)\bmod p$.

1.3 A.3 Correctess of Algorithm 4

First we notice that since $\varvec{c}$ is a polynomial of degree smaller than N / 2 we have $(\varvec{c}(\omega ),\ldots ,\varvec{c}(\omega ^{N-1}))=\texttt {NTT}_{N/2,\omega ^2}(\xi _{N/2,\omega }(\varvec{c}))$ where $\xi _{N/2,\omega }(c_0,c_1,\ldots ,c_{N/2-1})=(c_0,c_1\omega ,\ldots ,c_{N/2-1}\omega ^{N/2-1})$. Therefore the polynomial $\varvec{c'}$ recovered at the first line of Algorithm 4 is $\xi _{N/2,\omega }(\varvec{c})$. The second line of the algorithm is the computation of $\xi _{N/2,\omega ^{-1}}(\varvec{c'})$ to recover $\varvec{c}$ from $\varvec{c'}$. Once $\varvec{c}$ is recovered, we just need to compute and return $\texttt {NTT}_{N/2,\omega ^2}(\varvec{c})$ which is done by the last step of the algorithm.

1.4 A.4 Proof of Lemma 4

Let $\varvec{a}$ and $\varvec{b}$ two elements of $\mathcal {R}-\{0\}$. They naturally embed in $\mathbb {Z}_{n-1}[X]\subset \mathbb {Q}_{2n-2}[X]$. We can write $\Vert \varvec{ab}\Vert _\infty \le n\Vert \varvec{a}\Vert _\infty \Vert \varvec{b}\Vert _\infty $. As the product $\varvec{ab}$ has degree at most $2n-2$ with coefficients in $\mathbb {Z}$, it belongs to $\mathbb {Q}_{2n-2}[X]$. Since $\mathcal {F}_m$ is a linear map between two vector spaces of finite dimension it is continuous, then we obtain $\Vert \mathcal {F}_m(\varvec{ab})\Vert _\infty \le \Vert \mathcal {F}_m \Vert _\infty \cdot \Vert \varvec{ab}\Vert _\infty \le n\cdot \Vert \mathcal {F}_m \Vert _\infty \cdot \Vert \varvec{a}\Vert _\infty \Vert \varvec{b}\Vert _\infty .$

B Size of $\Vert M^{-1}\Vert _\infty $

As discussed in Sect. 4.3, the coefficients’ size of $\varvec{M}^{-1}$ where $\varvec{M} = X^{N/2}+1\bmod \varvec{\Phi }_m\in \mathcal {R}$ can impact the noise growth when using Montgomery’s reduction. Therefore in this case, one must be carefull to choose a cyclotomic with an associated $\varvec{M}^{-1}$ small in norm. However this does not seem to be very restrictive. Indeed for the first 20, 000 cyclotomics less than $13.4\%$ have an infinite norm greater than 10, less than $0.5\%$ greater than 100 and only 3 of them greater than 1, 000. Moreover those whose $\varvec{M}^{-1}$’s norm is greater than 100 offer a relatively small number of batching slot compared to their degree. Indeed, their ratio $n/\ell $, with n the degree of the cyclotomic and $\ell $ the number of batching slots modulo 2, is greater than 36 whereas the cyclotomics proposed in Table 1 have a ratio of 16 (15 for $m=32767$). Therefore the cyclotomics whose $\varvec{M}^{-1}$ factor have large coefficients do not seem to be the best suited for batching.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bajard, JC., Eynard, J., Hasan, A., Martins, P., Sousa, L., Zucca, V. (2018). Efficient Reductions in Cyclotomic Rings - Application to Ring-LWE Based FHE Schemes. In: Adams, C., Camenisch, J. (eds) Selected Areas in Cryptography – SAC 2017. SAC 2017. Lecture Notes in Computer Science(), vol 10719. Springer, Cham. https://doi.org/10.1007/978-3-319-72565-9_8

Download citation

DOI: https://doi.org/10.1007/978-3-319-72565-9_8
Published: 23 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72564-2
Online ISBN: 978-3-319-72565-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Efficient Reductions in Cyclotomic Rings - Application to Ring-LWE Based FHE Schemes

Abstract

Similar content being viewed by others

Batched Fully Dynamic Multi-key FHE from FHEW-Like Cryptosystems

Multi-Key Fully Homomorphic Encryption Scheme Over the Integers

F-FHEW: High-Precision Approximate Homomorphic Encryption with Batch Bootstrapping

Keywords

1 Introduction

2 Background

2.1 Residue Number System (RNS)

2.2 Product of Elements in \(\mathcal {R}_q\)

2.3 RNS Variant of the FV and BGV Encryption Schemes

2.4 Batching

3 Improving Polynomial Reduction Modulo \({\phi }_m\)

3.1 Improving Barrett’s Reduction for Cyclotomic Polynomials

3.2 NTT-based Montgomery’s Reduction

Lemma 1

Definition 1

Lemma 2

4 Adaptation of FV and BGV to the Montgomery Representation

4.1 Impact of the Montgomery Representation in FV

Proposition 1

4.2 Impact of the Montgomery Representation in BGV

Lemma 3

Proof

4.3 Overall Impact on Noise Growth

Lemma 4

4.4 Mixing Optimized Barrett and Montgomery Reductions

5 Experimental Results

6 Conclusion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendices

A Proofs

1.1 A.1 Correctness of Algorithm 1

1.2 A.2 Proof of Lemma 1

1.3 A.3 Correctess of Algorithm 4

1.4 A.4 Proof of Lemma 4

B Size of \(\Vert M^{-1}\Vert _\infty \)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation