Faster Homomorphic Encryption over GPGPUs via Hierarchical DGT

Alves, Pedro Geraldo M. R.; Ortiz, Jheyne N.; Aranha, Diego F.

doi:10.1007/978-3-662-64331-0_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12675)

Included in the following conference series:

International Conference on Financial Cryptography and Data Security


Abstract

Privacy guarantees are still insufficient for outsourced data processing in the cloud. While employing encryption is feasible for data at rest or in transit, it is not yet feasible for computation without a remarkable performance slowdown. Thus, handling data in plaintext during processing is still required, which creates vulnerabilities that can be exploited by malicious entities. Homomorphic encryption schemes enable computation over ciphertexts without knowing the related plaintexts or the decryption key. This work focuses on the challenge of developing an efficient implementation of the BFV scheme on CUDA. This is done by combining and adapting different approaches from the literature, such as the double-CRT representation and the Discrete Galois Transform. Moreover, we propose and implement an improved formulation of the DGT inspired by classical algorithms, which computes the transform up to 2.6 times faster than the state-of-the-art. By using these approaches, we obtain up to 3.6 times faster homomorphic multiplication.


Notes

1.
GPGPU, acronym for General-Purpose Graphics Processing Unit.
2.
spog, acronym for “Secure Processing on GPGPUs”.
3.
Let \(x = a + ib\) be a Gaussian integer. If y is x’s conjugate then \(y = a - ib\).
4.
Wuthrich proves in Theorem 5.8 that every \(0 \ne \alpha \in \mathbb {Z}[i]\) has a unique factorization into Gaussian primes, up to the order of the factors and multiplication by units [30].

References

[1] Albrecht, M., Bai, S., Ducas, L.: A subfield lattice attack on overstretched NTRU assumptions. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9814, pp. 153–178. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53018-4_6
[2] Alves, P.: SPOG: secure processing on GPGPUs (2021). https://github.com/spog-library
[3] Alves, P., Ortiz, J.N., Aranha, D.F.: Faster homomorphic encryption over GPGPUs via hierarchical DGT. Cryptology ePrint Archive, Report 2020/861 (2020). https://eprint.iacr.org/2020/861
[4] Badawi, A.A., Polyakov, Y., Aung, K.M.M., Veeravalli, B., Rohloff, K.: Implementation and performance evaluation of RNS variants of the BFV homomorphic encryption scheme. IACR Cryptology ePrint Archive 2018, 589 (2018)
[5] Al Badawi, A., Veeravalli, B., Aung, K.M.M.: Efficient polynomial multiplication via modified discrete Galois transform and Negacyclic convolution. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) FICC 2018. AISC, vol. 886, pp. 666–682. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-03402-3_47
[6] Badawi, A.Q.A., Veeravalli, B., Mun, C.F., Aung, K.M.M.: High-performance FV somewhat homomorphic encryption on GPUs: an implementation using GPUs. TCHES 1(2), 70–95 (2018)
[7] Bailey, D.H.: FFTs in external or hierarchical memory. J. Supercomput. 4(1), 23–35 (1990)
[8] Bajard, J.-C., Eynard, J., Hasan, M.A., Zucca, V.: A full RNS variant of FV like somewhat homomorphic encryption schemes. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 423–442. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_23
[9] Bajard, J.C.J., Meloni, N., Plantard, T.: Efficient RNS bases for cryptography. In: IMACS World Congress: Scientific Computation, Applied Mathematics and Simulation (2005)
[10] Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory 6(3), 13:1–13:36 (2014)
[11] Chen, H., Gilad-Bachrach, R., Han, K., Huang, Z., Jalali, A., Laine, K., Lauter, K.E.: Logistic regression over encrypted data from fully homomorphic encryption. IACR Cryptology ePrint Archive 2018, 462 (2018)
[12] Chen, H., Laine, K., Player, R.: Simple encrypted arithmetic library - SEAL v2.1. IACR Cryptology ePrint Archive 2017, 224 (2017)
[13] Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 409–437. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_15
[14] Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption over the torus. J. Cryptol. 33(1), 34–91 (2020)
[15] Chu, E., George, A.: Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. CRC Press (1999)
[16] Costache, A., Smart, N.P.: Which ring based somewhat homomorphic encryption scheme is best? In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 325–340. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_19
[17] Crandall, R.E.: Integer convolution via split-radix fast Galois transform. Center for Advanced Computation Reed College (1999)
[18] Dai, W., Sunar, B.: cuHE: a homomorphic encryption accelerator library. In: Pasalic, E., Knudsen, L.R. (eds.) BalkanCryptSec 2015. LNCS, vol. 9540, pp. 169–186. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29172-7_11
[19] Ding, C., Pei, D., Salomaa, A.: Chinese Remainder Theorem: Applications in Computing, Coding, Cryptography. World Scientific (1996)
[20] Emmart, N., Weems, C.C.: High precision integer multiplication with a GPU using Strassen’s algorithm with multiple FFT sizes. Parallel Process. Lett. 21(3), 359–375 (2011)
[21] Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012)
[22] Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the AES circuit. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 850–867. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_49
[23] Govindaraju, N.K., Lloyd, B., Dotsenko, Y., Smith, B., Manferdelli, J.: High performance discrete Fourier transforms on graphics processors. In: SC, p. 2. IEEE/ACM (2008)
[24] Halevi, S., Polyakov, Y., Shoup, V.: An improved RNS variant of the BFV homomorphic encryption scheme. In: Matsui, M. (ed.) CT-RSA 2019. LNCS, vol. 11405, pp. 83–105. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12612-4_5
[25] Lindner, R., Peikert, C.: Better key sizes (and Attacks) for LWE-based encryption. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19074-2_21
[26] Longa, P., Naehrig, M.: Speeding up the number theoretic transform for faster ideal lattice-based cryptography. In: Foresti, S., Persiano, G. (eds.) CANS 2016. LNCS, vol. 10052, pp. 124–139. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48965-0_8
[27] Aguilar-Melchor, C., Barrier, J., Guelton, S., Guinet, A., Killijian, M.-O., Lepoint, T.: NFLlib: NTT-based fast lattice library. In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 341–356. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_20
[28] Player, R.: Parameter selection in lattice-based cryptography. Ph.D. thesis, Royal Holloway, University of London (2018)
[29] Thales: 2019 Thales Data Threat Report, USA (2019). https://go.thalesesecurity.com/rs/480-LWA-970/images/2019-DTR-Global-USL-Web.pdf
[30] Wuthrich, C.: Further number theory (2011). https://www.maths.nottingham.ac.uk/plp/pmzcw/download/fnt_chap5.pdf. Accessed 18 June 2020


Acknowledgements

This work was supported in part by CNPq, grants number 164489/2018-5, 203175/2019-0, and 44265/2019-2; and CAPES grant number 1591123. We especially thank LG for financial support within project “Privacy-preserving analytics”, project number 5296; Google for GCP Research Credits Program under number 106101194491; and the Concordium Blockchain Research Center at Aarhus University, Denmark.

Author information

Authors and Affiliations

University of Campinas, Campinas, Brazil
Pedro Geraldo M. R. Alves & Jheyne N. Ortiz
Aarhus University, Aarhus, Denmark
Diego F. Aranha


Corresponding author

Correspondence to Pedro Geraldo M. R. Alves.

Editor information

Editors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
Nikita Borisov
KU Leuven, Leuven, Belgium
Claudia Diaz

Appendices

A Properties of Gaussian Integers

This appendix recalls properties of Gaussian integers, stated by Wuthrich [30], that are used in this work and are useful when implementing Gaussian integer arithmetic.

Definition 3

(Norm). The norm of a Gaussian integer is its product with its conjugate (Note 3). That is, \(N(\alpha ) = \alpha \cdot \overline{\alpha }\), so for \(\alpha = a + ib\) we have \(N(a + ib) = (a + ib) \cdot (a - ib) = a^2 + b^2\).
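To make Definition 3 concrete, the following minimal Python sketch represents a Gaussian integer as a pair of Python integers and computes its conjugate and norm. The class name `GaussInt` is hypothetical and does not come from the SPOG library. The sketch only illustrates that \(\alpha \cdot \overline{\alpha }\) is always a rational integer equal to \(a^2 + b^2\), and that the norm is multiplicative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussInt:
    """Gaussian integer re + i*im (hypothetical helper, for illustration only)."""
    re: int
    im: int

    def __mul__(self, other: "GaussInt") -> "GaussInt":
        # (a + ib)(c + id) = (ac - bd) + i(ad + bc)
        return GaussInt(self.re * other.re - self.im * other.im,
                        self.re * other.im + self.im * other.re)

    def conj(self) -> "GaussInt":
        # The conjugate of a + ib is a - ib (Note 3)
        return GaussInt(self.re, -self.im)

    def norm(self) -> int:
        # N(alpha) = alpha * conj(alpha); the imaginary part always cancels
        prod = self * self.conj()
        assert prod.im == 0
        return prod.re


alpha = GaussInt(3, 2)
print(alpha.norm())                      # 13 = 3^2 + 2^2
print((alpha * GaussInt(1, 1)).norm())   # 26 = N(3 + 2i) * N(1 + i)
```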

Proposition 1

(Wuthrich’s Proposition 5.7). For each prime number \(p\equiv 1 \mod 4\) there are exactly two Gaussian primes of norm p, \(\pi \) and \(\overline{\pi }\), up to multiplication by units.

Lemma 1

(Wuthrich’s Lemma 5.4). If \(\pi \in \mathbb {Z}[i]\) is such that \(N(\pi )\) is a prime number, then \(\pi \) is a Gaussian prime.

Lemma 2

(Wuthrich’s Lemma 5.6). Let p be a prime number with \(p \equiv 1 \mod 4\). Then there exists a Gaussian prime \(\pi \) such that \(p = \pi \cdot \overline{\pi }\).

Lemma 3

(Wuthrich’s Lemma 5.10). Any prime \(p \equiv 1 \mod 4\) can be written as a sum of two squares. This is a manifestation of Fermat’s theorem on sums of two squares.

From Lemma 2 and Proposition 1, if p is a prime such that \(p \equiv 1 \mod 4\), then p factors as a product of exactly two Gaussian primes that are conjugates of each other. Lemma 3 is then a direct consequence: writing \(p = \pi \cdot \overline{\pi }\) with \(\pi = a + bi\), we obtain \(p = \pi \cdot \overline{\pi } = a^2 + b^2\).
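For small primes, the statements above can be checked by brute force. The Python sketch below is meant for toy sizes only, and its function names are illustrative. It lists every Gaussian integer of norm p and groups the results into classes of associates, that is, elements that differ by one of the units \(\pm 1, \pm i\). For \(p \equiv 1 \mod 4\), one expects eight elements falling into two classes. One class is represented by \(\pi \) and the other by a unit multiple of \(\overline{\pi }\), which also exhibits p as a sum of two squares. For \(p \equiv 3 \mod 4\), no element of norm p exists.

```python
import math


def elements_of_norm(p: int):
    """All (a, b) with a^2 + b^2 = p, i.e. all a + bi of norm p."""
    r = math.isqrt(p)
    return [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)
            if a * a + b * b == p]


def first_quadrant_associate(z):
    """The unique associate u*z (u a unit) with re > 0 and im >= 0."""
    a, b = z
    for _ in range(4):
        if a > 0 and b >= 0:
            return (a, b)
        a, b = -b, a          # multiply by the unit i
    raise ValueError("0 has no associate class representative")


def associate_classes(p: int):
    """One representative per class of associates among elements of norm p."""
    return {first_quadrant_associate(z) for z in elements_of_norm(p)}


print(len(elements_of_norm(13)))       # 8
print(sorted(associate_classes(13)))   # [(2, 3), (3, 2)]
print(associate_classes(7))            # set(): 7 = 3 (mod 4)
```

Here \(3 + 2i\) is \(\pi \), and \(2 + 3i = i \cdot (3 - 2i)\) is an associate of \(\overline{\pi }\), matching \(13 = 3^2 + 2^2\).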

B Generating k-th Primitive Roots of i Modulo p

The use of the DGT for polynomial multiplication in a cyclotomic polynomial ring requires a primitive k-th root of i modulo a prime p, as discussed in Sect. 3.1. When n is a power of two, this element allows the reduction modulo the cyclotomic polynomial to be obtained for free. When p is a Mersenne prime, the literature presents efficient analytic methods to compute it; for other choices of p, the best option is still a trial-and-error approach.
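The Python sketch below shows what element is being searched for, and why plain trial and error does not scale. It represents elements of \(\mathbb {Z}_p[i]\) as pairs (a, b), meaning a + bi mod p. As an assumption about the definition, it treats h as a primitive k-th root of i when \(h^k = i\) and h has multiplicative order exactly 4k. The exhaustive search visits up to \(p^2\) candidates, so it is usable only for toy primes such as p = 17. It is neither the search used by the authors nor the one used by Badawi et al., and all function names are hypothetical.

```python
def gauss_mul(x, y, p):
    """Multiply a + bi and c + di in Z_p[i] (i^2 = -1)."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)


def gauss_pow(x, e, p):
    """Square-and-multiply exponentiation in Z_p[i]."""
    result = (1, 0)
    while e:
        if e & 1:
            result = gauss_mul(result, x, p)
        x = gauss_mul(x, x, p)
        e >>= 1
    return result


def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_kth_root_of_i(h, k, p):
    """True if h^k = i and h has multiplicative order exactly 4k."""
    if gauss_pow(h, k, p) != (0, 1):
        return False
    # h^(4k) = i^4 = 1, so the order divides 4k; rule out proper divisors.
    return all(gauss_pow(h, 4 * k // q, p) != (1, 0) for q in prime_factors(4 * k))


def find_root_by_trial(k, p):
    """Exhaustive search over all p^2 elements of Z_p[i]: toy primes only."""
    for a in range(p):
        for b in range(p):
            if is_primitive_kth_root_of_i((a, b), k, p):
                return (a, b)
    return None


print(find_root_by_trial(2, 17))   # some h with h^2 = i in Z_17[i]
print(find_root_by_trial(2, 13))   # None: Z_13[i] has no element of order 8
```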

Badawi et al. report that a naive implementation of such an approach takes 156 hours to find a \(2^{14}\)-th primitive root of i for \(p = 2^{64}-2^{32}+1\) [5]. For this reason, they propose a more efficient strategy for \(p \equiv 1 \mod 4\), based on factoring p into two Gaussian primes, \(f_0\) and \(f_1\). This decomposition of p is simple to compute and relies on Lemma 2 and Proposition 1.

Algorithm 6 starts from Fermat’s Little Theorem, which states that if p is prime then \(n^{p-1} \equiv 1 \mod p\) for every \(n \in \mathbb {Z}_p^*\). Hence \(n^{(p-1)/2}\), a square root of \(n^{p-1}\), is congruent to either 1 or \(-1\); by Euler’s criterion, it is \(-1\) exactly when n is a quadratic non-residue. In the latter case, since \(p \equiv 1 \mod 4\), we can set \(k \equiv n^{(p-1)/4} \mod p\). This value satisfies \(k^2 \equiv -1 \mod p\), so k plays the role of i modulo p. Then \(k^2 + 1 \equiv 0 \mod p\), which means p divides \(k^2 + 1\). Since \(k^2+1\) factors in \(\mathbb {Z}[i]\) as \((k+i)\cdot (k-i)\), p divides this product, and this is the starting point for factoring p.
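The first step of Algorithm 6, as described above, amounts to finding a quadratic non-residue n and raising it to the power \((p-1)/4\). The Python sketch below is minimal: it assumes p is an odd prime with \(p \equiv 1 \mod 4\), checking only the residue class. It also uses a simple sequential scan for n, which is an assumption rather than necessarily the selection rule of Algorithm 6. The function name is hypothetical.

```python
def sqrt_of_minus_one(p: int) -> int:
    """Return s with s^2 = -1 (mod p), for an odd prime p = 1 (mod 4)."""
    if p % 4 != 1:
        raise ValueError("this sketch expects an odd prime p = 1 (mod 4)")
    for n in range(2, p):
        # Euler's criterion: n^((p-1)/2) = -1 exactly when n is a non-residue
        if pow(n, (p - 1) // 2, p) == p - 1:
            return pow(n, (p - 1) // 4, p)   # (n^((p-1)/4))^2 = n^((p-1)/2) = -1
    raise ValueError("no quadratic non-residue found; is p prime?")


print(sqrt_of_minus_one(13))   # 8, since 8^2 = 64 = 5*13 - 1
```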

At this point, there is no guarantee that \(k+i\) is a Gaussian prime. By Lemma 4, the greatest common divisor of p and \(k+i\) is either \(k+i\) itself (when \(k+i\) is already a Gaussian prime) or a Gaussian prime u such that \(u \mid p\) and \(u \mid k+i\). In both cases, \(u = \texttt {gcd}(p, k+i)\) is a Gaussian prime dividing p, so we take it as the first factor of p. By Lemma 2, \(\overline{u}\) is the second factor.
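The gcd step can be carried out with the Euclidean algorithm in \(\mathbb {Z}[i]\), using division with the quotient rounded to the nearest Gaussian integer so that each remainder has a strictly smaller norm. The Python sketch below is minimal and uses hypothetical helper names. It takes a square root s of \(-1\) modulo p as input (for example, s = 5 for p = 13). Its result is determined only up to a unit, but in every case \(f_0 \cdot f_1 = N(f_0) = p\).

```python
def _round_div(num: int, den: int) -> int:
    """Nearest integer to num/den for den > 0 (ties rounded up)."""
    return (2 * num + den) // (2 * den)


def gauss_divmod(x, y):
    """Division with remainder in Z[i]; x, y are (re, im) pairs, y != 0."""
    a, b = x
    c, d = y
    n = c * c + d * d                      # N(y)
    # x / y = x * conj(y) / N(y) = ((ac + bd) + i(bc - ad)) / N(y)
    q = (_round_div(a * c + b * d, n), _round_div(b * c - a * d, n))
    # r = x - q*y; rounding each coordinate gives N(r) <= N(y) / 2
    r = (a - (q[0] * c - q[1] * d), b - (q[0] * d + q[1] * c))
    return q, r


def gauss_gcd(x, y):
    """Euclidean algorithm in Z[i]; the result is unique up to a unit."""
    while y != (0, 0):
        _, r = gauss_divmod(x, y)
        x, y = y, r
    return x


def gaussian_factors(p: int, s: int):
    """Split p = f0 * f1 in Z[i], given s with s^2 = -1 (mod p)."""
    assert (s * s + 1) % p == 0, "s must be a square root of -1 mod p"
    f0 = gauss_gcd((p, 0), (s, 1))         # gcd(p, s + i)
    f1 = (f0[0], -f0[1])                   # its conjugate
    return f0, f1


f0, f1 = gaussian_factors(13, 5)
print(f0, f1, f0[0] ** 2 + f0[1] ** 2)     # norm of f0 is 13
```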

Lemma 4

Let p be a prime such that \(p \equiv 1 \mod 4\), and let \(k \in \mathbb {Z}_p\) satisfy \(k^2 \equiv -1 \mod p\). The greatest common divisor of p and \(k+i\) is either \(k+i\) or a Gaussian prime u such that \(u \mid p\) and \(u \mid k+i\).

Proof

By Fermat’s theorem on sums of two squares, an odd prime p can be expressed as \(p = x^2 + y^2\), with \(x,y \in \mathbb {Z}\), if and only if \(p \equiv 1 \mod 4\). Since \(x^2 + y^2 = (x + iy)(x - iy)\) and \(N(x + iy) = N(x - iy) = p\), Lemma 1 implies that \(x + iy\) and \(x - iy\) are Gaussian primes. Therefore \(p = (x + iy)(x - iy)\) is the unique factorization of p in \(\mathbb {Z}[i]\), up to the order of the factors and multiplication by units (Note 4).

On the other hand, \((k + i)(k - i) = k^2 + 1 \equiv 0 \mod p\) by construction. Combining the two facts, p divides \(k^2 + 1\), so \((k + i)(k - i) = \ell (x + iy)(x - iy)\) for some positive integer \(\ell \).

When \(\ell = 1\), the equality shows that \(k + i\) and \(k - i\) are themselves the factors of p. When \(\ell \ne 1\), \(k + i\) is not a Gaussian prime and can still be factored in \(\mathbb {Z}[i]\): a Gaussian prime whose norm is divisible by p has norm exactly p, whereas \(N(k + i) = \ell p\). Moreover, p divides \((k + i)(k - i)\) but divides neither \(k + i\) nor its conjugate, since \((k \pm i)/p\) has imaginary part \(\pm 1/p\) and is therefore not a Gaussian integer. Now \(x + iy\) is a Gaussian prime dividing \((k + i)(k - i)\), so it divides one of the two factors; if it divides \(k - i\), then its conjugate \(x - iy\) divides \(k + i\). Hence \(k + i\) and p share a common factor u, which can be found as their greatest common divisor. Since the two factors of p are \(x + iy\) and \(x - iy\), u must be one of them, up to a unit.

Finally, the factors of p can be found by computing the greatest common divisor of p and \(k + i\) and then computing its conjugate. Since \(p = x^2+y^2\) and \(N(x + iy) = N(x - iy) = x^2 + y^2\), by Lemma 1, the factors are Gaussian primes.
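As a small worked instance of Lemma 4 (our own illustration, not taken from [5]), take p = 13 and k = 5, which satisfies \(k^2 \equiv -1 \mod 13\). Here \(\ell = 2\), so \(5 + i\) is not a Gaussian prime, and the gcd isolates one factor of 13:

```latex
\begin{aligned}
p &= 13, \quad k = 5, \quad k^2 + 1 = 26 = 2 \cdot 13 \;\; (\ell = 2),\\
k + i &= 5 + i = (1 + i)(3 - 2i),\\
13 &= (3 - 2i)(3 + 2i),\\
\gcd(13,\, 5 + i) &= 3 - 2i \ \text{(up to units)}, \qquad \overline{3 - 2i} = 3 + 2i.
\end{aligned}
```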

Given a method for factoring a prime \(p \equiv 1 \mod 4\) in \(\mathbb {Z}[i]\), Badawi et al. propose Algorithm 7, which greatly speeds up the precomputation of a k-th root of i for a prime \(p \equiv 1 \mod 4\) [5]. The method starts by finding the factorization \(p = f_0 \cdot f_1\) in \(\mathbb {Z}[i]\) using Algorithm 6. Each Gaussian prime \(f_j\), with \(j \in \{0,1\}\), has norm p. The Gaussian integers modulo \(f_j\) therefore form a field with p elements, whose nonzero elements form a cyclic group of order \(p - 1\). A k-th root of i modulo p, denoted h, is then constructed via the CRT from \(h_j = \zeta ^{\frac{(p-1)}{4n}}_j \mod f_j\), with \(j \in \{0,1\}\), where \(\zeta _j\) is a generator of the cyclic group modulo \(f_j\).
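A minimal Python sketch of the CRT construction follows (Python 3.8+ for modular inverses via `pow`). It relies on an assumption not spelled out above. Let s be a square root of \(-1\) modulo p, and suppose \(f_0 \mid s + i\) and \(f_1 = \overline{f_0} \mid s - i\). Then i reduces to \(-s\) modulo \(f_0\) and to s modulo \(f_1\), so the residue maps \(a + bi \mapsto (a - bs,\ a + bs)\) identify \(\mathbb {Z}_p[i]\) with \(\mathbb {Z}_p \times \mathbb {Z}_p\) without computing \(f_0\) explicitly. The sketch writes the exponent as \((p-1)/(4k)\) for a k-th root. It uses a generator g of \(\mathbb {Z}_p^*\) as \(\zeta _1\) and \(g^{-1}\) as \(\zeta _0\), finding g by factoring \(p - 1\) with trial division. The function names are hypothetical, and this is not the SPOG implementation.

```python
def prime_factors(n: int) -> set:
    """Distinct prime factors of n by trial division (adequate for p - 1 here)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def find_generator(p: int) -> int:
    """Smallest generator of the cyclic group Z_p^* (p prime)."""
    qs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    raise ValueError("no generator found; is p prime?")


def gauss_mul(x, y, p):
    """Multiply a + bi and c + di in Z_p[i]."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)


def gauss_pow(x, e, p):
    """Square-and-multiply exponentiation in Z_p[i]."""
    result = (1, 0)
    while e:
        if e & 1:
            result = gauss_mul(result, x, p)
        x = gauss_mul(x, x, p)
        e >>= 1
    return result


def kth_root_of_i(k: int, p: int):
    """h in Z_p[i] with h^k = i, via CRT over the two Gaussian factors of p."""
    if (p - 1) % (4 * k) != 0:
        raise ValueError("this construction needs 4k | p - 1")
    g = find_generator(p)
    s = pow(g, (p - 1) // 4, p)            # s^2 = -1 (mod p)
    w = pow(g, (p - 1) // (4 * k), p)      # order 4k, and w^k = s
    h1 = w                                 # mod f1, where i = s:  h1^k = s
    h0 = pow(w, -1, p)                     # mod f0, where i = -s: h0^k = s^-1 = -s
    # Inverse CRT: solve a - b*s = h0 and a + b*s = h1 (mod p)
    a = (h0 + h1) * pow(2, -1, p) % p
    b = (h1 - h0) * pow(2 * s, -1, p) % p
    return (a, b)


P = 2**64 - 2**32 + 1
h = kth_root_of_i(2**14, P)
print(gauss_pow(h, 2**14, P) == (0, 1))    # True
```

Because 4k is a power of two here, \(h^k = i\) already implies that h has order exactly 4k, since \(h^{2k} = -1 \ne 1\).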


About this paper

Cite this paper

Alves, P.G.M.R., Ortiz, J.N., Aranha, D.F. (2021). Faster Homomorphic Encryption over GPGPUs via Hierarchical DGT. In: Borisov, N., Diaz, C. (eds) Financial Cryptography and Data Security. FC 2021. Lecture Notes in Computer Science, vol 12675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64331-0_27


DOI: https://doi.org/10.1007/978-3-662-64331-0_27
Published: 23 October 2021
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-64330-3
Online ISBN: 978-3-662-64331-0
eBook Packages: Computer Science; Computer Science (R0)
