Faster Homomorphic Encryption over GPGPUs via Hierarchical DGT

  • Conference paper
Financial Cryptography and Data Security (FC 2021)

Abstract

Privacy guarantees are still insufficient for outsourced data processing in the cloud. While employing encryption is feasible for data at rest or in transit, it is not feasible for computation without a significant performance slowdown. Thus, data must still be handled in plaintext during processing, which creates vulnerabilities that can be exploited by malicious entities. Homomorphic encryption schemes enable computation over ciphertexts without knowledge of the corresponding plaintexts or the decryption key. This work focuses on the challenge of developing an efficient implementation of the BFV scheme in CUDA. This is done by combining and adapting approaches from the literature, such as the double-CRT representation and the Discrete Galois Transform (DGT). Moreover, we propose and implement an improved formulation of the DGT, inspired by classical algorithms, which computes the transform up to 2.6 times faster than the state of the art. By using these approaches, we obtain up to 3.6 times faster homomorphic multiplication.

Notes

  1. GPGPU, acronym for General-Purpose Graphics Processing Unit.

  2. spog, acronym for “Secure Processing on GPGPUs”.

  3. Let \(x = a + ib\) be a Gaussian integer. If y is x’s conjugate, then \(y = a - ib\).

  4. Wuthrich proves in Theorem 5.8 that every \(0 \ne \alpha \in \mathbb {Z}[i]\) has a unique factorization into Gaussian primes, up to the order of the factors and multiplication by units [30].

References

  1. Albrecht, M., Bai, S., Ducas, L.: A subfield lattice attack on overstretched NTRU assumptions. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9814, pp. 153–178. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53018-4_6

  2. Alves, P.: SPOG: secure processing on GPGPUs (2021). https://github.com/spog-library

  3. Alves, P., Ortiz, J.N., Aranha, D.F.: Faster homomorphic encryption over GPGPUs via hierarchical DGT. Cryptology ePrint Archive, Report 2020/861 (2020). https://eprint.iacr.org/2020/861

  4. Badawi, A.A., Polyakov, Y., Aung, K.M.M., Veeravalli, B., Rohloff, K.: Implementation and performance evaluation of RNS variants of the BFV homomorphic encryption scheme. IACR Cryptology ePrint Archive 2018, 589 (2018)

  5. Al Badawi, A., Veeravalli, B., Aung, K.M.M.: Efficient polynomial multiplication via modified discrete Galois transform and Negacyclic convolution. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) FICC 2018. AISC, vol. 886, pp. 666–682. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-03402-3_47

  6. Badawi, A.Q.A., Veeravalli, B., Mun, C.F., Aung, K.M.M.: High-performance FV somewhat homomorphic encryption on GPUs: an implementation using CUDA. TCHES 2018(2), 70–95 (2018)

  7. Bailey, D.H.: FFTs in external or hierarchical memory. J. Supercomput. 4(1), 23–35 (1990)

  8. Bajard, J.-C., Eynard, J., Hasan, M.A., Zucca, V.: A full RNS variant of FV like somewhat homomorphic encryption schemes. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 423–442. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_23

  9. Bajard, J.C.J., Meloni, N., Plantard, T.: Efficient RNS bases for cryptography. In: IMACS World Congress: Scientific Computation, Applied Mathematics and Simulation (2005)

  10. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory 6(3), 13:1–13:36 (2014)

  11. Chen, H., Gilad-Bachrach, R., Han, K., Huang, Z., Jalali, A., Laine, K., Lauter, K.E.: Logistic regression over encrypted data from fully homomorphic encryption. IACR Cryptology ePrint Archive 2018, 462 (2018)

  12. Chen, H., Laine, K., Player, R.: Simple encrypted arithmetic library - SEAL v2.1. IACR Cryptology ePrint Archive 2017, 224 (2017)

  13. Cheon, J.H., Kim, A., Kim, M., Song, Y.: Homomorphic encryption for arithmetic of approximate numbers. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 409–437. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_15

  14. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption over the torus. J. Cryptol. 33(1), 34–91 (2020)

  15. Chu, E., George, A.: Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. CRC Press (1999)

  16. Costache, A., Smart, N.P.: Which ring based somewhat homomorphic encryption scheme is best? In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 325–340. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_19

  17. Crandall, R.E.: Integer convolution via split-radix fast Galois transform. Center for Advanced Computation Reed College (1999)

  18. Dai, W., Sunar, B.: cuHE: a homomorphic encryption accelerator library. In: Pasalic, E., Knudsen, L.R. (eds.) BalkanCryptSec 2015. LNCS, vol. 9540, pp. 169–186. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29172-7_11

  19. Ding, C., Pei, D., Salomaa, A.: Chinese Remainder Theorem: Applications in Computing, Coding, Cryptography. World Scientific (1996)

  20. Emmart, N., Weems, C.C.: High precision integer multiplication with a GPU using Strassen’s algorithm with multiple FFT sizes. Parallel Process. Lett. 21(3), 359–375 (2011)

  21. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive 2012, 144 (2012)

  22. Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the AES circuit. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 850–867. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_49

  23. Govindaraju, N.K., Lloyd, B., Dotsenko, Y., Smith, B., Manferdelli, J.: High performance discrete Fourier transforms on graphics processors. In: SC, p. 2. IEEE/ACM (2008)

  24. Halevi, S., Polyakov, Y., Shoup, V.: An improved RNS variant of the BFV homomorphic encryption scheme. In: Matsui, M. (ed.) CT-RSA 2019. LNCS, vol. 11405, pp. 83–105. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12612-4_5

  25. Lindner, R., Peikert, C.: Better key sizes (and Attacks) for LWE-based encryption. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19074-2_21

  26. Longa, P., Naehrig, M.: Speeding up the number theoretic transform for faster ideal lattice-based cryptography. In: Foresti, S., Persiano, G. (eds.) CANS 2016. LNCS, vol. 10052, pp. 124–139. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48965-0_8

  27. Aguilar-Melchor, C., Barrier, J., Guelton, S., Guinet, A., Killijian, M.-O., Lepoint, T.: NFLlib: NTT-based fast lattice library. In: Sako, K. (ed.) CT-RSA 2016. LNCS, vol. 9610, pp. 341–356. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29485-8_20

  28. Player, R.: Parameter selection in lattice-based cryptography. Ph.D. thesis, Royal Holloway, University of London (2018)

  29. Thales: 2019 Thales Data Threat Report, USA (2019). https://go.thalesesecurity.com/rs/480-LWA-970/images/2019-DTR-Global-USL-Web.pdf

  30. Wuthrich, C.: Further number theory (2011). https://www.maths.nottingham.ac.uk/plp/pmzcw/download/fnt_chap5.pdf. Accessed 18 June 2020

Acknowledgements

This work was supported in part by CNPq, grant numbers 164489/2018-5, 203175/2019-0, and 44265/2019-2, and by CAPES, grant number 1591123. We especially thank LG for financial support within the project “Privacy-preserving analytics”, project number 5296; Google for the GCP Research Credits Program under number 106101194491; and the Concordium Blockchain Research Center at Aarhus University, Denmark.

Author information

Corresponding author

Correspondence to Pedro Geraldo M. R. Alves.

Appendices

A Properties of Gaussian Integers

This appendix presents properties of Gaussian integers and useful results that can be applied in their implementation. In the following, we recall some properties stated by Wuthrich that are useful to this work [30].

Definition 3

(Norm). The norm of a Gaussian integer is defined as its product with its conjugate (Note 3). That is, \(N(a + ib) = (a + ib) \cdot (a - ib) = a^2 + b^2,\) so \(N(\alpha ) = \alpha \cdot \overline{\alpha }\).
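To make the definition concrete, the following minimal Python sketch represents a Gaussian integer \(a + ib\) as a pair of integers. This representation and the helper names `g_mul`, `g_conj`, and `g_norm` are hypothetical choices made for illustration and are not taken from spog. The sketch checks that \(\alpha \cdot \overline{\alpha }\) is the real number \(N(\alpha )\).

```python
# Gaussian integers a + ib are represented as (a, b) pairs of Python ints.
def g_mul(x, y):
    """Product (a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)

def g_conj(x):
    """Conjugate of a + ib is a - ib."""
    a, b = x
    return (a, -b)

def g_norm(x):
    """N(a + ib) = a^2 + b^2."""
    a, b = x
    return a * a + b * b

alpha = (3, 2)
# alpha * conj(alpha) has zero imaginary part, and its real part is N(alpha).
print(g_mul(alpha, g_conj(alpha)), g_norm(alpha))
```

The norm is also multiplicative, \(N(\alpha \beta ) = N(\alpha ) N(\beta )\). This property is what lets norms detect Gaussian primes in Lemma 1.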

Proposition 1

(Wuthrich’s Proposition 5.7). For each prime number \(p\equiv 1 \mod 4\) there are exactly two Gaussian primes \(\pi \) and \(\overline{\pi }\) of norm p.

Lemma 1

(Wuthrich’s Lemma 5.4). If \(\pi \in \mathbb {Z}[i]\) is such that \(N(\pi )\) is a prime number, then \(\pi \) is a Gaussian prime.

Lemma 2

(Wuthrich’s Lemma 5.6). Let p be a prime number with \(p \equiv 1 \mod 4\). Then there exists a Gaussian prime \(\pi \) such that \(p = \pi \cdot \overline{\pi }\).

Lemma 3

(Wuthrich’s Lemma 5.10). Any prime \(p \equiv 1 \mod 4\) can be written as a sum of two squares. This is a manifestation of Fermat’s theorem on sums of two squares.

From Lemma 2 and Proposition 1, if p is a prime such that \(p \equiv 1 \mod 4\), then p can be factored as a product of exactly two Gaussian primes that are conjugates of each other. Lemma 3 is a direct consequence. Since such a prime factors as \(p = \pi \cdot \overline{\pi }\), writing \(\pi = a + bi\) gives \(p = \pi \cdot \overline{\pi } = a^2 + b^2\).
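Lemma 3 can be checked numerically on small primes. The sketch below uses a brute-force search, which is practical only for small p and is not a method taken from the paper. It finds a and b with \(a^2 + b^2 = p\), and hence the conjugate pair \(\pi = a + bi\) and \(\overline{\pi } = a - bi\). The function name `two_squares` is hypothetical.

```python
from math import isqrt

def two_squares(p):
    """Return (a, b) with a^2 + b^2 == p by brute force; p is a prime with p % 4 == 1."""
    if p % 4 != 1:
        raise ValueError("p must be congruent to 1 mod 4")
    for a in range(1, isqrt(p) + 1):
        b2 = p - a * a
        b = isqrt(b2)
        if b * b == b2:
            return a, b
    raise ValueError("no representation found; is p prime?")

for p in (5, 13, 17, 29, 97):
    a, b = two_squares(p)
    # pi = a + bi and its conjugate a - bi both have norm p, and their product is p.
    print(f"{p} = ({a}+{b}i)({a}-{b}i)")
```

For primes of cryptographic size, this search is far too slow. The gcd-based factorization described in the text is the practical route.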

B Generating k-th Primitive Roots of i Modulo p

The use of the DGT for polynomial multiplication in a cyclotomic polynomial ring requires the computation of a k-th root of i modulo a prime p, as discussed in Sect. 3.1. This element is used to obtain the cyclotomic polynomial reduction for free when n is a power of two. When p is a Mersenne prime, the literature presents efficient analytic methods. For other choices of p, the best option is still a trial-and-error approach.
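A minimal sketch of one naive form of trial-and-error search follows. It is intended for toy parameters only. Elements of \(\mathbb {Z}_p[i]\) are represented as pairs (a, b) modulo p, and the search tests every candidate h until \(h^k \equiv i\). The function names and the choice p = 13, k = 3 (which satisfies \(4k \mid p - 1\)) are illustrative assumptions. For \(p = 2^{64}-2^{32}+1\), an exhaustive scan of this kind is hopeless.

```python
def zpi_mul(x, y, p):
    """Multiply (a + bi)(c + di) in Z_p[i], using i^2 = -1."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def zpi_pow(x, e, p):
    """Square-and-multiply exponentiation in Z_p[i]."""
    result = (1, 0)
    while e:
        if e & 1:
            result = zpi_mul(result, x, p)
        x = zpi_mul(x, x, p)
        e >>= 1
    return result

def naive_root_of_i(k, p):
    """Scan all p^2 elements of Z_p[i]; return the first h with h^k = i, or None."""
    target = (0, 1)  # the element i
    for a in range(p):
        for b in range(p):
            if zpi_pow((a, b), k, p) == target:
                return (a, b)
    return None

h = naive_root_of_i(3, 13)
print(h, zpi_pow(h, 3, 13))
```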

Badawi et al. state that a naive implementation of such an approach takes 156 hours to find a \(2^{14}\)-th primitive root of i for \(p = 2^{64}-2^{32}+1\) [5]. For this reason, they propose a more efficient strategy for \(p \equiv 1 \mod 4\), which factors p into two Gaussian primes, namely \(f_0\) and \(f_1\). This decomposition of p is quite simple and relies on Lemma 2 and Proposition 1.

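[Algorithm 6: factorization of a prime \(p \equiv 1 \mod 4\) into two Gaussian primes (figure not reproduced)]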

Algorithm 6 starts from Fermat’s Little Theorem, which states that if p is a prime, then \(n^{p-1} \equiv 1 \mod p\) for all \(n \in \mathbb {Z}_p^*\). Hence, \(n^{(p-1)/2}\), which is a square root of \(n^{p-1}\), must be congruent to either 1 or \(-1\). In the latter case, \(k \equiv n^{(p-1)/4} \mod p\) satisfies \(k^2 \equiv -1 \mod p\), so k plays the role of i modulo p. In other words, if \(k^2 \equiv -1 \mod p\), then \(k^2 + 1 \equiv 0 \mod p\) and p divides \(k^2 + 1\). Since \(k^2+1\) factors as \((k+i)\cdot (k-i)\) in \(\mathbb {Z}[i]\), p divides a product of two Gaussian integers. This is the starting point for recovering the factorization of p.
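The first step described above can be sketched as follows. This is an illustrative reading of the text, not the authors’ code. The hypothetical helper `sqrt_minus_one` tries n = 2, 3, … until \(n^{(p-1)/2} \equiv -1 \mod p\), and then returns \(k = n^{(p-1)/4}\).

```python
def sqrt_minus_one(p):
    """Find k with k^2 = -1 (mod p) for a prime p = 1 (mod 4), by trial and error.

    For n in Z_p^*, Fermat gives n^(p-1) = 1, so n^((p-1)/2) = +1 or -1.
    When n^((p-1)/2) = -1, k = n^((p-1)/4) satisfies k^2 = -1.
    """
    if p % 4 != 1:
        raise ValueError("p must be congruent to 1 mod 4")
    for n in range(2, p):
        if pow(n, (p - 1) // 2, p) == p - 1:
            return pow(n, (p - 1) // 4, p)
    raise ValueError("no quadratic non-residue found; is p prime?")

p = 2**64 - 2**32 + 1
k = sqrt_minus_one(p)
assert (k * k + 1) % p == 0  # p divides k^2 + 1 = (k + i)(k - i)
print(k)
```

Half of the elements of \(\mathbb {Z}_p^*\) are quadratic non-residues, so the loop usually stops after only a few candidates.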

At this point, there is no guarantee that \(k+i\) is a Gaussian prime. By Lemma 4, the greatest common divisor of p and \(k+i\) is either \(k+i\) itself or a Gaussian prime u such that \(u \mid p\) and \(u \mid k+i\). In both cases, \(u = \texttt {gcd}(p, k+i)\) is a Gaussian prime, and we take it as the first factor of p. From Lemma 2, \(\overline{u}\) is the second factor.
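The gcd step can be sketched with the Euclidean algorithm in \(\mathbb {Z}[i]\). Each quotient is rounded to the nearest Gaussian integer, so the remainder always has strictly smaller norm. This is an assumption about how the gcd might be computed, not a transcription of Algorithm 6. The helper names are hypothetical, and the value k = 8 used below satisfies \(8^2 \equiv -1 \mod 13\).

```python
def round_div(num, den):
    """Nearest integer to num / den for den > 0 (ties rounded up)."""
    return (2 * num + den) // (2 * den)

def g_divmod(x, y):
    """Euclidean division in Z[i]: x = q*y + r with N(r) < N(y)."""
    a, b = x
    c, d = y
    n = c * c + d * d
    # x / y = x * conj(y) / N(y) = ((ac + bd) + i(bc - ad)) / N(y)
    q = (round_div(a * c + b * d, n), round_div(b * c - a * d, n))
    qy = (q[0] * c - q[1] * d, q[0] * d + q[1] * c)
    return q, (a - qy[0], b - qy[1])

def g_gcd(x, y):
    """Greatest common divisor in Z[i], defined only up to a unit."""
    while y != (0, 0):
        _, r = g_divmod(x, y)
        x, y = y, r
    return x

def factor_p(p, k):
    """Split p = 1 mod 4 into u * conj(u), given k with k^2 = -1 mod p."""
    u = g_gcd((p, 0), (k, 1))
    # N(u) == p, so u is a Gaussian prime by Lemma 1.
    assert u[0] ** 2 + u[1] ** 2 == p
    return u, (u[0], -u[1])

f0, f1 = factor_p(13, 8)
print(f0, f1)
```

The gcd is only defined up to a unit. For p = 13, this sketch returns \(-3 - 2i\), which is an associate of \(3 + 2i\).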

Lemma 4

Let p be an odd prime such that \(p \equiv 1 \mod 4\), and let \(k \in \mathbb {Z}_p\) satisfy \(k^2 \equiv -1 \mod p\). The greatest common divisor of p and \(k+i\) is either \(k+i\) or a Gaussian prime u such that \(u \mid p\) and \(u \mid k+i\).

Proof

By Fermat’s theorem on sums of two squares, an odd prime p can be expressed as \(p = x^2 + y^2\), with \(x,y \in \mathbb {Z}\), if and only if \(p \equiv 1 \mod 4\). Since \(x^2 + y^2 = (x + iy)(x - iy)\) and \(N(x + iy) = N(x - iy) = p\), Lemma 1 implies that \(x + iy\) and \(x - iy\) are Gaussian primes. Therefore, \(p = (x + iy)(x - iy)\) is the unique factorization of p in \(\mathbb {Z}[i]\), up to the order of the factors and multiplication by units (Note 4).

On the other hand, \((k + i)(k - i) = k^2 + 1 \equiv 0 \mod p\) by construction. Combining the two facts, we obtain \((k + i)(k - i) = \ell p = \ell (x + iy)(x - iy)\) for some positive integer \(\ell \).

When \(\ell = 1\), we have \(N(k + i) = p\), so \(k + i\) and \(k - i\) are themselves the factors of p. When \(\ell \ne 1\), \(N(k + i) = \ell p\) and \(k + i\) can still be factored in \(\mathbb {Z}[i]\). In either case, p divides \((k + i)(k - i)\) but divides neither \(k+i\) nor its conjugate, because \((k \pm i)/p = k/p \pm i/p\) has a non-integer imaginary part and is therefore not a Gaussian integer. Since \(x + iy\) is a Gaussian prime dividing \((k + i)(k - i)\), it divides \(k + i\) or \(k - i\). Taking conjugates, one of \(x \pm iy\) divides \(k + i\). Hence, \(k + i\) and p share a common Gaussian prime factor u, which can be found as their greatest common divisor. Since the two factors of p are \(x + iy\) and \(x - iy\), u must be one of them.

Finally, the factors of p can be found by computing the greatest common divisor of p and \(k + i\) and then computing its conjugate. Since \(p = x^2+y^2\) and \(N(x + iy) = N(x - iy) = x^2 + y^2\), by Lemma 1, the factors are Gaussian primes.
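As a small worked instance of the proof, take p = 13 and k = 5, which satisfies \(5^2 \equiv -1 \mod 13\). These values were chosen here for illustration.

```latex
\begin{aligned}
p &= 13 = 3^2 + 2^2 = (3+2i)(3-2i), \\
k^2 + 1 &= 26 = 2 \cdot 13, \quad \text{so } \ell = 2, \\
(5+i)(5-i) &= 26 = 2\,(3+2i)(3-2i), \\
5 + i &= (1+i)(3-2i), \qquad N(5+i) = 26, \\
\gcd(13,\, 5+i) &= 3 - 2i \ (\text{up to a unit}), \qquad 13 = (3-2i)\,\overline{(3-2i)}.
\end{aligned}
```

Here \(\ell = 2 \ne 1\), so \(5 + i\) is not a Gaussian prime. Its gcd with 13 recovers the Gaussian prime \(3 - 2i\), and the conjugate \(3 + 2i\) is the other factor.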

Given a method for factoring a prime number \(p \equiv 1 \mod 4\) in \(\mathbb {Z}[i]\), Badawi et al. propose Algorithm 7, which makes the precomputation of a k-th root of i for such a prime much faster [5]. The method starts by finding the factorization \(p = f_0 \cdot f_1\) in \(\mathbb {Z}[i]\) using Algorithm 6. Each Gaussian prime \(f_j\), with \(j \in \{0,1\}\), then defines a cyclic group: the nonzero Gaussian integers modulo \(f_j\) under multiplication. A k-th root of i modulo p, denoted h, is then constructed via the CRT from the components \(h_j = \zeta ^{\frac{(p-1)}{4n}}_j \mod f_j\), with \(j \in \{0,1\}\), where \(\zeta _j\) is a generator of the cyclic group j.
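The CRT idea can be sketched in Python under several stated assumptions. This is not the authors’ Algorithm 7. Instead of working explicitly with Gaussian residues modulo \(f_0\) and \(f_1\), the sketch uses the isomorphisms \(\mathbb {Z}[i]/(f_j) \cong \mathbb {Z}_p\), under which i maps to r and to \(-r\), respectively, for some r with \(r^2 \equiv -1 \mod p\). It also assumes that k is a power of two (as when n is a power of two) and that \(4k \mid p - 1\). Under these assumptions, any quadratic non-residue yields an element w of order 4k, so no full generator is needed. The function names are hypothetical.

```python
def zpi_mul(x, y, p):
    """Multiply (a + bi)(c + di) in Z_p[i], using i^2 = -1."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def zpi_pow(x, e, p):
    """Square-and-multiply exponentiation in Z_p[i]."""
    result = (1, 0)
    while e:
        if e & 1:
            result = zpi_mul(result, x, p)
        x = zpi_mul(x, x, p)
        e >>= 1
    return result

def root_of_i_crt(k, p):
    """Return h in Z_p[i] of order 4k with h^k = i; assumes k is a power of two and 4k | p - 1."""
    assert k & (k - 1) == 0 and (p - 1) % (4 * k) == 0
    # A quadratic non-residue x gives w = x^((p-1)/(4k)) of order exactly 4k.
    x = next(n for n in range(2, p) if pow(n, (p - 1) // 2, p) == p - 1)
    w = pow(x, (p - 1) // (4 * k), p)
    r = pow(w, k, p)   # r has order 4, so r^2 = -1: the image of i modulo f_0
    h0 = w             # h0^k = r, the component where i maps to r
    h1 = pow(w, 3, p)  # h1^k = r^3 = -r, the component where i maps to -r
    # CRT: solve a + b*r = h0 and a - b*r = h1 for h = a + bi.
    inv2 = pow(2, -1, p)
    a = (h0 + h1) * inv2 % p
    b = (h0 - h1) * inv2 * pow(r, -1, p) % p
    return (a, b)

p = 2**64 - 2**32 + 1
k = 2**14
h = root_of_i_crt(k, p)
assert zpi_pow(h, k, p) == (0, 1)          # h^k = i
assert zpi_pow(h, 2 * k, p) == (p - 1, 0)  # h^(2k) = -1, so h has order exactly 4k
print(h)
```

The sketch replaces the exhaustive search with a handful of modular exponentiations. This matches the motivation for Algorithm 7, although the sketch does not reproduce its exact steps.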

[Algorithm 7: computation of a k-th root of i modulo p via the CRT (figure not reproduced)]

Copyright information

© 2021 International Financial Cryptography Association

About this paper

Cite this paper

Alves, P.G.M.R., Ortiz, J.N., Aranha, D.F. (2021). Faster Homomorphic Encryption over GPGPUs via Hierarchical DGT. In: Borisov, N., Diaz, C. (eds) Financial Cryptography and Data Security. FC 2021. Lecture Notes in Computer Science, vol 12675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64331-0_27

  • DOI: https://doi.org/10.1007/978-3-662-64331-0_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-64330-3

  • Online ISBN: 978-3-662-64331-0

  • eBook Packages: Computer Science, Computer Science (R0)
