Efficient number theoretic transform implementation on GPU for homomorphic encryption

The Journal of Supercomputing

Abstract

Lattice-based cryptography forms the mathematical basis for current homomorphic encryption schemes, which allow computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet its practical use suffers from the high computational complexity of homomorphic operations. Fast implementations of homomorphic encryption schemes depend heavily on efficient polynomial arithmetic, in particular on the multiplication of very large degree polynomials over polynomial rings. The number theoretic transform (NTT) accelerates large polynomial multiplication significantly and is therefore the core arithmetic operation in the majority of homomorphic encryption scheme implementations. Practical homomorphic applications thus require efficient and fast implementations of NTT on different computing platforms. In this work, we present an efficient and fast implementation of NTT, inverse NTT and NTT-based polynomial multiplication operations for GPU platforms. To demonstrate that our GPU implementation can be utilized as an actual accelerator, we experimented on GPU with the key generation, encryption and decryption operations of the Brakerski/Fan–Vercauteren (BFV) homomorphic encryption scheme as implemented in Microsoft’s SEAL homomorphic encryption library, all of which heavily depend on NTT-based polynomial multiplication. Our GPU implementations improve the performance of these three BFV operations by up to 141.95\(\times\), 105.17\(\times\) and 90.13\(\times\), respectively, on a Tesla V100 GPU compared to the highly optimized SEAL library running on an Intel i9-7900X CPU.
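
As a rough illustration of what NTT-based polynomial multiplication means in this setting, the following Python sketch multiplies two polynomials in the ring Z_q[x]/(x^n + 1) by transforming both operands, multiplying pointwise, and transforming back. It is not the GPU implementation described in the abstract. The toy parameters (n = 4, q = 17) and all function names are assumptions chosen for readability, and a schoolbook routine is included only to cross-check the result.

```python
def ntt(a, omega, q):
    """Recursive radix-2 transform of a (length a power of two) with root omega mod q."""
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q           # butterfly, upper half
        out[k + n // 2] = (even[k] - t) % q  # butterfly, lower half
        w = w * omega % q
    return out


def find_psi(n, q):
    """Brute-force search for psi with psi^n = -1 mod q (a primitive 2n-th root of unity)."""
    for x in range(2, q):
        if pow(x, n, q) == q - 1:
            return x
    raise ValueError("no primitive 2n-th root of unity; need q = 1 mod 2n")


def negacyclic_mul(a, b, q):
    """Multiply a and b modulo (x^n + 1, q); q is assumed prime with q = 1 mod 2n."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q
    psi_inv = pow(psi, q - 2, q)      # modular inverses via Fermat's little theorem
    omega_inv = pow(omega, q - 2, q)
    n_inv = pow(n, q - 2, q)
    # Scale coefficient i by psi^i so a cyclic transform yields a negacyclic product.
    a_t = ntt([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, q)
    b_t = ntt([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, q)
    c_t = [x * y % q for x, y in zip(a_t, b_t)]  # pointwise product
    c = ntt(c_t, omega_inv, q)                   # inverse transform, still unscaled
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a, b, q):
    """O(n^2) reference multiplication modulo (x^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q  # x^n = -1
    return c
```

Calling `negacyclic_mul([1, 2, 3, 4], [5, 6, 7, 8], 17)` should agree with `schoolbook_negacyclic` on the same inputs. The transform-based route needs O(n log n) modular multiplications rather than O(n^2), which is why it dominates the running time at the large polynomial degrees the abstract refers to.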

Notes

  1. Sample code is available at https://github.com/SU-CISEC/gpu-ntt.

Author information

Corresponding author

Correspondence to Ahmet Can Mert.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by TÜBİTAK under Grant Number 118E725.

About this article

Cite this article

Özerk, Ö., Elgezen, C., Mert, A.C. et al. Efficient number theoretic transform implementation on GPU for homomorphic encryption. J Supercomput 78, 2840–2872 (2022). https://doi.org/10.1007/s11227-021-03980-5

  • DOI: https://doi.org/10.1007/s11227-021-03980-5