Skip to main content

FPGA Acceleration of Number Theoretic Transform

  • Conference paper
  • First Online:
Book cover High Performance Computing (ISC High Performance 2021)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12728))

Included in the following conference series:

Abstract

Fully Homomorphic Encryption (FHE) is a technique that enables arbitrary computations on encrypted data directly. Number Theoretic Transform (NTT) is a fundamental component in FHE computations as it allows faster polynomial multiplication. However, it is computationally intensive and requires acceleration for practical deployment of FHE. The latency and throughput of existing NTT hardware designs are limited by the complex data communication pattern between adjacent NTT stages and the modular arithmetic operations. In this paper, we propose a parameterized architecture for NTT on FPGA. The architecture can be configured for a given polynomial degree, modulus and target hardware in order to optimize the latency and/or throughput. We develop a novel low latency fully pipelined modular arithmetic logic to implement the NTT core, the key computational unit of NTT. Streaming permutation network is used to reduce the data communication complexity between NTT stages. We implement the proposed architecture for various polynomial degrees, moduli, and data parallelism on state-of-the-art FPGAs. Experimental results show that our architecture configured to perform 4096 polynomial degree NTT achieves up to \(1.29\times \) and \(4.32\times \) improvement in latency and throughput respectively over state-of-the-art designs on FPGA.

T. Ye and Y. Yang—Equal contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Given a stride S, a permutation stride is defined as reordering an m-element data vector such that elements with distance of S are shifted into adjacent locations.

  2. 2.

    All logs in this paper are to base 2.

References

  1. Aho, A.V., Hopcroft, J.E.: The Design and Analysis of Computer Algorithms. Pearson Education India (1974)

    Google Scholar 

  2. Albrecht, M., et al.: Homomorphic encryption security standard. Tech. rep. (2018)

    Google Scholar 

  3. Alkim, E., Barreto, P.S.L.M., Bindel, N., Kramer, J., Longa, P., Ricardini, J.E.: The lattice-based digital signature scheme qTESLA. In: ACNS (2020)

    Google Scholar 

  4. Alkim, E., Ducas, L., Pöppelmann, T., Schwabe, P.: Post-quantum key exchange: a new hope. In: USENIX SEC (2016)

    Google Scholar 

  5. Banerjee, U., Ukyab, T.S., Chandrakasan, A.P.: Sapphire: a configurable crypto-processor for post-quantum lattice-based protocols. In: TCHES (2019)

    Google Scholar 

  6. Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: CRYPTO 1986 (1987)

    Google Scholar 

  7. Beneš, V.E.: Optimal rearrangeable multistage connecting networks. Bell Syst. Tech. J. 43(4), 1641–1656 (1964)

    Google Scholar 

  8. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. In: ITCS (2012)

    Google Scholar 

  9. Chen, R., Park, N., Prasanna, V.K.: High throughput energy efficient parallel FFT architecture on FPGAs. In: HPEC (2013)

    Google Scholar 

  10. Chen, R., Prasanna, V.K.: Automatic generation of high throughput energy efficient streaming architectures for arbitrary fixed permutations. In: FPL (2015)

    Google Scholar 

  11. Chen, R., Le, H., Prasanna, V.K.: Energy efficient parameterized fft architecture. In: 23rd International Conference on Field programmable Logic and Applications, pp. 1–7. IEEE (2013)

    Google Scholar 

  12. Cheon, J.H., Han, K., Kim, A., Kim, M., Song, Y.: A full RNS variant of approximate homomorphic encryption. In: Selected Areas in Cryptography - SAC (2018)

    Google Scholar 

  13. Chiou, D.: The microsoft catapult project. In: IISWC (2017)

    Google Scholar 

  14. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965)

    Article  MathSciNet  MATH  Google Scholar 

  15. Dowlin, N., Gilad-Bachrach, R., Laine, K., Lauter, K., Naehrig, M., Wernsing, J.: CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. Tech. Rep. MSR-TR-2016-3 (2016)

    Google Scholar 

  16. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144 (2012)

    Google Scholar 

  17. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: STOC (2009)

    Google Scholar 

  18. Intel: Stratix 10 MX FPGAs. https://www.intel.com/content/www/us/en/products/programmable/sip/stratix-10-mx.html

  19. Kim, S., Jung, W., Park, J., Ahn, J.: Accelerating number theoretic transformations for bootstrappable homomorphic encryption on GPUS. In: IEEE International Symposium on Workload Characterization (IISWC), pp. 264–275. IEEE Computer Society, Los Alamitos (2020)

    Google Scholar 

  20. Lee, W.K., Akleylek, S., Yap, W.S., Goi, B.M.: Accelerating number theoretic transform in GPU platform for qTESLA scheme. In: ISPEC (2019)

    Google Scholar 

  21. Longa, P., Naehrig, M.: Speeding up the number theoretic transform for faster ideal lattice-based cryptography. In: Cryptology and Network Security (2016)

    Google Scholar 

  22. Mert, A.C., Karabulut, E., Öztürk, E., Savaş, E., Becchi, M., Aysu, A.: A flexible and scalable NTT hardware: applications from homomorphically encrypted deep learning to post-quantum cryptography. In: DATE (2020)

    Google Scholar 

  23. Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  24. Nejatollahi, H., Gupta, S., Imani, M., Rosing, T.S., Cammarota, R., Dutt, N.: CryptoPIM: in-memory acceleration for lattice-based cryptographic hardware. In: DAC (2020)

    Google Scholar 

  25. Nejatollahi, H., Shahhosseini, S., Cammarota, R., Dutt, N.: Exploring energy efficient quantum-resistant signal processing using array processors. In: ICASSP (2020)

    Google Scholar 

  26. Nguyen, D.T., Dang, V.B., Gaj, K.: A high-level synthesis approach to the software/hardware codesign of NTT-based post-quantum cryptography algorithms. In: ICFPT (2019)

    Google Scholar 

  27. Putnam, A., et al.: A reconfigurable fabric for accelerating large-scale datacenter services. In: ISCA (2014)

    Google Scholar 

  28. Reagen, B., et al.: Cheetah: optimizing and accelerating homomorphic encryption for private inference. In: IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 26–39. IEEE (2020)

    Google Scholar 

  29. Riazi, M.S., Laine, K., Pelton, B., Dai, W.: HEAX: an architecture for computing on encrypted data. In: ASPLOS (2020)

    Google Scholar 

  30. Seiler, G.: Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography. report 2018/039 (2018)

    Google Scholar 

  31. Serpanos, D.N., Wolf, T.: Architecture of Network Systems (2011)

    Google Scholar 

  32. Sinha Roy, S., Turan, F., Jarvinen, K., Vercauteren, F., Verbauwhede, I.: Fpga-based high-performance parallel architecture for homomorphic computing on encrypted data. In: HPCA (2019)

    Google Scholar 

  33. Ullma, J.D.: Computational Aspects of VLSI (1984)

    Google Scholar 

  34. Xilinx: 7 Series FPGAs Data Sheet: Overview. https://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf

  35. Xilinx: Xilinx UltraScale+ HBM FPGAs. https://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale-plus-hbm.html

  36. Yu, C.L., Kim, J.S., Deng, L., Kestur, S., Narayanan, V., Chakrabarti, C.: FPGA architecture for 2D discrete fourier transform based on 2d decomposition for large-sized data. J. Signal Process. Syst. 64(1), 109–122 (2011)

    Article  Google Scholar 

  37. Zhang, N., Yang, B., Chen, C., Yin, S., Wei, S., Liu, L.: Highly efficient architecture of NewHope-NIST on FPGA using low-complexity NTT/INTT. In: TCHES (2020)

    Google Scholar 

Download references

Acknowledgement

This work has been sponsored by the U.S. National Science Foundation under grant numbers OAC-1911229 and CNS-2009057. Equipment grant by Xilinx is greatly appreciated.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tian Ye .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ye, T., Yang, Y., Kuppannagari, S.R., Kannan, R., Prasanna, V.K. (2021). FPGA Acceleration of Number Theoretic Transform. In: Chamberlain, B.L., Varbanescu, AL., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science(), vol 12728. Springer, Cham. https://doi.org/10.1007/978-3-030-78713-4_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-78713-4_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78712-7

  • Online ISBN: 978-3-030-78713-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics