research-article

ProgramGalois: A Programmable Generator of Radix-4 Discrete Galois Transformation Architecture for Lattice-Based Cryptography

Published: 07 November 2024

Abstract

Lattice-based cryptography (LBC) is an established research field, with particular attention paid to post-quantum cryptography (PQC) and fully homomorphic encryption (FHE). As the implementation bottleneck of PQC and FHE, the number theoretic transform (NTT) has been studied extensively. However, existing works struggle with scalability, which hinders their adaptation to different parameters such as bit width and polynomial length. In this article, we propose a novel Discrete Galois Transformation (DGT) algorithm that uses a radix-4 variant to achieve a higher level of parallelism than the existing NTT. Furthermore, to implement an efficient radix-4 DGT that adapts to more LBC schemes, we propose a set of scalable building blocks, including a modified Barrett modular multiplier that accepts an arbitrary modulus and uses only one integer multiplier, a radix-4 DGT butterfly unit, and a stream permutation network. The proposed modules are implemented on Xilinx Virtex-7 and U250 FPGAs to evaluate resource utilization and performance. Lastly, a design space exploration framework is proposed to generate optimized radix-4 DGT hardware under polynomial and platform parameter constraints. A sensitivity analysis showcases the performance and scalability of the generated hardware. The implementation results on the Xilinx Virtex-7 and U250 FPGAs show significant performance improvements over state-of-the-art works, with area-time product improvements of at least 35%, 192%, and 68% in terms of LUTs, BRAMs, and DSPs, respectively.
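For readers unfamiliar with the Barrett modular multiplier mentioned in the abstract, the following is a minimal software sketch of textbook Barrett reduction for an arbitrary modulus. It is not the modified single-multiplier design proposed in the article: the function names `barrett_setup` and `barrett_mulmod`, the choice of `k` as the bit length of the modulus, and the example modulus are all assumptions made for illustration only.

```python
def barrett_setup(q):
    """Precompute Barrett constants for an arbitrary modulus q > 1.

    Assumption for this sketch: k is the bit length of q, so q < 2**k,
    and mu = floor(4**k / q) approximates 1/q in fixed point.
    """
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_mulmod(a, b, q, k, mu):
    """Return (a * b) mod q without a division at run time.

    Inputs must already be reduced (0 <= a, b < q), so the product
    x = a * b is below q**2 < 4**k.
    """
    assert 0 <= a < q and 0 <= b < q
    x = a * b                   # full product
    t = (x * mu) >> (2 * k)     # estimate of floor(x / q); never overshoots
    r = x - t * q               # nonnegative remainder candidate
    while r >= q:               # correction step (defensive loop)
        r -= q
    return r


if __name__ == "__main__":
    q = 3329                    # example modulus only; any q > 1 works
    k, mu = barrett_setup(q)
    print(barrett_mulmod(1234, 2345, q, k, mu))
```

Note that this textbook form performs three multiplications per reduction (`a * b`, `x * mu`, and `t * q`). The abstract states that the proposed hardware needs only one integer multiplier; how that is achieved is described in the full article and is not reproduced here.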



Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 4
December 2024
303 pages
EISSN: 1936-7414
DOI: 10.1145/3613637
  • Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 November 2024
Online AM: 24 August 2024
Accepted: 02 August 2024
Revised: 03 June 2024
Received: 14 September 2023
Published in TRETS Volume 17, Issue 4

Author Tags

  1. Lattice-based Cryptography
  2. Number Theoretic Transform (NTT)
  3. Discrete Galois Transform (DGT)
  4. FPGA architecture

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • Hong Kong Innovation and Technology Commission
  • City University of Hong Kong
  • National Natural Science Foundation of China
  • Guangdong Provincial Key Laboratory IRADS
  • Guangdong Basic and Applied Basic Research Foundation-General
  • Guangdong Province General Universities Key Field
  • UIC Research
