Abstract:
Four Post-Quantum Cryptography (PQC) algorithms were recently selected by NIST for standardization. Three of them are lattice-based cryptographic schemes whose computing bottleneck is the number-theoretic transform (NTT), which motivates fast and low-power hardware implementations. In this work, a high-speed and power-efficient NTT accelerator is presented that leverages the compute-in-memory (CIM) technique with bottom-up optimizations. First, a carry-free modular multiplication (CFMM) algorithm is proposed, which uses on-the-fly reduction and a hybrid-redundant representation to optimize the butterfly unit operation, the cornerstone of the NTT. Based on the optimized algorithm, an efficient butterfly unit in memory (BUIM) is developed through co-design with the SRAM circuit, which saves memory access energy, reduces operation cycles, and achieves an ultra-short critical path. Additionally, the data pattern of the CIM array is improved to avoid redundant memory read/write operations, further reducing memory access overhead. Finally, a pipelined operation flow is combined with a constant interstage data mapping strategy to give the proposed hybrid-redundant CIM NTT (HRCIM-NTT) architecture minimized computing cycles and reduced routing overhead. An implementation in 45 nm CMOS technology demonstrates that HRCIM-NTT achieves the highest throughput and lowest latency among existing CIM-based NTT accelerators.
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers (Volume: 72, Issue: 1, January 2025)
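
For readers unfamiliar with the butterfly operation that the abstract calls the cornerstone of the NTT, the following is a minimal software sketch of a textbook radix-2 Cooley-Tukey NTT. It is not the paper's CFMM algorithm or BUIM circuit: the modular multiplication here is ordinary integer arithmetic, and the function names and the toy parameters (q = 17, n = 8, omega = 2) are assumptions chosen only for illustration.

```python
def butterfly(a, b, w, q):
    """Textbook Cooley-Tukey butterfly: (a + w*b mod q, a - w*b mod q)."""
    t = (w * b) % q  # the modular multiplication is the costly step
    return (a + t) % q, (a - t) % q


def bit_reverse_permute(x):
    """Reorder x so that index i moves to the bit-reversed index."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        out[r] = x[i]
    return out


def ntt(x, omega, q):
    """Iterative radix-2 NTT; len(x) must be a power of two and
    omega a primitive len(x)-th root of unity modulo the prime q."""
    n = len(x)
    a = bit_reverse_permute(x)
    length = 2
    while length <= n:
        half = length // 2
        w_len = pow(omega, n // length, q)  # twiddle step for this stage
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                lo, hi = start + j, start + j + half
                a[lo], a[hi] = butterfly(a[lo], a[hi], w, q)
                w = (w * w_len) % q
        length *= 2
    return a


def naive_ntt(x, omega, q):
    """O(n^2) reference transform used only to check the fast version."""
    n = len(x)
    return [sum(x[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


# Toy parameters (assumed for illustration): 2 has order 8 modulo 17.
Q, N, OMEGA = 17, 8, 2
sample = [1, 2, 3, 4, 5, 6, 7, 8]
transformed = ntt(sample, OMEGA, Q)
```

Each stage applies n/2 butterflies, so an n-point transform needs (n/2)·log2(n) modular multiplications, which is why the cost of the butterfly unit dominates the overall NTT cost.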