Abstract:
This paper presents two improved modular multiplication algorithms: the variable-length interleaved modular multiplication (VLIM) algorithm and a parallel modular multiplication (P_MM) method that uses variable-length algorithms to achieve high throughput rates. The new interleaved modular multiplication algorithm applies a zero counting and partitioning algorithm to the multiplier's non-adjacent form (NAF), dividing this input into variable-radix sections. Each section consists of a run of zero digits and a single non-zero digit (-1 or 1) in the most significant position. Therefore, in addition to reducing the number of required clock cycles, the high-radix partial multiplication $\mathbf{X}^{(i)} \cdot \mathbf{Y}$ is simplified to a binary addition or subtraction, and the multiplication steps for consecutive zero bits are executed in one clock cycle instead of several. The proposed parallel modular multiplication algorithm divides the multiplier into two parts. It uses the VLIM and variable-length Montgomery modular multiplication (VLM3) methods to compute the modular multiplication for the upper and lower portions in parallel, with the multiplier divided so that the two portions have similar multiplication times. Implementation results on a Xilinx Virtex-7 FPGA show that the parallel modular multiplication computes a 2048-bit modular multiplication in 0.903 µs, with a maximum clock frequency of 387 MHz and an area × time per bit value of 9.14.
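The partitioning idea can be illustrated with a minimal software sketch in Python. It is not the authors' hardware design: the function names (`naf_digits`, `partition_sections`, `interleaved_modmul`) and the per-section update rule are assumptions chosen to match the description above, where each section is one non-zero NAF digit followed by the zeros below it. Under that assumption a section of length k with leading digit d has value d·2^(k-1), so its partial product is a shifted copy of Y that is either added or subtracted.

```python
def naf_digits(x):
    """Return the non-adjacent form of x >= 0, least significant digit first."""
    digits = []
    while x > 0:
        if x & 1:
            d = 2 - (x % 4)  # 1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


def partition_sections(digits_lsb_first):
    """Scan NAF digits from the most significant end and group them into
    (digit, length) sections: one non-zero digit plus the zeros that follow.
    The top NAF digit of a positive integer is always non-zero, so every
    zero belongs to a section that has already been opened."""
    sections = []
    for d in reversed(digits_lsb_first):
        if d != 0:
            sections.append([d, 1])
        else:
            sections[-1][1] += 1
    return [tuple(s) for s in sections]


def interleaved_modmul(x, y, m):
    """Compute (x * y) mod m with one add-or-subtract step per section.
    Assumed update rule (illustrative only): shift the accumulator by the
    section length k, then add d * 2**(k-1) * y and reduce modulo m."""
    p = 0
    for d, k in partition_sections(naf_digits(x)):
        p = ((p << k) + d * (y << (k - 1))) % m
    return p
```

In this sketch the loop runs once per non-zero NAF digit rather than once per bit of the multiplier, which mirrors the reduction in clock cycles described in the abstract. The modular reduction here relies on Python's `%` operator; a hardware implementation would use a different reduction scheme.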
Published in: IEEE Transactions on Computers (Volume: 74, Issue: 1, January 2025)