Abstract
This paper improves the quotient-pipelined high-radix scalable Montgomery modular multiplier by processing w-bit and k-bit words in carry-save form instead of some (w + k)-bit operands. This directly reduces both the critical path and the area overhead of the original processing elements. Based on this improved high-radix scalable Montgomery modular multiplier, we propose an efficient hardware architecture for RSA decryption with the Chinese Remainder Theorem. With simple configuration logic, the hardware unit works in three modes: (1) scalable modular reduction for precomputation, (2) scalable Montgomery modular multiplication for modular exponentiation, where an approximation method is developed to reduce the expanded result below the modulus, and (3) scalable multiplication for post-processing. Hardware implementation shows that the proposed architecture is optimal with reference to the literature in terms of speed, area, and frequency. A 4096-bit RSA decryption on an XC2V6000-6 FPGA can be completed in 11.05 ms with 14041 slices/17409 LUTs, 128 16 × 16 multipliers, and 70 kbits of block RAM. Finally, by using the Montgomery powering ladder, the modular exponentiation unit based on the improved high-radix scalable Montgomery modular multiplier can be made resistant to fault and simple power attacks. A 1024-bit modular exponentiation unit with such resistance costs about 255K NAND2 gates in a 0.18 μm CMOS process, and one full modular exponentiation takes about 1.44 ms at 250 MHz.
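As a rough software illustration of the three modes named above, the following Python sketch performs RSA decryption with the Chinese Remainder Theorem using a plain integer-level Montgomery multiplication and a Montgomery powering ladder. It is not the paper's hardware algorithm: it is neither word-serial nor quotient-pipelined, it uses an ordinary conditional final subtraction in place of the approximation method, and all function names (`mont_setup`, `mont_mul`, `mont_ladder_pow`, `rsa_crt_decrypt`) are hypothetical. The mapping of code lines to modes (1) to (3) is likewise an assumption made for illustration.

```python
def mont_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n."""
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R   # n * n_prime == -1 (mod R)
    R2 = (R * R) % n                 # modular reduction, loosely mode (1)
    return R, n_prime, R2


def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n (loosely mode (2))."""
    mask = (1 << k) - 1
    t = a * b
    q = ((t & mask) * n_prime) & mask   # quotient digit, here one k-bit "digit"
    u = (t + q * n) >> k                # exact division by R
    # Assumption: plain conditional subtraction, not the paper's approximation.
    return u - n if u >= n else u


def mont_ladder_pow(x, e, n):
    """Compute x**e mod n (n odd) with the Montgomery powering ladder."""
    k = n.bit_length()
    R, n_prime, R2 = mont_setup(n, k)
    r0 = R % n                                  # Montgomery form of 1
    r1 = mont_mul(x % n, R2, n, k, n_prime)     # Montgomery form of x
    for i in reversed(range(e.bit_length())):
        # Both branches do one multiplication and one squaring, which is
        # the regular pattern the ladder relies on against simple power analysis.
        if (e >> i) & 1:
            r0 = mont_mul(r0, r1, n, k, n_prime)
            r1 = mont_mul(r1, r1, n, k, n_prime)
        else:
            r1 = mont_mul(r0, r1, n, k, n_prime)
            r0 = mont_mul(r0, r0, n, k, n_prime)
    return mont_mul(r0, 1, n, k, n_prime)       # leave Montgomery form


def rsa_crt_decrypt(c, p, q, d):
    """Decrypt c with private exponent d and odd primes p, q via CRT."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    mp = mont_ladder_pow(c % p, dp, p)          # half-size exponentiation mod p
    mq = mont_ladder_pow(c % q, dq, q)          # half-size exponentiation mod q
    h = (q_inv * (mp - mq)) % p                 # Garner recombination
    return mq + h * q                           # multiplication, loosely mode (3)


if __name__ == "__main__":
    p, q, e, d = 61, 53, 17, 2753               # toy textbook parameters
    m = 65
    c = pow(m, e, p * q)
    print(rsa_crt_decrypt(c, p, q, d) == m)
```

The two exponentiations operate on moduli of half the RSA key length, which is where the speed advantage of CRT decryption comes from. The toy parameters are far too small for real use and serve only to check the arithmetic.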
Cite this article
Wu, T., Li, S. & Liu, L. Fast RSA decryption through high-radix scalable Montgomery modular multipliers. Sci. China Inf. Sci. 58, 1–16 (2015). https://doi.org/10.1007/s11432-014-5215-4
DOI: https://doi.org/10.1007/s11432-014-5215-4