Skip to main content
Log in

Fast Multi-precision Multiplication for Public-Key Cryptography on Embedded Microprocessors

  • Published:
Journal of Cryptology Aims and scope Submit manuscript

Abstract

Multi-precision multiplication is one of the most fundamental operations on microprocessors to allow public-key cryptography such as RSA and elliptic curve cryptography. In this paper, we present a novel multiplication technique that increases the performance of multiplication by sophisticated caching of operands. Our method significantly reduces the number of needed load instructions which is usually one of the most expensive operations on modern processors. We evaluate our new technique on an 8-b ATmega128 and a 32-b ARM7TDMI microcontroller and compare the results with existing solutions. For the ATmega128, our implementation needs only 2395 clock cycles for a 160-b multiplication. The number of required load instructions is reduced from 167 (needed for the best known hybrid multiplication) to only 80. On the ARM7TDMI, our implementation needs only 281 clock cycles as opposed to 357. For both platforms, the proposed technique outperforms related work by a factor of about 10–23%. We also show that the method scales very well even for larger integer sizes (required for RSA) and limited register sets. It fully complies with existing multiply–accumulate instructions that are integrated in most of the available processors.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. Note that we do not consider multiplication methods such as Karatsuba–Ofman or FFT in this paper since they are considered to require more resources and memory accesses on common microcontrollers than the given methods  [11].

  2. We assume the allocation of three registers for the accumulator register, whereas \(2+\lceil \log _2 (n)/W\rceil \) registers are actually needed to maintain the sum of partial products.

  3. The code generator is available from http://www.iaik.tugraz.at/content/research/sesys/tools/mulopcache/.

  4. Note that a fully unrolled implementation using such large integer multiplications might be impractical due to the huge amount of code.

  5. The early-terminating multiplier of the ARM7TDMI is susceptible to side-channel attacks that exploit the data-dependent runtime of the multiplier as shown in [6].

References

  1. Atmel Corporation, 8-bit AVR Microcontroller with 128K Bytes In-System Programmable Flash (2007). http://www.atmel.com/dyn/resources/prod_documents/doc2467.pdf

  2. Atmel Corporation, 8-bit AVR Instruction Set (2008). http://www.atmel.com/dyn/resources/prod_documents/doc0856.pdf

  3. M. Aydos, T. Yanik, Ç .K. Koç, An high-speed ECC-based wireless authentication protocol on an ARM microprocessor, in 16th Annual Computer Security Applications Conference (ACSAC 2000), 11–15 December 2000, New Orleans, Louisiana, USA (IEEE, 2000), pp. 401–410

  4. Certicom Research, Standards for Efficient Cryptography, SEC 2: Recommended Elliptic Curve Domain Parameters, Version 2.0 (2010). http://www.secg.org/

  5. P. Comba, Exponentiation cryptosystems on the IBM PC. IBM Syst. J.29(4), 526–538 (1990)

    Article  Google Scholar 

  6. J. Großschädl, E. Oswald, D. Page, M. Tunstall, Side channel analysis of cryptographic software via early-terminating multiplications, in Information, Security and Cryptology ? ICISC 2009, 12th International Conference, Seoul, Korea, December 2–4, 2009, Proceedings, Lecture Notes in Computer Science, vol. 5984, Seoul, Korea (Springer, 2010), pp. 176–192

  7. J. Großschädl, E. Savaş, Instruction set extensions for fast arithmetic in finite fields GF(\(p\)) and GF(\(2^m\)), in CHES 2004, 6th International Workshop, Cambridge, MA, USA, August 11–13, 2004, LNCS, vol. 3156 (Springer, 2004), pp. 133–147

  8. N. Gura, A. Patel, A. Wander, H. Eberle, S. C. Shantz, Comparing elliptic curve cryptography and RSA on 8-Bit CPUs, in CHES 2004, 6th International Workshop, Cambridge, USA, August 11–13, 2004 (Springer, 2004), pp. 119–132

  9. IAR Systems, IAR Embedded Workbench (2012). http://www.iar.com/

  10. A. Kargl, S. Pyka, H. Seuschek, Fast arithmetic on ATmega128 for elliptic curve cryptography (2008). Cryptology ePrint Archive http://eprint.iacr.org/. Report 2008/442

  11. Ç. K. Koç, High speed RSA implementation. Technical Report, RSA Laboratories, RSA Data Security, Inc., Redwood City (1994)

  12. C. Lederer, R. Mader, M. Koschuch, J. Großschädl, A. Szekely, S. Tillich, Energy-efficient implementation of ECDH key exchange for wireless sensor networks, in 3rd International Workshop in Information Security Theory and Practices—WISTP 2009, Brussels, Belgium, September 1–4, 2009, LNCS, vol. 5746 (Springer, 2009), pp. 112–127

  13. A. Lenstra, E. Verheul, Selecting cryptographic key sizes. J. Cryptol.14(4), 255–293 (2001)

    Article  MathSciNet  Google Scholar 

  14. A. Liu, P. Ning, TinyECC: a configurable library for elliptic curve cryptography in wireless sensor networks, in International Conference on Information Processing in Sensor Networks-IPSN 2008, April 22–24, 2008, St. Louis, Missouri, USA, pp. 245–256 (2008)

  15. Z. Liu, J. Großschädl, I. Kizhvatov, Efficient and side-channel resistant RSA implementation for 8-bit AVR microcontrollers, in Workshop on the Security of the Internet of Things-SOCIOT 2010, 1st International Workshop, November 29, 2010, Tokyo, Japan. IEEE Computer Society (2010)

  16. M. Medwed, E. Oswald, Template attacks on ECDSA, in 9th International Workshop on Information Security Applications (WISA 2008), Jeju Island, Korea, September 23–25, 2008, Pre-Proceedings, ed. by K.-I. Chung, M. Yung, K. Sohn, pp. 14–27 (2008)

  17. National Institute of Standards and Technology (NIST), SP800-57 Part 1: DRAFT Recommendation for Key Management: Part 1: General (2011). http://csrc.nist.gov/publications/drafts/800-57/Draft_SP800-57-Part1-Rev3_May2011.pdf

  18. nongnu.org. AVR Libc Home Page (2012). http://www.nongnu.org/avr-libc/

  19. J. Pelzl, T. Wollinger, J. Guajardo, C. Paar, Hyperelliptic curve cryptosystems: closing the performance gap to elliptic curves, in Cryptographic Hardware and Embedded Systems–CHES 2003, 5th International Workshop, Cologne, Germany, September 8-10, 2003, Proceedings, ed. by C.D. Walter, Ç.K. Koç, C. Paar. Lecture Notes in Computer Science, vol. 2779 (Springer, 2003), pp. 351–365

  20. Rowley. Crossworks for AVR (2012). http://www.rowley.co.uk/avr/

  21. M. Scott, P. Szczechowiak, Optimizing multiprecision multiplication for public key cryptography (2007). Cryptology ePrint Archive http://eprint.iacr.org/. Report 2007/299

  22. P. Szczechowiak, L.B. Oliveira, M. Scott, M. Collier, R. Dahab, NanoECC: testing the limits of elliptic curve cryptography in sensor networks, in Wireless Sensor Networks 5th European Conference, EWSN 2008, Bologna, Italy, January 30-February 1, 2008, ed. by R. Verdone. LNCS, vol. 4913 (Springer, 2008), pp. 305–320

  23. O. Ugus, A. Hessler, D. Westhoff, Performance of additive homomorphic EC-ElGamal encryption for TinyPEDS, in GI/ITG KuVS Fachgespr?ch ?Drahtlose Sensornetze?, RWTH Aachen, 2007. UbiSec (2007)

  24. L. Uhsadel, A. Poschmann, C. Paar, Enabling full-size public-key algorithms on 8-bit sensor nodes, In Security and Privacy in Ad-hoc and Sensor Networks 4th European Workshop, ESAS 2007, Cambridge, UK, July 2–3 (2007)

  25. H. Wang, Q. Li, Efficient implementation of public key cryptosystems on mote sensors, in Information and Communications Security 8th International Conference, ICICS 2006, Raleigh, NC, USA, December 4–7, 2006. LNCS, vol. 4307 (Springer, 2006), pp. 519–528

  26. S.-B. Xu, L. Batina, Efficient implementation of elliptic curve cryptosystems on an ARM7 with hardware accelerator, in Information Security 4th International Conference, ISC 2001 Malaga, Spain, October 1–3, 2001 Proceedings, ed. by G.I. Davida, Y. Frankel. Lecture Notes in Computer Science, vol. 2200 (Springer, 2001), pp. 266–279

Download references

Acknowledgements

The work has been supported by the European Commission through the ICT program under Contract ICT-2007-216646 (European Network of Excellence in Cryptology—ECRYPT II) and under Contract ICT-SEC-2009-5-258754 (Tamper Resistant Sensor Node—TAMPRES).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael Hutter.

Additional information

Communicated by Mitsuru Matsui

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was done while the authors were at Graz University of Technology (IAIK)

Appendix: Algorithm for Operand-Caching Multiplication

Appendix: Algorithm for Operand-Caching Multiplication

The following pseudocode shows the algorithm for multi-precision multiplication using the operand-caching method. Variables that are located in data memory are denoted by \(M_x\) where x represents the name of the integer a or b. The parameter e describes the number of locally usable registers \(R_a[e-1,\dots ,0]\) and \(R_b[e-1,\dots ,0]\). The triple-word accumulator is denoted by \(ACC = (ACC_2,ACC_1,ACC_0)\).

figure a
figure b
figure c
figure d
figure e
figure f

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hutter, M., Wenger, E. Fast Multi-precision Multiplication for Public-Key Cryptography on Embedded Microprocessors. J Cryptol 33, 1442–1460 (2020). https://doi.org/10.1007/s00145-020-09351-2

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00145-020-09351-2

Keywords

Navigation