Abstract
RSA key generation requires devices to generate large prime numbers. The naïve approach is to generate candidates at random, and then test each one for (probable) primality. However, it is faster to use a sieve method, where the candidates are chosen so as not to be divisible by a list of small prime numbers \(\{p_i\}\).
Sieve methods can be somewhat complex and time-consuming, at least by the standards of embedded and hardware implementations, and they can be tricky to defend against side-channel analysis. Here we describe an improvement on Joye et al.’s sieve based on the Chinese Remainder Theorem (CRT). We also describe a new sieve method using quadratic residuosity which is simpler and faster than previously known methods, and which can produce values in desired RSA parameter ranges such as \((2^{n-1/2}, 2^n)\) with minimal additional work. The same methods can be used to generate strong primes and DSA moduli.
We also demonstrate a technique for RSA private key operations using the Chinese Remainder Theorem (RSA-CRT) without \(q^{-1} \bmod p\). This technique also leads to inversion-free batch RSA and inversion-free RSA mod \(p^k q\).
We demonstrate how an embedded device can use our key generation and RSA-CRT techniques to perform RSA efficiently without storing the private key itself: only a symmetric seed and one or two short hints are required.
Notes
- 1. Most of these algorithms exhibit false positives in rare cases. That is, when given a prime number they always say that it is prime, but they may accept a composite number as prime with some tiny probability. The present work does not address this issue.
- 2.
- 3. A random \(r\mathop {{\mathop {\leftarrow }\limits ^{\$}}}\mathbb {Z}/N\) will be coprime to N with overwhelming probability. But if we wanted to be sure then we could reuse one of our sieve techniques.
References
Adrian, D., et al.: Imperfect forward secrecy: how Diffie-Hellman fails in practice. In: Ray, I., Li, N., Kruegel, C. (eds.) ACM CCS 2015, pp. 5–17. ACM Press (2015)
Aumüller, C., Bier, P., Fischer, W., Hofreiter, P., Seifert, J.-P.: Fault attacks on RSA with CRT: concrete results and practical countermeasures. In: Kaliski, B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 260–275. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36400-5_20
Aldaya, A.C., García, C.P., Tapia, L.M.A., Brumley, B.B.: Cache-timing attacks on RSA key generation. IACR TCHES 2019(4), 213–242 (2019). https://tches.iacr.org/index.php/TCHES/article/view/8350
Atkin, A.O.L., Morain, F.: Elliptic curves and primality proving. Math. Comput. 61(203), 29–68 (1993)
Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987). https://doi.org/10.1007/3-540-47721-7_24
Bernstein, D.J., Heninger, N., Lou, P., Valenta, L.: Post-quantum RSA. In: Lange, T., Takagi, T. (eds.) PQCrypto 2017. LNCS, vol. 10346, pp. 311–329. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59879-6_18
Aldaya, A.C., Brumley, B.: When one vulnerable primitive turns viral: novel single-trace attacks on ECDSA and RSA. In: CHES 2020, p. 03 (2020)
Clavier, C., Coron, J.-S.: On the implementation of a fast prime generation algorithm. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 443–449. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74735-2_30
Diffie, W., Hellman, M.E.: New directions in cryptography. IEEE Trans. Inf. Theory 22(6), 644–654 (1976)
Ebeid, N.M., Lambert, R.: A new CRT-RSA algorithm resistant to powerful fault attacks. In: WESS 2010, p. 8. ACM (2010)
Fiat, A.: Batch RSA. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 175–185. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0_17
Hamburg, M.: Fast and compact elliptic-curve cryptography. Cryptology ePrint Archive, Report 2012/309 (2012). http://eprint.iacr.org/2012/309
Joye, M., Paillier, P.: GCD-free algorithms for computing modular inverses. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 243–253. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45238-6_20
Joye, M., Paillier, P.: Fast generation of prime numbers on portable devices: an update. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 160–173. Springer, Heidelberg (2006). https://doi.org/10.1007/11894063_13
Joye, M., Paillier, P., Vaudenay, S.: Efficient generation of prime numbers. In: Koç, Ç.K., Paar, C. (eds.) CHES 2000. LNCS, vol. 1965, pp. 340–354. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44499-8_27
Kerry, C.F., Gallagher, P.D.: Digital signature standard (DSS). FIPS PUB 186-4 (2013). https://doi.org/10.6028/NIST.FIPS.186-4
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Nemec, M., Sýs, M., Svenda, P., Klinec, D., Matyas, V.: The return of Coppersmith’s attack: practical factorization of widely used RSA moduli. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) ACM CCS 2017, pp. 1631–1648. ACM Press (2017)
Pomerance, C., Selfridge, J.L., Wagstaff, S.S.: The pseudoprimes to \(25\cdot 10^9\). Math. Comput. 35(151), 1003–1026 (1980)
Rabin, M.O.: Probabilistic algorithm for testing primality. J. Number Theory 12(1), 128–138 (1980)
Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public-key cryptosystems. Commun. Assoc. Comput. Mach. 21(2), 120–126 (1978)
Schanck, J.M.: Multi-power post-quantum RSA. Cryptology ePrint Archive, Report 2018/325 (2018). https://eprint.iacr.org/2018/325
Svenda, P., et al.: The million-key question - investigating the origins of RSA public keys. In: Holz, T., Savage, S. (eds.) USENIX Security 2016, pp. 893–910. USENIX Association (2016)
Takagi, T.: Fast RSA-type cryptosystem modulo \(p^{k} q\). In: Krawczyk, H. (ed.) CRYPTO 1998. LNCS, vol. 1462, pp. 318–326. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055738
Takagi, T.: A fast RSA-type public-key primitive modulo \(p^k q\) using Hensel lifting. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 87(1), 94–101 (2004)
Van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999)
Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 288–304. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45708-9_19
Acknowledgements
Special thanks to Denis Pochuev for feedback on RSA with \(p^k\cdot q\).
Ethics declarations
Some of these techniques may be covered by US and/or international patents.
Appendices
A Proof of Theorem 1
We prove this theorem in the extended version, at https://eprint.iacr.org/2020/1507.
B Minimizing u
We say that u is “valid” mod M if \(\left( \frac{-u}{p}\right) = -1\) for all primes p|M. If M’s factorization is known, then it is easy to find a valid \(u_p\) modulo each p|M (e.g. by checking the Jacobi symbol \(\left( \frac{-u_p}{p}\right) \) until a valid \(u_p\) is found), and to combine them using the Chinese Remainder Theorem. But what is the minimum valid u? Using a smaller u could allow the same u to be used for several values of M, or could reduce memory usage and compute time, but mostly it is a mathematically interesting question. For simplicity, we assume here that M is square-free.
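The per-prime construction can be sketched in a few lines of Python. This is a minimal illustration rather than an optimized implementation: the helper names `legendre`, `valid_residue` and `crt_combine` and the toy prime list are assumptions made for the example, and the symbol is evaluated with Euler's criterion, which agrees with the Jacobi symbol when p is prime. It requires Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def valid_residue(p):
    """Smallest u_p in [1, p) with (-u_p / p) = -1."""
    return next(u for u in range(1, p) if legendre(-u, p) == -1)

def crt_combine(residues, primes):
    """Combine u = u_p (mod p) over distinct primes into u mod M = prod(primes)."""
    M = prod(primes)
    u = 0
    for r, p in zip(residues, primes):
        Mp = M // p
        # Mp * (Mp^-1 mod p) is 1 mod p and 0 mod every other prime.
        u += r * Mp * pow(Mp, -1, p)
    return u % M, M

primes = [3, 5, 7, 11, 13]          # hypothetical toy factorization of M
u, M = crt_combine([valid_residue(p) for p in primes], primes)
print(u, M, all(legendre(-u, p) == -1 for p in primes))
```

The value produced this way is valid but generally of the same size as M, which is what motivates the search for smaller u.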
If there are n primes dividing M, then a random element of \((\mathbb {Z}/M)^*\) is valid with probability \(2^{-n}\), so we expect the minimum valid u to be around \(u_\text {minexp} := 2^n\cdot M/\phi (M)\). A brute-force strategy would require about \(u_\text {minexp}\) work, which is infeasible past the first 50 primes or so. But this work can be reduced somewhat, particularly if we settle for a small but not minimal u.
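To make the estimate concrete, the following sketch compares a brute-force search against \(u_\text {minexp}\) for a small square-free M. The function names and the choice of the first eight odd primes are assumptions for illustration only; the search is exponential in n, as noted above, so it is usable only for toy sizes.

```python
from math import prod

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def is_valid(u, primes):
    """u is valid mod M = prod(primes) iff (-u/p) = -1 for every p."""
    return all(legendre(-u, p) == -1 for p in primes)

def min_valid_u(primes):
    """Brute-force the smallest valid u; takes roughly u_minexp steps."""
    u = 1
    while not is_valid(u, primes):
        u += 1
    return u

def u_minexp(primes):
    """2^n * M / phi(M) for square-free M, using M/phi(M) = prod p/(p-1)."""
    return 2 ** len(primes) * prod(p / (p - 1) for p in primes)

primes = [3, 5, 7, 11, 13, 17, 19, 23]   # toy example, not from the paper
u_min = min_valid_u(primes)
print(f"minimum valid u = {u_min}, u_minexp ~ {u_minexp(primes):.0f}")
```

For the tiny case M = 15 the search returns u = 7, against an estimate of 7.5.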
B.1 Sparse Solutions to Linear Equations
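The most effective method we found was to search for valid u of the form \(u = q_1\cdot q_2 \cdots q_m\) where the \(q_i\)’s are in some set Q. The validity criterion is that, for all primes p|M:
\[\left( \frac{-u}{p}\right) = \left( \frac{-1}{p}\right) \cdot \prod _{i=1}^{m}\left( \frac{q_i}{p}\right) = -1.\]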
If each \(q_i\) is coprime to M, then the Jacobi symbols are all either \(-1\) or 1; mapping these to 1 and 0 respectively translates the validity criterion to a system of affine equations over \(\mathbb {F} _2\). This allows us to solve for u with xor-list or sparse solution techniques, such as:
- A birthday attack or stronger collision technique [VOW99] for \(m=2\) and Q a large set (e.g. \(|Q|\approx 2^{32}\)).
- Wagner’s xor-list algorithm [Wag02] for m small and Q a large set.
- Information set decoding (ISD) for large m and a relatively small set Q (e.g. the first 1000 primes not dividing M).
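The translation to \(\mathbb {F} _2\) and the simplest of these techniques, a birthday-style match for \(m=2\), can be sketched as follows. This is a toy illustration under stated assumptions: the names `sign_vector` and `birthday_pair`, the ten-prime modulus and the small set Q are all hypothetical choices, and a plain dictionary lookup stands in for the memory-efficient collision search of [VOW99].

```python
from math import gcd, prod

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def sign_vector(a, primes):
    """Bit i is set iff (a / primes[i]) = -1; assumes a is coprime to each p."""
    v = 0
    for i, p in enumerate(primes):
        if legendre(a, p) == -1:
            v |= 1 << i
    return v

def birthday_pair(primes, Q):
    """Find q1, q2 in Q such that u = q1 * q2 is valid mod prod(primes)."""
    # (-1/p)(q1/p)(q2/p) = -1 for every p means the xor of the three
    # bit vectors is all ones, so vec(q1) ^ vec(q2) must equal `target`.
    all_ones = (1 << len(primes)) - 1
    target = all_ones ^ sign_vector(-1, primes)
    seen = {}
    for q in Q:
        v = sign_vector(q, primes)
        if v ^ target in seen:
            return seen[v ^ target], q
        seen.setdefault(v, q)
    return None

primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]   # toy M: first 10 odd primes
M = prod(primes)
Q = [q for q in range(37, 5000, 2) if gcd(q, M) == 1]
pair = birthday_pair(primes, Q)
print(pair)
```

With n equations the table needs on the order of \(2^{n/2}\) entries before a match is likely, which is why a set of about \(2^{32}\) elements is the natural scale for roughly 60 primes.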
Using a birthday attack, we discovered that the 59-bit value
is valid mod the 383-bit product of the first 59 odd primes. We also used Wagner’s algorithm to search for u a product of four 32-bit odd numbers, requiring it to be valid mod at least the first 72 odd primes. We ran the algorithm for a day on a 64-core Amazon EC2 Graviton2 instance, producing some 5 million results. Notably,
is valid mod the 729-bit product of the first 99 odd primes. Our search was tuned to find u relatively close to \(u_\text {minexp}\); tuning it differently would have been faster or found valid u mod more primes, but the resulting u would be significantly larger.
It isn’t necessary to choose M before u. One could start with a small u which is valid mod the first several primes, and then choose further primes p|M so that u is valid. This sacrifices some performance, because discarding small primes reduces \(M/\phi (M)\). Our search using Wagner’s algorithm found that
performs well across a range of bit sizes, losing about 0.5% of performance compared to an unconstrained (M, u) at 1024 bits and 3% at 2048 bits.
The quality of results from Wagner’s algorithm should fall off exponentially with the number of primes dividing M, because at each step the algorithm multiplies two intermediate values to produce another intermediate that solves b more equations, for some block size b. So while it performs well for the first 100 primes, ISD appears to perform better for the first 400 primes.
B.2 Multiple u
Instead of using linear equations to search for a single u, we could choose a few small u such that at least one of them is valid for every p|M. For example, for each of the first 133 odd primes, at least one of \(u\in U := \{1,2,5,19\}\) is valid. We could factor M into \(\prod _{u\in U} M_u\) such that u is valid mod the corresponding \(M_u\). Then we could sample values \(x_u\mathop {{\mathop {\leftarrow }\limits ^{\$}}}(\mathbb {Z}/M_u)^*\) and combine them as in Sect. 2.2.
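The coverage claim for \(U = \{1,2,5,19\}\) is easy to check directly. The sketch below is an illustration only (the helper names and the first-valid-u assignment rule are assumptions): it tests each of the first 134 odd primes and splits the product of the first 133 into factors \(M_u\).

```python
from math import prod

def odd_primes(count):
    """First `count` odd primes, by trial division."""
    primes, n = [], 3
    while len(primes) < count:
        if all(n % q for q in primes if q * q <= n):
            primes.append(n)
        n += 2
    return primes

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

U = (1, 2, 5, 19)

def valid_choices(p):
    """All u in U with (-u/p) = -1."""
    return [u for u in U if legendre(-u, p) == -1]

primes = odd_primes(134)
coverage = {p: valid_choices(p) for p in primes}
first_uncovered = next((p for p in primes if not coverage[p]), None)

# Assign each of the first 133 odd primes to the first u that is valid for it.
M = prod(primes[:133])
M_parts = {u: 1 for u in U}
for p in primes[:133]:
    M_parts[coverage[p][0]] *= p

print(first_uncovered, {u: m.bit_length() for u, m in M_parts.items()})
```

The first odd prime for which none of the four values is valid is 761, the 134th odd prime, which is consistent with the figure of 133 quoted above.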
B.3 Quadratic Minimization
Two other techniques are based on finding small values of quadratic functions over the integers. One is to factor M as \(M_1\cdot M_3\) where \(M_1\) contains the 1-mod-4 factors and \(M_3\) contains the 3-mod-4 factors of M. Valid u are of the form \(u\equiv x^2 \pmod {M_3}\) for some x coprime to \(M_3\). We may plug in \(x = \lfloor \sqrt{k M_3}\rfloor + \ell \) for small positive integers \(k,\ell \) as a more efficient brute force technique. This technique gives many candidate values of u which are around \(\sqrt{M_3}\approx \sqrt[4]{M}\), but it still takes exponential time as M increases.
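A minimal sketch of this first technique follows, assuming a toy M built from the first 12 odd primes and hypothetical search bounds `k_max` and `l_max`. Each candidate \(u = x^2 - k M_3\) is automatically valid modulo the primes of \(M_3\) when x is coprime to \(M_3\), so only the primes of \(M_1\) need to be tested.

```python
from math import gcd, isqrt, prod

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def quadratic_candidates(primes, k_max=100, l_max=100):
    """Valid u of the form (isqrt(k*M3) + l)^2 - k*M3 for small k, l >= 1."""
    m1_primes = [p for p in primes if p % 4 == 1]
    M3 = prod(p for p in primes if p % 4 == 3)
    found = set()
    for k in range(1, k_max + 1):
        base = isqrt(k * M3)
        for l in range(1, l_max + 1):
            x = base + l
            if gcd(x, M3) != 1:
                continue
            u = x * x - k * M3      # u > 0 and u = x^2 (mod M3)
            # For p | M3: (-u/p) = (-1/p)(x^2/p) = -1, since p = 3 mod 4.
            if all(legendre(-u, p) == -1 for p in m1_primes):
                found.add(u)
    return sorted(found)

primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]   # toy example
us = quadratic_candidates(primes)
print(len(us), us[:5])
```

The remaining cost is the \(2^{-n_1}\) success probability over the \(n_1\) primes dividing \(M_1\), which is the source of the exponential running time.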
The second approach is to choose small, coprime, square-free positive integers \((\alpha ,\beta )\), and then partition M as \(M_0\cdot M_1\), such that
\[u = \alpha M_0 - \beta M_1\]
is valid. This will be true if:
1. For all primes p|M, if \(p|\alpha \) then \(p| M_0\) and likewise if \(p|\beta \) then \(p| M_1\).
2. For all other primes \(p|M_0\), \(\left( \frac{\beta }{p}\right) \cdot \prod _{q|M_1}\left( \frac{q}{p}\right) = -1\) and vice versa.
These equations are actually affine: switching a prime p from \(M_0\) to \(M_1\) or back has the same effect on all the equations regardless of where the other primes are assigned. They can therefore be solved efficiently for a given \((\alpha ,\beta )\) with probability about \((1-\frac{1}{2})\cdot (1-\frac{1}{4})\cdots \approx 0.29\).
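As a brief worked step, here is where the second condition comes from. The derivation assumes that u has the form \(u = \alpha M_0 - \beta M_1\), i.e. the \(x = y = 1\) case of the extended equation introduced in the next paragraph, and that M is square-free as above. For a prime \(p|M_0\) we have
\[-u = \beta M_1 - \alpha M_0 \equiv \beta M_1 \pmod {p}, \qquad \text {so}\qquad \left( \frac{-u}{p}\right) = \left( \frac{\beta }{p}\right) \cdot \prod _{q|M_1}\left( \frac{q}{p}\right) ,\]
and requiring this to equal \(-1\) gives the stated condition. The first condition ensures that p does not divide \(\beta \), so no factor vanishes. For a prime \(p|M_1\) the same argument gives \(-u \equiv -\alpha M_0 \pmod {p}\), so under this assumed form the mirrored equation carries an additional factor \(\left( \frac{-1}{p}\right) \). Writing each symbol as a bit (\(-1\mapsto 1\), \(1\mapsto 0\)) and using one \(\mathbb {F} _2\) variable per prime to record whether it lies in \(M_1\) turns each equation into a sum of such variables with fixed coefficients, which is the affine structure described above.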
To further reduce u, we make two improvements. First, we extend the equation to \(u = \alpha M_0 x^2 - \beta M_1 y^2\) where x is coprime to \(\beta M_1 y^2\) and vice versa. By setting x/y as convergents to \(\sqrt{\beta M_1 / (\alpha M_0)}\), we can find many valid values of \(u\approx \sqrt{\alpha \cdot \beta \cdot M_0\cdot M_1}\). Furthermore, we don’t need to set \(M = M_0\cdot M_1\) exactly: it suffices to instead choose \(M_2|M\) upfront and set \(M = M_0\cdot M_1\cdot M_2\). This method produces many u which are valid mod \(M_0\cdot M_1\), and we can continue until by chance we find one which is also valid mod \(M_2\). Overall, this approach finds u which are slightly smaller than \(\sqrt{M}\), as does ISD, but ISD seems to work better in practice.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Hamburg, M., Tunstall, M., Xiao, Q. (2021). Improvements to RSA Key Generation and CRT on Embedded Devices. In: Paterson, K.G. (ed.) Topics in Cryptology – CT-RSA 2021. CT-RSA 2021. Lecture Notes in Computer Science, vol 12704. Springer, Cham. https://doi.org/10.1007/978-3-030-75539-3_26
DOI: https://doi.org/10.1007/978-3-030-75539-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75538-6
Online ISBN: 978-3-030-75539-3
eBook Packages: Computer Science, Computer Science (R0)