Improvements to RSA Key Generation and CRT on Embedded Devices

Hamburg, Mike; Tunstall, Mike; Xiao, Qinglai

doi:10.1007/978-3-030-75539-3_26


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12704)

Included in the following conference series:

Cryptographers’ Track at the RSA Conference


Abstract

RSA key generation requires devices to generate large prime numbers. The naïve approach is to generate candidates at random, and then test each one for (probable) primality. However, it is faster to use a sieve method, where the candidates are chosen so as not to be divisible by a list of small prime numbers $\{p_i\}$.
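For orientation, here is the plain "filter, then test" pattern in its simplest form: random odd candidates are discarded if any small prime divides them, and survivors go to a Miller–Rabin probable-prime test. This is a minimal Python sketch under stated assumptions, not the paper's CRT or quadratic-residuosity sieve. It rejects candidates after generating them rather than choosing them coprime to the small primes up front, it is not hardened against side channels, and the names `naive_sieved_prime` and `SMALL_PRIMES` and the bound 1000 are illustrative choices.

```python
import secrets

def small_odd_primes(limit):
    """Odd primes below `limit`, via a simple Sieve of Eratosthenes."""
    is_p = [True] * limit
    is_p[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_p[i]:
            for j in range(i * i, limit, i):
                is_p[j] = False
    return [i for i in range(3, limit) if is_p[i]]

SMALL_PRIMES = small_odd_primes(1000)  # illustrative bound, not tuned

def is_probable_prime(n, rounds=40):
    """Miller-Rabin with random bases; may (very rarely) accept a composite."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True

def naive_sieved_prime(bits):
    """Draw random odd `bits`-bit candidates, drop any with a small factor, then test."""
    assert bits >= 16  # ensures no candidate is itself one of SMALL_PRIMES
    while True:
        c = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # force top bit and oddness
        if any(c % p == 0 for p in SMALL_PRIMES):
            continue  # cheap rejection: skip the expensive exponentiations
        if is_probable_prime(c):
            return c
```

Each rejected candidate costs only a few cheap divisions, so most of the work goes into the Miller–Rabin exponentiations on survivors. A sieve that never produces multiples of the small primes avoids even this rejection loop.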

Sieve methods can be somewhat complex and time-consuming, at least by the standards of embedded and hardware implementations, and they can be tricky to defend against side-channel analysis. Here we describe an improvement on Joye et al.’s sieve based on the Chinese Remainder Theorem (CRT). We also describe a new sieve method using quadratic residuosity which is simpler and faster than previously known methods, and which can produce values in desired RSA parameter ranges such as $(2^{n-1/2}, 2^n)$ with minimal additional work. The same methods can be used to generate strong primes and DSA moduli.

We also demonstrate a technique for RSA private key operations using the Chinese Remainder Theorem (RSA-CRT) without $q^{-1}$ mod p. This technique also leads to inversion-free batch RSA and inversion-free RSA mod $p^k q$.
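For contrast with the inversion-free technique, the conventional RSA-CRT private-key operation below uses Garner's recombination, which needs the precomputed value $q^{-1} \bmod p$. This sketches the standard method only; the paper's own technique is not described in this excerpt and is not reproduced here. The function name is hypothetical, and `pow(q, -1, p)` requires Python 3.8 or later.

```python
def rsa_crt_private(c, p, q, d):
    """Conventional RSA-CRT private-key operation with Garner recombination."""
    dp = d % (p - 1)           # exponent reduced mod p - 1
    dq = d % (q - 1)           # exponent reduced mod q - 1
    q_inv = pow(q, -1, p)      # q^{-1} mod p: the value an inversion-free method avoids
    mp = pow(c % p, dp, p)     # half-size exponentiation mod p
    mq = pow(c % q, dq, q)     # half-size exponentiation mod q
    h = (q_inv * (mp - mq)) % p
    return mq + h * q          # agrees with mq mod q and with mp mod p; lies in [0, p*q)
```

With the textbook parameters p = 61, q = 53, e = 17 and d = 2753, encrypting m = 65 and then calling `rsa_crt_private` recovers 65, the same result as `pow(c, d, p * q)`.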

We demonstrate how an embedded device can use our key generation and RSA-CRT techniques to perform RSA efficiently without storing the private key itself: only a symmetric seed and one or two short hints are required.


Notes

1. Most of these algorithms exhibit false positives in rare cases. That is, when given a prime number they always say that it is prime, but they may accept a composite number as prime with some tiny probability. The present work does not address this issue.
2. They need not be cryptographically indistinguishable from uniform. In practice, a wide variety of not-quite-uniform distributions are used [SNS+16]. This seems to be sufficient so long as (p, q) are close enough to uniform and are uncorrelated [NSS+17].
3. A random $r \xleftarrow{\$} \mathbb{Z}/N$ will be coprime to N with overwhelming probability. But if we wanted to be sure, then we could reuse one of our sieve techniques.

References

Adrian, D., et al.: Imperfect forward secrecy: how Diffie-Hellman fails in practice. In: Ray, I., Li, N., Kruegel, C. (eds.) ACM CCS 2015, pp. 5–17. ACM Press (2015)
Aumüller, C., Bier, P., Fischer, W., Hofreiter, P., Seifert, J.-P.: Fault attacks on RSA with CRT: concrete results and practical countermeasures. In: Kaliski, B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 260–275. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36400-5_20
Aldaya, A.C., García, C.P., Tapia, L.M.A., Brumley, B.B.: Cache-timing attacks on RSA key generation. IACR TCHES 2019(4), 213–242 (2019). https://tches.iacr.org/index.php/TCHES/article/view/8350
Atkin, A.O.L., Morain, F.: Elliptic curves and primality proving. Math. Comput. 61(203), 29–68 (1993)
Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987). https://doi.org/10.1007/3-540-47721-7_24
Bernstein, D.J., Heninger, N., Lou, P., Valenta, L.: Post-quantum RSA. In: Lange, T., Takagi, T. (eds.) PQCrypto 2017. LNCS, vol. 10346, pp. 311–329. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59879-6_18
Aldaya, A.C., Brumley, B.: When one vulnerable primitive turns viral: novel single-trace attacks on ECDSA and RSA. In: CHES 2020, p. 03 (2020)
Clavier, C., Coron, J.-S.: On the implementation of a fast prime generation algorithm. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 443–449. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74735-2_30
Diffie, W., Hellman, M.E.: New directions in cryptography. IEEE Trans. Inf. Theory 22(6), 644–654 (1976)
Ebeid, N.M., Lambert, R.: A new CRT-RSA algorithm resistant to powerful fault attacks. In: WESS 2010, p. 8. ACM (2010)
Fiat, A.: Batch RSA. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 175–185. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0_17
Hamburg, M.: Fast and compact elliptic-curve cryptography. Cryptology ePrint Archive, Report 2012/309 (2012). http://eprint.iacr.org/2012/309
Joye, M., Paillier, P.: GCD-free algorithms for computing modular inverses. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 243–253. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45238-6_20
Joye, M., Paillier, P.: Fast generation of prime numbers on portable devices: an update. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 160–173. Springer, Heidelberg (2006). https://doi.org/10.1007/11894063_13
Joye, M., Paillier, P., Vaudenay, S.: Efficient generation of prime numbers. In: Koç, Ç.K., Paar, C. (eds.) CHES 2000. LNCS, vol. 1965, pp. 340–354. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44499-8_27
Kerry, C.F., Gallagher, P.D.: Digital signature standard (DSS). FIPS Pub 186–4 (2013). https://doi.org/10.6028/NIST.FIPS.186-4
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Nemec, M., Sýs, M., Svenda, P., Klinec, D., Matyas, V.: The return of Coppersmith’s attack: practical factorization of widely used RSA moduli. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) ACM CCS 2017, pp. 1631–1648. ACM Press (2017)
Pomerance, C., Selfridge, J.L., Wagstaff, S.S.: The pseudoprimes to $25\cdot 10^9$. Math. Comput. 35(151), 1003–1026 (1980)
Rabin, M.O.: Probabilistic algorithm for testing primality. J. Number Theory 12(1), 128–138 (1980)
Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signatures and public-key cryptosystems. Commun. Assoc. Comput. Mach. 21(2), 120–126 (1978)
Schanck, J.M.: Multi-power post-quantum RSA. Cryptology ePrint Archive, Report 2018/325 (2018). https://eprint.iacr.org/2018/325
Svenda, P., et al.: The million-key question - investigating the origins of RSA public keys. In: Holz, T., Savage, S. (eds.) USENIX Security 2016, pp. 893–910. USENIX Association (2016)
Takagi, T.: Fast RSA-type cryptosystem modulo $p^{k} q$. In: Krawczyk, H. (ed.) CRYPTO 1998. LNCS, vol. 1462, pp. 318–326. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055738
Takagi, T.: A fast RSA-type public-key primitive modulo $p^k q$ using Hensel lifting. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 87(1), 94–101 (2004)
Van Oorschot, P.C., Wiener, M.J.: Parallel collision search with cryptanalytic applications. J. Cryptol. 12(1), 1–28 (1999)
Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 288–304. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45708-9_19


Acknowledgements

Special thanks to Denis Pochuev for feedback on RSA with $p^k\cdot q$.

Author information

Authors and Affiliations

Rambus, Inc., San Jose, USA
Mike Hamburg, Mike Tunstall & Qinglai Xiao


Corresponding author

Correspondence to Mike Hamburg.

Editor information

Editors and Affiliations

ETH Zürich, Zürich, Switzerland
Kenneth G. Paterson

Ethics declarations

Some of these techniques may be covered by US and/or international patents.

Appendices

A Proof of Theorem 1

We prove this theorem in the extended version, at https://eprint.iacr.org/2020/1507.

B Minimizing u

We say that u is “valid” mod M if $\left( \frac{-u}{p}\right) = -1$ for all primes p|M. If M’s factorization is known, then it is easy to find a valid $u_p$ modulo each p|M (e.g. by testing successive candidates $u_p$ with the Jacobi symbol $\left( \frac{-u_p}{p}\right)$ until a valid one is found), and to combine them using the Chinese Remainder Theorem. But what is the minimum valid u? Using a smaller u could allow the same u to be used for several values of M, or could reduce memory usage and compute time, but mostly it is a mathematically interesting question. For simplicity, we assume here that M is square-free.
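The per-prime search and CRT combination described above can be sketched as follows. For a prime p the Jacobi symbol is the Legendre symbol, which Euler's criterion computes as $(-u)^{(p-1)/2} \bmod p$. The last helper illustrates why validity is useful for sieving: if $\left(\frac{-u}{p}\right) = -1$, then $-u$ has no square root mod p, so $x^2 + u$ is never divisible by p. That follows directly from the definition; the helper names are hypothetical, and this is not claimed to be the paper's exact sieve.

```python
def is_valid(u, p):
    """True iff (-u / p) = -1 for an odd prime p (Euler's criterion)."""
    return pow(-u % p, (p - 1) // 2, p) == p - 1

def smallest_valid_mod(p):
    """Smallest positive u_p that is valid mod p, found by trying 1, 2, 3, ..."""
    u = 1
    while not is_valid(u, p):
        u += 1
    return u

def crt(residues, moduli):
    """Smallest x >= 0 with x = r_i (mod m_i), for pairwise-coprime moduli."""
    x, m = 0, 1
    for r, p in zip(residues, moduli):
        t = ((r - x) * pow(m, -1, p)) % p  # lift x so it also matches r mod p
        x += m * t
        m *= p
    return x

def valid_u_for(primes):
    """Combine one valid u_p per prime into a single u valid mod their product."""
    return crt([smallest_valid_mod(p) for p in primes], primes)

def never_divisible(u, p):
    """Check that x^2 + u is never 0 mod p, for every residue x."""
    return all((x * x + u) % p != 0 for x in range(p))
```

For the first three odd primes the per-prime choices are 1, 2 and 1, which combine to u = 22.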

If there are n primes dividing M, then a random element of $(\mathbb {Z}/M)^*$ is valid with probability $2^{-n}$, so we expect the minimum valid u to be around $u_\text {minexp} := 2^n\cdot M/\phi (M)$. A brute-force strategy would require about $u_\text {minexp}$ work, which is infeasible past the first 50 primes or so. But this work can be reduced somewhat, particularly if we settle for a small but not minimal u.
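The estimate $u_\text{minexp}$ and the brute-force search it prices can both be written down directly for small square-free M. In this sketch (hypothetical helper names; exact rational arithmetic via `fractions.Fraction`), the search tries u = 1, 2, 3, … in order, so its running time grows roughly with $u_\text{minexp}$, which is why it stops being practical as primes are added.

```python
from fractions import Fraction
from itertools import count

def is_valid(u, p):
    """True iff (-u / p) = -1 for an odd prime p (Euler's criterion)."""
    return pow(-u % p, (p - 1) // 2, p) == p - 1

def first_odd_primes(n):
    """First n odd primes by trial division (fine for small n)."""
    primes, c = [], 3
    while len(primes) < n:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 2
    return primes

def u_minexp(primes):
    """Heuristic 2^n * M / phi(M) for square-free M = product of the primes."""
    est = Fraction(2 ** len(primes))
    for p in primes:
        est *= Fraction(p, p - 1)
    return est

def min_valid_u(primes):
    """Brute-force search for the smallest positive u valid mod every prime."""
    for u in count(1):
        if all(is_valid(u, p) for p in primes):
            return u
```

For the first two odd primes the estimate is $2^2 \cdot 15/8 = 7.5$ and the true minimum is 7; for the first three it is 17.5 versus 22.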

B.1 Sparse Solutions to Linear Equations

The most effective method we found was to search for valid u of the form $u = q_1\cdot q_2 \cdots q_m$ where the $q_i$’s are in some set Q. The validity criterion is that:

$$\text{for each prime } p \mid M:\quad \left( \frac{-u}{p}\right) = \left( \frac{-1}{p}\right) \cdot \left( \frac{q_1}{p}\right) \cdots \left( \frac{q_m}{p}\right) = -1 \tag{2}$$

If each $q_i$ is coprime to M, then the Jacobi symbols are all either $-1$ or 1; mapping these to 1 and 0 respectively translates the validity criterion to a system of affine equations over $\mathbb {F} _2$. This allows us to solve for u with xor-list or sparse solution techniques, such as:

A birthday attack or stronger collision technique [VOW99] for $m=2$ and Q a large set (e.g. $|Q|\approx 2^{32}$).
Wagner’s xor-list algorithm [Wag02] for m small and Q a large set.
Information set decoding (ISD) for large m and a relatively small set Q (e.g. the first 1000 primes not dividing M).
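For $m = 2$ the affine system reduces to a matching problem. Write $\mathbf{v}(q)$ for the vector of symbol bits of q (1 for a Legendre symbol of $-1$). Then $u = q_1 q_2$ is valid exactly when $\mathbf{v}(q_1) \oplus \mathbf{v}(q_2)$ equals a fixed target whose bit for p is 1 iff $p \equiv 1 \pmod 4$, because $\left(\frac{-1}{p}\right) = -1$ precisely when $p \equiv 3 \pmod 4$. The sketch below finds such a pair with a plain hash table, assuming every element of Q is coprime to M. It is a simple birthday-style match, not the memory-efficient collision search of [VOW99], and the names are hypothetical.

```python
def symbol_bits(a, primes):
    """Bit i is 1 iff a is a quadratic non-residue mod primes[i] (a coprime to each)."""
    return sum(1 << i for i, p in enumerate(primes)
               if pow(a % p, (p - 1) // 2, p) == p - 1)

def find_valid_pair(primes, Q):
    """Find q1, q2 (distinct entries of Q) with u = q1 * q2 valid mod every prime.

    Validity needs (-1/p)(q1/p)(q2/p) = -1, i.e. bits(q1) XOR bits(q2) must equal
    a target with bit i set exactly when primes[i] = 1 (mod 4).
    """
    target = sum(1 << i for i, p in enumerate(primes) if p % 4 == 1)
    seen = {}  # symbol-bit vector -> an earlier element of Q having it
    for q in Q:
        v = symbol_bits(q, primes)
        partner = seen.get(target ^ v)
        if partner is not None:
            return partner, q
        seen.setdefault(v, q)  # insert after lookup, so q is never paired with itself
    return None
```

With the first three odd primes and Q starting 2, 11, …, the first match is (2, 11), giving u = 22.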

Using a birthday attack, we discovered that the 59-bit value

$$u = \texttt {0x4b0555d761f3f52}$$

is valid mod the 383-bit product of the first 59 odd primes. We also used Wagner’s algorithm to search for u that is a product of four 32-bit odd numbers, requiring it to be valid mod at least the first 72 odd primes. We ran the algorithm for a day on a 64-core Amazon EC2 Graviton2 instance, producing some 5 million results. Notably,

$$u=\texttt {0xe3b0f73b0050ab294417001ad1e63d}$$

is valid mod the 729-bit product of the first 99 odd primes. Our search was tuned to find u relatively close to $u_\text {minexp}$; tuning it differently would have been faster or found valid u mod more primes, but the resulting u would be significantly larger.
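The two constants above can be checked independently with a short script. This sketch copies them from the text, counts how many of the first n odd primes each is valid for, and reports the bit length of the corresponding product of primes. The helper names are hypothetical, and no expected output is shown; the script's own output is the check.

```python
def is_valid(u, p):
    """True iff (-u / p) = -1 for an odd prime p (Euler's criterion)."""
    return pow(-u % p, (p - 1) // 2, p) == p - 1

def first_odd_primes(n):
    """First n odd primes by trial division."""
    primes, c = [], 3
    while len(primes) < n:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 2
    return primes

def valid_prefix_length(u, primes):
    """Number of leading entries of `primes` modulo which u is valid."""
    k = 0
    for p in primes:
        if not is_valid(u, p):
            break
        k += 1
    return k

U59 = 0x4b0555d761f3f52                # constant copied from the text
U99 = 0xe3b0f73b0050ab294417001ad1e63d  # constant copied from the text

for u, n in ((U59, 59), (U99, 99)):
    primes = first_odd_primes(n)
    M = 1
    for p in primes:
        M *= p
    print(f"{u.bit_length()}-bit u: valid mod the first "
          f"{valid_prefix_length(u, primes)} of {n} odd primes; "
          f"their product has {M.bit_length()} bits")
```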

It isn’t necessary to choose M before u. One could start with a small u which is valid mod the first several primes, and then choose further primes p|M so that u is valid. This sacrifices some performance, because discarding small primes reduces $M/\phi (M)$. Our search using Wagner’s algorithm found that

$$u = \texttt {0x23e9ee9bd621b0b248e8b59a4c80bb55}$$

performs well across a range of bit sizes, losing about 0.5% of performance compared to an unconstrained (M, u) at 1024 bits and 3% at 2048 bits.
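Choosing M after u, as described above, amounts to keeping only the odd primes for which a fixed u is valid. A minimal sketch (hypothetical names) that collects those primes and computes $M/\phi(M)$ exactly, so that the loss from skipped small primes can be compared with taking the first primes in order:

```python
from fractions import Fraction

def is_valid(u, p):
    """True iff (-u / p) = -1 for an odd prime p (Euler's criterion)."""
    return pow(-u % p, (p - 1) // 2, p) == p - 1

def primes_valid_for(u, count):
    """First `count` odd primes p with (-u/p) = -1, i.e. primes usable as factors of M."""
    chosen, all_odd, c = [], [], 3
    while len(chosen) < count:
        if all(c % p for p in all_odd if p * p <= c):  # c is prime
            all_odd.append(c)
            if is_valid(u, c):
                chosen.append(c)
        c += 2
    return chosen

def m_over_phi(primes):
    """M / phi(M) for square-free M = product of the given primes."""
    r = Fraction(1)
    for p in primes:
        r *= Fraction(p, p - 1)
    return r
```

For u = 1, only primes $p \equiv 3 \pmod 4$ qualify (3, 7, 11, 19, 23, …).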

The quality of results from Wagner’s algorithm should fall off exponentially with the number of primes dividing M, because at each step the algorithm multiplies two intermediate values to produce another intermediate that solves b more equations, for some block size b. So while it performs well for the first 100 primes, ISD appears to perform better for the first 400 primes.
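ISD methods build on ordinary Gaussian elimination over $\mathbb{F}_2$. The sketch below shows only that core step: using the bit-vector encoding of Eq. (2), it finds some subset of Q whose product u is valid, but it makes no attempt to keep the subset small, so the resulting u can be far larger than what ISD or Wagner's algorithm would find. ISD-style algorithms repeat elimination under re-randomized choices to look for low-weight solutions. Names are hypothetical and Q is assumed coprime to M.

```python
def symbol_bits(a, primes):
    """Bit i is 1 iff a is a quadratic non-residue mod primes[i]."""
    return sum(1 << i for i, p in enumerate(primes)
               if pow(a % p, (p - 1) // 2, p) == p - 1)

def solve_by_elimination(primes, Q):
    """Return a product of distinct elements of Q valid mod every prime, or None.

    Each q contributes the F_2 vector symbol_bits(q); we need a subset whose XOR
    equals the target (bit i set iff primes[i] = 1 mod 4).
    """
    n = len(primes)
    target = sum(1 << i for i, p in enumerate(primes) if p % 4 == 1)
    basis = {}  # pivot bit -> (reduced vector, bitmask of Q indices it combines)
    for j, q in enumerate(Q):
        v, mask = symbol_bits(q, primes), 1 << j
        for b in reversed(range(n)):
            if not (v >> b) & 1:
                continue
            if b in basis:            # eliminate bit b using the existing pivot row
                bv, bm = basis[b]
                v, mask = v ^ bv, mask ^ bm
            else:                     # new pivot: b is now the highest set bit of v
                basis[b] = (v, mask)
                break
    t, mask = target, 0
    for b in reversed(range(n)):      # express the target via the pivot rows
        if (t >> b) & 1:
            if b not in basis:
                return None           # target outside the span: no subset of Q works
            bv, bm = basis[b]
            t, mask = t ^ bv, mask ^ bm
    u = 1
    for j, q in enumerate(Q):
        if (mask >> j) & 1:
            u *= q
    return u
```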

B.2 Multiple u

Instead of using linear equations to search for a single u, we could choose a few small u such that at least one of them is valid for every p|M. For example, for each of the first 133 odd primes, at least one of $u\in U := \{1,2,5,19\}$ is valid. We could factor M into $\prod _{u\in U} M_u$ such that u is valid mod the corresponding $M_u$. Then we could sample values $x_u \xleftarrow{\$} (\mathbb{Z}/M_u)^*$ and combine them as in Sect. 2.2.
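The factorization $M = \prod_{u \in U} M_u$ can be computed greedily. In this sketch each prime goes to the first u in U that is valid mod it; that assignment rule is an assumption (any valid choice would do), the sampling and combination step of Sect. 2.2 is not shown, and the function name is hypothetical.

```python
def is_valid(u, p):
    """True iff (-u / p) = -1 for an odd prime p (Euler's criterion)."""
    return pow(-u % p, (p - 1) // 2, p) == p - 1

def split_modulus(primes, U=(1, 2, 5, 19)):
    """Assign each prime to the first u in U valid mod it; return {u: M_u}."""
    parts = {}
    for p in primes:
        u = next((u for u in U if is_valid(u, p)), None)
        if u is None:
            raise ValueError(f"no u in {U} is valid mod {p}")
        parts[u] = parts.get(u, 1) * p
    return parts
```

For the odd primes 3 through 13 this gives $M_1 = 3 \cdot 7 \cdot 11 = 231$ and $M_2 = 5 \cdot 13 = 65$.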

B.3 Quadratic Minimization

Two other techniques are based on finding small values of quadratic functions over the integers. One is to factor M as $M_1\cdot M_3$, where $M_1$ contains the 1-mod-4 factors and $M_3$ contains the 3-mod-4 factors of M. Since $\left( \frac{-1}{p}\right) = -1$ for every $p|M_3$, valid u are of the form $u\equiv x^2 \bmod M_3$ for some x coprime to $M_3$, and must additionally satisfy $\left( \frac{u}{p}\right) = -1$ for every $p|M_1$. We may plug in $x = \lfloor \sqrt{k M_3}\rfloor + \ell$ for small positive integers $k,\ell$, so that $u = x^2 - kM_3$ is small and positive, as a more efficient brute-force technique. This technique gives many candidate values of u which are around $\sqrt{M_3}\approx \sqrt[4]{M}$, but it still takes exponential time as M increases.
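The first quadratic technique can be sketched as a double loop over k and ℓ. By construction $u = x^2 - kM_3$ is a square mod $M_3$, so with x coprime to $M_3$ only the primes dividing $M_1$ can reject it; the filter below nonetheless checks every prime, which is redundant for $M_3$ but keeps the sketch obviously correct. Loop bounds and names are hypothetical; this illustrates the brute-force idea, not a tuned search.

```python
from math import gcd, isqrt

def is_valid(u, p):
    """True iff (-u / p) = -1 for an odd prime p (Euler's criterion)."""
    return pow(-u % p, (p - 1) // 2, p) == p - 1

def square_candidates(primes, k_max, l_max):
    """Valid u = x^2 - k*M3 with x = isqrt(k*M3) + l, for 1 <= k <= k_max, 1 <= l <= l_max."""
    M3 = 1
    for p in primes:
        if p % 4 == 3:
            M3 *= p
    found = set()
    for k in range(1, k_max + 1):
        base = isqrt(k * M3)
        for l in range(1, l_max + 1):
            x = base + l
            u = x * x - k * M3  # positive, = x^2 mod M3, roughly 2*l*sqrt(k*M3)
            if gcd(x, M3) == 1 and all(is_valid(u, p) for p in primes):
                found.add(u)
    return sorted(found)
```

With the odd primes 3 through 13 ($M_3 = 231$), k = 1 and ℓ ≤ 2, the only survivor is u = 58 (x = 17); ℓ = 1 gives 25, which is divisible by 5.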

The second approach is to choose small, coprime, square-free positive integers $(\alpha ,\beta )$, and then partition M as $M_0\cdot M_1$, such that

$$u = \alpha M_0 - \beta M_1$$

is valid. This will be true if:

1. For all primes p|M, if $p|\alpha $ then $p| M_0$, and likewise if $p|\beta $ then $p| M_1$.
2. For all other primes $p|M_0$, $\left( \frac{\beta }{p}\right) \cdot \prod _{q|M_1}\left( \frac{q}{p}\right) = -1$, and vice versa.

These equations are actually affine: switching a prime p from $M_0$ to $M_1$ or back has the same effect on all the equations regardless of where the other primes are assigned. They can therefore be solved efficiently for a given $(\alpha ,\beta )$ with probability about $(1-\frac{1}{2})\cdot (1-\frac{1}{4})\cdots \approx 0.29$.
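The constant 0.29 is the infinite product $\prod_{i \ge 1} (1 - 2^{-i})$, which is also the limiting probability that a large uniformly random square matrix over $\mathbb{F}_2$ is invertible. Its partial products converge quickly, as this small floating-point check (hypothetical function name) shows:

```python
def solvable_fraction(terms=64):
    """Partial product (1 - 1/2)(1 - 1/4)(1 - 1/8)... with `terms` factors."""
    r = 1.0
    for i in range(1, terms + 1):
        r *= 1.0 - 2.0 ** -i
    return r
```

The value is about 0.2888, consistent with the ≈ 0.29 quoted above.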

To further reduce u, we make two improvements. First, we extend the equation to $u = \alpha M_0 x^2 - \beta M_1 y^2$ where x is coprime to $\beta M_1 y^2$ and vice versa. By setting x/y as convergents to $\sqrt{\beta M_1 / (\alpha M_0)}$, we can find many valid values of $u\approx \sqrt{\alpha \cdot \beta \cdot M_0\cdot M_1}$. Furthermore, we don’t need to set $M = M_0\cdot M_1$ exactly: it suffices to instead choose $M_2|M$ upfront and set $M = M_0\cdot M_1\cdot M_2$. This method produces many u which are valid mod $M_0\cdot M_1$, and we can continue until by chance we find one which is also valid mod $M_2$. Overall, this approach finds u which are slightly smaller than $\sqrt{M}$, as does ISD, but ISD seems to work better in practice.


About this paper

Cite this paper

Hamburg, M., Tunstall, M., Xiao, Q. (2021). Improvements to RSA Key Generation and CRT on Embedded Devices. In: Paterson, K.G. (ed.) Topics in Cryptology – CT-RSA 2021. CT-RSA 2021. Lecture Notes in Computer Science, vol 12704. Springer, Cham. https://doi.org/10.1007/978-3-030-75539-3_26


DOI: https://doi.org/10.1007/978-3-030-75539-3_26
Published: 11 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75538-6
Online ISBN: 978-3-030-75539-3
eBook Packages: Computer Science, Computer Science (R0)
