Abstract
This paper presents faster implementations of the lattice-based schemes Dilithium and Kyber on the Cortex-M4. Dilithium is one of three signature finalists in the NIST post-quantum project (NIST PQC), while Kyber is one of four key-encapsulation mechanism (KEM) finalists.
Our optimizations affect the core polynomial arithmetic involving number-theoretic transforms (NTTs) in both schemes. Our main contributions are threefold: we present a faster signed Barrett reduction for Kyber, propose switching to a smaller prime modulus for the polynomial multiplications \(c\textbf{s}_1\) and \(c\textbf{s}_2\) in the signing procedure of Dilithium, and apply various known optimizations to the polynomial arithmetic in both schemes. Using a smaller prime modulus is particularly interesting because it allows using the Fermat number transform, which results in especially fast code.
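Why a smaller modulus is safe for these particular products can be illustrated with a minimal sketch. It assumes Dilithium2-style parameters that are not stated in the text above (ring degree 256, a challenge c with 39 coefficients equal to ±1, secret coefficients in [-2, 2]) and the Fermat prime 257 = 2^8 + 1 as the small modulus. A schoolbook negacyclic product stands in for the transform-based multiplication, so this shows only the arithmetic argument, not the optimized Cortex-M4 code.

```python
import random

N = 256        # ring degree, Z[x]/(x^N + 1)
TAU = 39       # assumed number of +1/-1 coefficients in the challenge c
ETA = 2        # assumed bound on the secret coefficients
Q_SMALL = 257  # assumed small modulus, the Fermat prime 2^8 + 1

def negacyclic_mul(a, b, q=None):
    """Schoolbook product in Z[x]/(x^n + 1); reduced mod q if q is given."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj  # wrap-around uses x^n = -1
    if q is not None:
        # Reducing once at the end gives the same residues as
        # computing modulo q throughout.
        out = [x % q for x in out]
    return out

def centered(x, q):
    """Map a residue in [0, q) to its representative in [-(q-1)/2, (q-1)/2]."""
    return x - q if x > q // 2 else x

rng = random.Random(1)  # fixed seed so the sketch is reproducible
c = [0] * N
for pos in rng.sample(range(N), TAU):
    c[pos] = rng.choice((-1, 1))
s = [rng.randint(-ETA, ETA) for _ in range(N)]

# Product over the integers versus product modulo the small prime.
exact = negacyclic_mul(c, s)
recovered = [centered(x, Q_SMALL) for x in negacyclic_mul(c, s, Q_SMALL)]

# Every coefficient of c*s is a signed sum of TAU secret coefficients,
# so its absolute value is at most TAU * ETA = 78 < 257 / 2.
assert recovered == exact
```

Because every coefficient of the product is bounded by TAU * ETA, the centered residue modulo 257 already equals the integer result, and no arithmetic modulo the large Dilithium prime is needed for this step. For parameter sets with a larger product TAU * ETA, the same argument needs a correspondingly larger prime.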
We outperform the state of the art for both Dilithium and Kyber. For Dilithium, our NTT and iNTT are faster by 5.2% and 5.7%, respectively. Switching to a smaller modulus results in a speed-up of 33.1%–37.6% for the relevant operations (sum of the base multiplication and iNTT) in the signing procedure. For Kyber, the optimizations result in a 15.9%–17.8% faster matrix-vector product, which is a core arithmetic operation in Kyber.
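For readers unfamiliar with signed Barrett reduction, the following is a minimal sketch of one common formulation for Kyber's modulus q = 3329. The 26-bit fixed-point precision and the resulting constant 20159 are assumptions chosen for illustration. This is not the paper's Cortex-M4 assembly, and Python integers only model the 16-bit input range.

```python
Q = 3329      # Kyber's prime modulus
SHIFT = 26    # assumed precision of the fixed-point reciprocal
V = ((1 << SHIFT) + Q // 2) // Q  # round(2^26 / Q), evaluates to 20159

def barrett_reduce(a):
    """Return r with r = a (mod Q) and |r| <= (Q - 1) / 2.

    Intended for inputs in the signed 16-bit range [-32768, 32767].
    """
    # t approximates round(a / Q). The right shift floors, which is
    # what a signed arithmetic shift does for negative values as well.
    t = (V * a + (1 << (SHIFT - 1))) >> SHIFT
    return a - t * Q

# Exhaustive check over every signed 16-bit input.
for a in range(-32768, 32768):
    r = barrett_reduce(a)
    assert (r - a) % Q == 0
    assert abs(r) <= (Q - 1) // 2
```

The reduction replaces a division by q with one multiplication, one addition, and one shift, and it returns a centered representative directly, so no conditional correction is needed for negative inputs.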
Acknowledgments
This work has been supported by the European Commission through the ERC Starting Grant 805031 (EPOQUE), the Sinica Investigator Award AS-IA-109-M01, and the Taiwan Ministry of Science and Technology Grant 109-2221-E-001-009-MY3. We thank Bo-Yin Yang for sharing the idea of 16-bit Barrett reductions.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Abdulrahman, A., Hwang, V., Kannwischer, M.J., Sprenkels, A. (2022). Faster Kyber and Dilithium on the Cortex-M4. In: Ateniese, G., Venturi, D. (eds) Applied Cryptography and Network Security. ACNS 2022. Lecture Notes in Computer Science, vol 13269. Springer, Cham. https://doi.org/10.1007/978-3-031-09234-3_42
DOI: https://doi.org/10.1007/978-3-031-09234-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09233-6
Online ISBN: 978-3-031-09234-3
eBook Packages: Computer Science (R0)