Abstract
This paper presents an efficient, side-channel-protected software implementation of scalar multiplication for the standard binary elliptic curves of the National Institute of Standards and Technology (NIST) and the Standards for Efficient Cryptography Group (SECG). The performance gain comes from Intel’s AVX architecture and the pclmulqdq processor instruction. On the Haswell platform, the fast carry-less multiplication is additionally used to speed up the reduction. For the five NIST curves over \(GF(2^m)\) with \(m \in \{163,233,283,409,571\}\), the resulting scalar multiplication is about 5–12 times faster than that of OpenSSL-1.0.1e, which significantly speeds up the ECDHE and ECDSA algorithms.
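To make the role of carry-less multiplication concrete, the following minimal sketch multiplies two elements of \(GF(2^m)\) and reduces the product modulo the NIST reduction polynomial for that field. It is an illustration only and not the implementation described in the paper: the function names `clmul`, `reduce_mod` and `gf2m_mul` are hypothetical, the loops are bit-serial rather than built on the 64-bit by 64-bit products that pclmulqdq delivers, and the code is not constant time, so it offers no side-channel protection.

```python
# Reduction polynomials of the five NIST binary fields, stored as integers
# whose bit i is the coefficient of x^i.
NIST_POLY = {
    163: (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1,
    233: (1 << 233) | (1 << 74) | 1,
    283: (1 << 283) | (1 << 12) | (1 << 7) | (1 << 5) | 1,
    409: (1 << 409) | (1 << 87) | 1,
    571: (1 << 571) | (1 << 10) | (1 << 5) | (1 << 2) | 1,
}


def clmul(a, b):
    """Carry-less product of a and b, i.e. polynomial multiplication over GF(2).

    Hypothetical helper: partial products are combined with XOR instead of
    addition, which is the operation pclmulqdq performs on 64-bit operands.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_mod(c, m):
    """Reduce the polynomial c modulo the NIST polynomial of degree m."""
    f = NIST_POLY[m]
    while c.bit_length() > m:
        # Align the leading term of f with the leading term of c and cancel it.
        shift = c.bit_length() - 1 - m
        c ^= f << shift
    return c


def gf2m_mul(a, b, m):
    """Multiply two field elements of GF(2^m): carry-less product, then reduction."""
    return reduce_mod(clmul(a, b), m)


if __name__ == "__main__":
    # x^162 * x = x^163, which reduces to x^7 + x^6 + x^3 + 1 in GF(2^163).
    print(bin(gf2m_mul(1 << 162, 0b10, 163)))
```

An optimized implementation would instead assemble the wide product from 64-bit carry-less multiplications and exploit the sparse form of each reduction polynomial; the sketch only shows what those steps compute.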


Notes
Here, throughput means the number of cycles until the next instruction of this type can be issued.
Appendices
Appendix 1: Speed results
See Tables 4, 5, 6, 7, 8, 9 and 10.
Appendix 2: Subfunctions of the 2P-algorithm



Cite this article
Bluhm, M., Gueron, S. Fast software implementation of binary elliptic curve cryptography. J Cryptogr Eng 5, 215–226 (2015). https://doi.org/10.1007/s13389-015-0094-1
DOI: https://doi.org/10.1007/s13389-015-0094-1