Abstract
This paper describes an algorithm for accelerating the computation of hash functions based on the Davies–Meyer construction. The algorithm parallelizes the computation of the message schedules of several message blocks of a given message. This parallelization, together with the proper use of vector (SIMD) processor instructions, improves the overall performance of the algorithm. Using this method, we obtain a new software implementation of SHA-256 that runs at 11.47 cycles/byte on the second-generation and 10.18 cycles/byte (for an 8 KB message) on the third-generation Intel® Core™ processors. We also show how to extend the method to the upcoming AVX2 architecture, which has wider registers. Since processors with AVX2 will not be available until 2013, exact performance figures cannot yet be reported. Instead, we show that the resulting SHA-256 and SHA-512 implementations execute fewer instructions. Based on our findings, we make some observations on the SHA-3 competition. We argue that if the prospective SHA-3 standard is expected to compete with the performance of SHA-256 or SHA-512 on high-end platforms, its performance should be well below 10 cycles/byte on current processors, and certainly on near-future ones. Not all SHA-3 finalists achieve this. Furthermore, even the fastest finalists will probably offer only a small performance advantage over the current SHA-256 and SHA-512 implementations.
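The following is a minimal C sketch of the central idea, computing the SHA-256 message schedules of four consecutive message blocks at once. It assumes an x86-64 target, where SSE2 is always available. Lane j of each 128-bit register holds word t of block j, so one vector instruction advances all four schedules. This is not the authors' code. All function names (`sha256_schedule_4sms`, `sha256_schedule_scalar`, `sha256_4sms_selftest`) are hypothetical. The byte-by-byte loading and transposition are simplifications, and the compression rounds that consume the schedules are omitted. The self-check compares each lane against a conventional one-block-at-a-time schedule.

```c
#include <assert.h>    /* for callers' checks, e.g. assert(sha256_4sms_selftest()) */
#include <emmintrin.h> /* SSE2 intrinsics (baseline on x86-64) */
#include <stdint.h>

/* Rotate each 32-bit lane right by n; SSE2 has no vector rotate instruction. */
#define ROTR32X4(x, n) \
    _mm_or_si128(_mm_srli_epi32((x), (n)), _mm_slli_epi32((x), 32 - (n)))

/* SHA-256 sigma0 = ROTR7 ^ ROTR18 ^ SHR3, applied to 4 lanes at once. */
static __m128i sigma0_x4(__m128i x)
{
    return _mm_xor_si128(_mm_xor_si128(ROTR32X4(x, 7), ROTR32X4(x, 18)),
                         _mm_srli_epi32(x, 3));
}

/* SHA-256 sigma1 = ROTR17 ^ ROTR19 ^ SHR10, applied to 4 lanes at once. */
static __m128i sigma1_x4(__m128i x)
{
    return _mm_xor_si128(_mm_xor_si128(ROTR32X4(x, 17), ROTR32X4(x, 19)),
                         _mm_srli_epi32(x, 10));
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical 4-SMS schedule: msg holds 4 consecutive 64-byte blocks;
   out[j][t] receives word t of block j's 64-word schedule. */
static void sha256_schedule_4sms(const uint8_t *msg, uint32_t out[4][64])
{
    __m128i w[64];
    uint32_t lane[4];

    /* Transpose: lane j of w[t] = big-endian word t of block j. */
    for (int t = 0; t < 16; t++) {
        for (int j = 0; j < 4; j++)
            lane[j] = load_be32(msg + 64 * j + 4 * t);
        w[t] = _mm_loadu_si128((const __m128i *)lane);
    }
    /* One vector step advances all four independent schedules. */
    for (int t = 16; t < 64; t++)
        w[t] = _mm_add_epi32(_mm_add_epi32(sigma1_x4(w[t - 2]), w[t - 7]),
                             _mm_add_epi32(sigma0_x4(w[t - 15]), w[t - 16]));
    /* Transpose back so each block's schedule is contiguous. */
    for (int t = 0; t < 64; t++) {
        _mm_storeu_si128((__m128i *)lane, w[t]);
        for (int j = 0; j < 4; j++)
            out[j][t] = lane[j];
    }
}

static uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Conventional one-block-at-a-time schedule, used as a reference. */
static void sha256_schedule_scalar(const uint8_t *block, uint32_t w[64])
{
    for (int t = 0; t < 16; t++)
        w[t] = load_be32(block + 4 * t);
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = s1 + w[t - 7] + s0 + w[t - 16];
    }
}

/* Returns 1 if every lane of the 4-SMS result equals the scalar schedule. */
static int sha256_4sms_selftest(void)
{
    uint8_t msg[4 * 64];
    uint32_t par[4][64], ref[64];

    for (int i = 0; i < 4 * 64; i++)
        msg[i] = (uint8_t)(37 * i + 11); /* arbitrary test pattern */
    sha256_schedule_4sms(msg, par);
    for (int j = 0; j < 4; j++) {
        sha256_schedule_scalar(msg + 64 * j, ref);
        for (int t = 0; t < 64; t++)
            if (par[j][t] != ref[t])
                return 0;
    }
    return 1;
}
```

The recurrence is the standard SHA-256 one, W[t] = σ1(W[t-2]) + W[t-7] + σ0(W[t-15]) + W[t-16]. Only the data layout changes. Within a single block, the dependence of W[t] on W[t-2] complicates vectorization. The schedules of different blocks, however, are fully independent, so they map directly onto SIMD lanes. A call such as `assert(sha256_4sms_selftest());` exercises the sketch.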
References
[1] Federal Information Processing Standards Publication 180-2: Secure Hash Standard. http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf
[2] Gueron, S.: Speeding up SHA-1, SHA-256 and SHA-512 on the 2nd Generation Intel® Core™ Processors (to be published; ITNG 2012)
[3] Gueron, S., Johnson, S., Walker, J.: SHA-512/256. In: IEEE Proceedings of the 8th International Conference on Information Technology: New Generations (ITNG 2011) (2011)
[4] Gueron, S., Krasnov, V.: [PATCH] Efficient implementations of SHA256 and SHA512, using the Simultaneous Message Scheduling method. http://rt.openssl.org/Ticket/Display.html?id=2784&user=guest&pass=guest
[5] Intel: Intel Advanced Vector Extensions Programming Reference. http://software.intel.com/file/36945
[6] Intel: Software Development Emulator (SDE). http://software.intel.com/enus/articles/intel-software-development-emulator/
[7] Intel: Intel® Compilers. http://software.intel.com/en-us/articles/intel-compilers/
[8] Intel (M. Buxton): Haswell New Instruction Descriptions Now Available! http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/
[9] Kounavis, M.E., Kang, X., Grewal, K., Eszenyi, M., Gueron, S., Durham, D.: Encrypting the Internet. In: Proceedings of the ACM SIGCOMM 2010 Conference. http://portal.acm.org/citation.cfm?id=1851182.1851200
[10] Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography, 5th edn. CRC Press, Boca Raton (2001)
[11] NIST: Cryptographic Hash Algorithm Competition. http://csrc.nist.gov/groups/ST/hash/sha-3/index.html
[12] NIST: Secure Hash Standard. Draft Federal Information Processing Standards Publication 180-4 (2011)
[13] OpenSSL: The Open Source Toolkit for SSL/TLS. http://openssl.org/
[14] SUPERCOP. http://bench.cr.yp.to/supercop.html
[15] YASM: The YASM Modular Assembler Project. http://yasm.tortall.net/
Appendix
1.1 Code snippets
This appendix contains two C code examples. The first implements 4-SMS for SHA-256 using SSE intrinsics. The second implements SHA-512 using AVX2 intrinsics. These examples only illustrate the method; the performance-optimized code is written in assembly (Figs. 8, 9).
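The appendix's SHA-512 listing targets AVX2. The sketch below is a portable companion that runs on any x86-64 CPU. It applies the same idea to SHA-512 with SSE2, whose 128-bit registers hold two 64-bit words, so two blocks are scheduled at once. This corresponds to the 2-SMS configuration listed for Fig. 7. It is not the authors' code. The names (`sha512_schedule_2sms`, `sha512_2sms_selftest`, and so on) are hypothetical, and the rounds that consume the schedule are omitted.

```c
#include <assert.h>    /* for callers' checks, e.g. assert(sha512_2sms_selftest()) */
#include <emmintrin.h> /* SSE2 intrinsics (baseline on x86-64) */
#include <stdint.h>

/* Rotate each 64-bit lane right by n; SSE2 has no vector rotate instruction. */
#define ROTR64X2(x, n) \
    _mm_or_si128(_mm_srli_epi64((x), (n)), _mm_slli_epi64((x), 64 - (n)))

/* SHA-512 sigma0 = ROTR1 ^ ROTR8 ^ SHR7, applied to 2 lanes at once. */
static __m128i sigma0_x2(__m128i x)
{
    return _mm_xor_si128(_mm_xor_si128(ROTR64X2(x, 1), ROTR64X2(x, 8)),
                         _mm_srli_epi64(x, 7));
}

/* SHA-512 sigma1 = ROTR19 ^ ROTR61 ^ SHR6, applied to 2 lanes at once. */
static __m128i sigma1_x2(__m128i x)
{
    return _mm_xor_si128(_mm_xor_si128(ROTR64X2(x, 19), ROTR64X2(x, 61)),
                         _mm_srli_epi64(x, 6));
}

static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical 2-SMS schedule: msg holds 2 consecutive 128-byte blocks;
   out[j][t] receives word t of block j's 80-word schedule. */
static void sha512_schedule_2sms(const uint8_t *msg, uint64_t out[2][80])
{
    __m128i w[80];
    uint64_t lane[2];

    for (int t = 0; t < 16; t++) {
        lane[0] = load_be64(msg + 8 * t);       /* word t of block 0 */
        lane[1] = load_be64(msg + 128 + 8 * t); /* word t of block 1 */
        w[t] = _mm_loadu_si128((const __m128i *)lane);
    }
    /* One vector step advances both independent schedules. */
    for (int t = 16; t < 80; t++)
        w[t] = _mm_add_epi64(_mm_add_epi64(sigma1_x2(w[t - 2]), w[t - 7]),
                             _mm_add_epi64(sigma0_x2(w[t - 15]), w[t - 16]));
    for (int t = 0; t < 80; t++) {
        _mm_storeu_si128((__m128i *)lane, w[t]);
        out[0][t] = lane[0];
        out[1][t] = lane[1];
    }
}

static uint64_t rotr64(uint64_t x, int n) { return (x >> n) | (x << (64 - n)); }

/* Conventional single-block SHA-512 schedule, used as a reference. */
static void sha512_schedule_scalar(const uint8_t *block, uint64_t w[80])
{
    for (int t = 0; t < 16; t++)
        w[t] = load_be64(block + 8 * t);
    for (int t = 16; t < 80; t++) {
        uint64_t s0 = rotr64(w[t - 15], 1) ^ rotr64(w[t - 15], 8) ^ (w[t - 15] >> 7);
        uint64_t s1 = rotr64(w[t - 2], 19) ^ rotr64(w[t - 2], 61) ^ (w[t - 2] >> 6);
        w[t] = s1 + w[t - 7] + s0 + w[t - 16];
    }
}

/* Returns 1 if both lanes of the 2-SMS result equal the scalar schedule. */
static int sha512_2sms_selftest(void)
{
    uint8_t msg[2 * 128];
    uint64_t par[2][80], ref[80];

    for (int i = 0; i < 2 * 128; i++)
        msg[i] = (uint8_t)(91 * i + 5); /* arbitrary test pattern */
    sha512_schedule_2sms(msg, par);
    for (int j = 0; j < 2; j++) {
        sha512_schedule_scalar(msg + 128 * j, ref);
        for (int t = 0; t < 80; t++)
            if (par[j][t] != ref[t])
                return 0;
    }
    return 1;
}
```

With 256-bit AVX2 registers, the same loop would carry four 64-bit lanes (four blocks). That variant needs AVX2 hardware and is not shown here.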
1.2 Fig. 7—Sources
Figure 7 presents performance numbers for several hash algorithms. To facilitate reproducing the results, we provide the following details.
The source code for Blake, Grøstl, JH, Keccak, and Skein was retrieved from SUPERCOP [14] and re-measured using the methodology described in Sect. 6.
We used SUPERCOP version 20120329 (SUPERCOP hereafter), which can be downloaded from http://hyperelliptic.org/ebats/supercop-20120329.tar.bz2. Further details on the sources, including the compilation flags (where relevant), are:
- SHA-256 openssl: OpenSSL 1.0.1
- SHA-512 openssl: OpenSSL 1.0.1
- SHA-256 4-SMS: the code posted in [4], applied to OpenSSL 1.0.1
- SHA-512 2-SMS: the code posted in [4], applied to OpenSSL 1.0.1
- Skein: SUPERCOP, “sandy”, compiled using: gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer
- Blake256: SUPERCOP, “avxicc”, assembler
- Blake512: SUPERCOP, “avxicc”, assembler
- Grøstl256: SUPERCOP, “avx”, compiled using: gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer -DTASM
- Grøstl512: SUPERCOP, “aesni”, compiled using: gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer -DTASM
- JH256: SUPERCOP, “bitslice_sse2_opt64”, compiled using: icc -O3 -xAVX
- Keccak: SUPERCOP, “x86_64_shld”, compiled using: gcc -funroll-loops -O3 -fomit-frame-pointer
Compilers: we used gcc version 4.5.1, and icc version 12.
Cite this article
Gueron, S., Krasnov, V. Parallelizing message schedules to accelerate the computations of hash functions. J Cryptogr Eng 2, 241–253 (2012). https://doi.org/10.1007/s13389-012-0037-z
DOI: https://doi.org/10.1007/s13389-012-0037-z