Parallelizing message schedules to accelerate the computations of hash functions

  • Regular Paper
Journal of Cryptographic Engineering

Abstract

This paper describes an algorithm for accelerating the computations of Davies–Meyer based hash functions. It is based on parallelizing the computation of several message schedules for several message blocks of a given message. This parallelization, together with the proper use of vector processor instructions (SIMD), improves the overall algorithm’s performance. Using this method, we obtain a new software implementation of SHA-256 that performs at 11.47 Cycles/Byte on the second and 10.18 Cycles/Byte (for an 8 KB message) on the third Generation Intel® Core™ processors. We also show how to extend the method to the soon-to-come AVX2 architecture, which has wider registers. Since processors with AVX2 will be available only in 2013, exact performance reporting is not yet possible. Instead, we show that our resulting SHA-256 and SHA-512 implementations have a reduced number of instructions. Based on our findings, we make some observations on the SHA3 competition. We argue that if the prospective SHA3 standard is expected to be competitive against the performance of SHA-256 or SHA-512 on high-end platforms, then its performance should be well below 10 Cycles/Byte on current processors, and certainly on near-future ones. Not all the SHA3 finalists have this performance. Furthermore, even the fastest finalists will probably offer only a small performance advantage over the current SHA-256 and SHA-512 implementations.

References

  1. Federal Information Processing Standards Publication 180-2: Secure Hash Standard. http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf

  2. Gueron, S.: Speeding up SHA-1, SHA-256 and SHA-512 on the 2nd Generation Intel® Core™ Processors (to be published; ITNG 2012)

  3. Gueron, S., Johnson, S., Walker, J.: SHA-512/256. In: IEEE Proceedings of 8th International Conference on Information Technology: New Generations (ITNG 2011) (2011)

  4. Gueron, S., Krasnov, V.: [PATCH] Efficient implementations of SHA256 and SHA512, using the Simultaneous Message Scheduling method. http://rt.openssl.org/Ticket/Display.html?id=2784&user=guest&pass=guest

  5. Intel: Intel Advanced Vector Extensions Programming Reference. http://software.intel.com/file/36945

  6. Intel: Software Development Emulator (SDE). http://software.intel.com/en-us/articles/intel-software-development-emulator/

  7. Intel: Intel® Compilers. http://software.intel.com/en-us/articles/intel-compilers/

  8. Intel (M. Buxton): Haswell New Instruction Descriptions Now Available! http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/

  9. Kounavis, M.E., Kang, X., Grewal, K., Eszenyi, M., Gueron, S., Durham, D.: Encrypting the internet. In: Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM. http://portal.acm.org/citation.cfm?id=1851182.1851200

  10. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography, 5th edn. CRC Press, Boca Raton (2001)

  11. NIST: Cryptographic Hash Algorithm Competition. http://csrc.nist.gov/groups/ST/hash/sha-3/index.html

  12. NIST: Secure Hash Standard. Draft Federal Information Processing Standards Publication 180-4 (2011)

  13. OpenSSL, The Open Source toolkit for SSL/TLS. http://openssl.org/

  14. SUPERCOP. http://bench.cr.yp.to/supercop.html

  15. YASM, The YASM Modular Assembler Project. http://yasm.tortall.net/

Author information

Corresponding author

Correspondence to Shay Gueron.

Appendix

1.1 Code snippets

This appendix contains two C code examples (Figs. 8 and 9). The first implements 4-SMS message scheduling for SHA-256 using SSE intrinsics, and the second implements 4-SMS message scheduling for SHA-512 using AVX2 intrinsics. This code only illustrates the discussed method; the performance-oriented code is written in assembly.

Fig. 8 4-SMS message scheduling for SHA-256 using C intrinsics for the SSE3 instruction set
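
As a rough illustration of the 4-SMS idea named in this caption, the C sketch below computes the SHA-256 message schedules of four consecutive 64-byte blocks at once, with lane j of every 128-bit vector holding the schedule word of block j. It is not the authors' figure code: all function and macro names are hypothetical, it restricts itself to SSE2 intrinsics, and it gathers the input words with scalar big-endian loads for clarity rather than speed. The scalar schedule is included only to cross-check the vector lanes.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Per-lane 32-bit rotate right and the two SHA-256 "small sigma" functions.
   Hypothetical helper macros; pass plain variables, since x is evaluated
   more than once. */
#define ROTR32X4(x, n) \
    _mm_or_si128(_mm_srli_epi32((x), (n)), _mm_slli_epi32((x), 32 - (n)))
#define SIGMA0X4(x) \
    _mm_xor_si128(_mm_xor_si128(ROTR32X4(x, 7), ROTR32X4(x, 18)), _mm_srli_epi32((x), 3))
#define SIGMA1X4(x) \
    _mm_xor_si128(_mm_xor_si128(ROTR32X4(x, 17), ROTR32X4(x, 19)), _mm_srli_epi32((x), 10))

/* Read one big-endian 32-bit word, as SHA-256 specifies. */
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* msg points to 4 consecutive 64-byte blocks (256 bytes).
   On return, lane j of w[t] is W_t of block j, for t = 0..63. */
static void sha256_sms4(const uint8_t *msg, __m128i w[64]) {
    assert(msg != NULL && w != NULL);
    for (int t = 0; t < 16; t++) {
        const uint8_t *p = msg + 4 * t;
        /* _mm_set_epi32 takes the highest lane first, so block 0 goes last. */
        w[t] = _mm_set_epi32((int)load_be32(p + 192), (int)load_be32(p + 128),
                             (int)load_be32(p + 64), (int)load_be32(p));
    }
    for (int t = 16; t < 64; t++) {
        __m128i w2 = w[t - 2], w15 = w[t - 15];
        __m128i s0 = SIGMA0X4(w15);
        __m128i s1 = SIGMA1X4(w2);
        /* W_t = sigma1(W_{t-2}) + W_{t-7} + sigma0(W_{t-15}) + W_{t-16} */
        w[t] = _mm_add_epi32(_mm_add_epi32(s1, w[t - 7]),
                             _mm_add_epi32(s0, w[t - 16]));
    }
}

/* Plain scalar schedule of one block, used only as a reference. */
static uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

static void sha256_schedule_scalar(const uint8_t *block, uint32_t w[64]) {
    for (int t = 0; t < 16; t++) w[t] = load_be32(block + 4 * t);
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = s1 + w[t - 7] + s0 + w[t - 16];
    }
}

/* Returns 1 if lane j of the vector schedule equals the scalar schedule of
   block j for all four blocks, and 0 otherwise. */
static int sha256_sms4_matches_scalar(const uint8_t *msg) {
    __m128i wv[64];
    uint32_t lanes[4], ws[64];
    sha256_sms4(msg, wv);
    for (int j = 0; j < 4; j++) {
        sha256_schedule_scalar(msg + 64 * j, ws);
        for (int t = 0; t < 64; t++) {
            _mm_storeu_si128((__m128i *)lanes, wv[t]);
            if (lanes[j] != ws[t]) return 0;
        }
    }
    return 1;
}
```

In a full hash, the four schedules would be consumed one block at a time by the (serial) round function, for example after adding the round constants to each vector; that part is omitted here.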

Fig. 9 4-SMS message scheduling for SHA-512 using C intrinsics for the AVX2 instruction set
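
The following C sketch gives one possible reading of what 4-SMS scheduling for SHA-512 looks like with 256-bit registers: each of the four 64-bit lanes carries the schedule word of a different 128-byte block. It is an illustration under assumptions, not the authors' figure code. The names are hypothetical, the GCC/Clang `target("avx2")` attribute and `__builtin_cpu_supports` are used so the file builds without extra flags, and input words are gathered with scalar big-endian loads rather than vector shuffles.

```c
#include <assert.h>
#include <immintrin.h> /* AVX2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Per-lane 64-bit rotate right and the SHA-512 "small sigma" functions.
   Hypothetical helper macros; pass plain variables only. */
#define ROTR64X4(x, n) \
    _mm256_or_si256(_mm256_srli_epi64((x), (n)), _mm256_slli_epi64((x), 64 - (n)))
#define SIGMA0_512X4(x) \
    _mm256_xor_si256(_mm256_xor_si256(ROTR64X4(x, 1), ROTR64X4(x, 8)), _mm256_srli_epi64((x), 7))
#define SIGMA1_512X4(x) \
    _mm256_xor_si256(_mm256_xor_si256(ROTR64X4(x, 19), ROTR64X4(x, 61)), _mm256_srli_epi64((x), 6))

/* Read one big-endian 64-bit word, as SHA-512 specifies. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v = (v << 8) | p[i];
    return v;
}

/* msg points to 4 consecutive 128-byte blocks (512 bytes).
   On return, w[t][j] is W_t of block j, for t = 0..79. */
__attribute__((target("avx2")))
static void sha512_sms4(const uint8_t *msg, uint64_t w[80][4]) {
    for (int t = 0; t < 16; t++) {
        const uint8_t *p = msg + 8 * t;
        /* Highest lane first, so block 0 is the last argument. */
        __m256i v = _mm256_set_epi64x((long long)load_be64(p + 384),
                                      (long long)load_be64(p + 256),
                                      (long long)load_be64(p + 128),
                                      (long long)load_be64(p));
        _mm256_storeu_si256((__m256i *)w[t], v);
    }
    for (int t = 16; t < 80; t++) {
        __m256i w2 = _mm256_loadu_si256((const __m256i *)w[t - 2]);
        __m256i w7 = _mm256_loadu_si256((const __m256i *)w[t - 7]);
        __m256i w15 = _mm256_loadu_si256((const __m256i *)w[t - 15]);
        __m256i w16 = _mm256_loadu_si256((const __m256i *)w[t - 16]);
        __m256i s0 = SIGMA0_512X4(w15);
        __m256i s1 = SIGMA1_512X4(w2);
        /* W_t = sigma1(W_{t-2}) + W_{t-7} + sigma0(W_{t-15}) + W_{t-16} */
        __m256i v = _mm256_add_epi64(_mm256_add_epi64(s1, w7),
                                     _mm256_add_epi64(s0, w16));
        _mm256_storeu_si256((__m256i *)w[t], v);
    }
}

/* Plain scalar schedule of one block, used only as a reference. */
static uint64_t rotr64(uint64_t x, int n) { return (x >> n) | (x << (64 - n)); }

static void sha512_schedule_scalar(const uint8_t *block, uint64_t w[80]) {
    for (int t = 0; t < 16; t++) w[t] = load_be64(block + 8 * t);
    for (int t = 16; t < 80; t++) {
        uint64_t s0 = rotr64(w[t - 15], 1) ^ rotr64(w[t - 15], 8) ^ (w[t - 15] >> 7);
        uint64_t s1 = rotr64(w[t - 2], 19) ^ rotr64(w[t - 2], 61) ^ (w[t - 2] >> 6);
        w[t] = s1 + w[t - 7] + s0 + w[t - 16];
    }
}

/* Returns 1 if every lane matches the scalar schedule, 0 on a mismatch,
   and -1 if the CPU lacks AVX2 and the vector path cannot be run. */
static int sha512_sms4_matches_scalar(const uint8_t *msg) {
    static uint64_t wv[80][4];
    uint64_t ws[80];
    if (!__builtin_cpu_supports("avx2")) return -1;
    sha512_sms4(msg, wv);
    for (int j = 0; j < 4; j++) {
        sha512_schedule_scalar(msg + 128 * j, ws);
        for (int t = 0; t < 80; t++)
            if (wv[t][j] != ws[t]) return 0;
    }
    return 1;
}
```

Storing the schedule as a plain `uint64_t` array keeps all 256-bit operations inside the one function compiled for AVX2; a tuned implementation would keep the most recent words in registers instead.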

1.2 Fig. 7—Sources

Figure 7 presents performance numbers for several hash algorithms. To facilitate reproducing the results, we provide the following details.

The source code for Blake, Grøstl, JH, Keccak, and Skein was retrieved from “supercop” [14] and re-measured using the methodology described in Sect. 6.

The supercop version we used was 20120329 (SUPERCOP hereafter). It can be downloaded from http://hyperelliptic.org/ebats/supercop-20120329.tar.bz2. More details on the sources, including the compilation flags (when relevant) are:

  • SHA-256 openssl: OpenSSL 1.0.1

  • SHA-512 openssl: OpenSSL 1.0.1

  • SHA-256 4-SMS: the code posted in [4], applied to OpenSSL 1.0.1

  • SHA-512 2-SMS: the code posted in [4], applied to OpenSSL 1.0.1

  • Skein: SUPERCOP, “sandy”, compiled using: gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer

  • Blake256: SUPERCOP, “avxicc”, assembler

  • Blake512: SUPERCOP, “avxicc”, assembler

  • Grøstl256: SUPERCOP, “avx”, compiled using: gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer -DTASM

  • Grøstl512: SUPERCOP, “aesni”, compiled using: gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer -DTASM

  • JH256: SUPERCOP, “bitslice_sse2_opt64”, compiled using: icc -O3 -xAVX

  • Keccak: SUPERCOP, “x86_64_shld”, compiled using: gcc -funroll-loops -O3 -fomit-frame-pointer

Compilers: we used gcc version 4.5.1, and icc version 12.

Gueron, S., Krasnov, V. Parallelizing message schedules to accelerate the computations of hash functions. J Cryptogr Eng 2, 241–253 (2012). https://doi.org/10.1007/s13389-012-0037-z

  • DOI: https://doi.org/10.1007/s13389-012-0037-z
