Parallelizing message schedules to accelerate the computations of hash functions

Gueron, Shay; Krasnov, Vlad

doi:10.1007/s13389-012-0037-z

Regular Paper
Published: 26 September 2012

Volume 2, pages 241–253 (2012)

Journal of Cryptographic Engineering

Abstract

This paper describes an algorithm for accelerating the computation of Davies–Meyer-based hash functions. It is based on parallelizing the computation of the message schedules of several message blocks of a given message. This parallelization, together with the proper use of vector (SIMD) instructions, improves the overall performance of the algorithm. Using this method, we obtain a new software implementation of SHA-256 that runs at 11.47 cycles/byte on the second-generation and 10.18 cycles/byte (for an 8 KB message) on the third-generation Intel® Core™ processors. We also show how to extend the method to the upcoming AVX2 architecture, which has wider registers. Since processors with AVX2 will become available only in 2013, exact performance figures cannot yet be reported. Instead, we show that the resulting SHA-256 and SHA-512 implementations execute fewer instructions. Based on our findings, we make some observations on the SHA-3 competition. We argue that if the prospective SHA-3 standard is expected to be competitive with SHA-256 or SHA-512 on high-end platforms, its performance should be well below 10 cycles/byte on current processors, and certainly on near-future ones. Not all the SHA-3 finalists achieve this performance. Furthermore, even the fastest finalists will probably offer only a small performance advantage over the current SHA-256 and SHA-512 implementations.
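
Why the message schedules can be parallelized is easiest to see for SHA-256. In a Davies–Meyer construction the message block plays the role of the block-cipher key, so the schedule words W_0, ..., W_63 of a block depend only on that block and never on the chaining value (SHA-2 combines the cipher output with the chaining value by word-wise addition modulo 2^32 rather than XOR, but the schedule is unaffected). The schedules of several blocks can therefore be computed side by side in SIMD lanes, while the compression rounds remain sequential. The following is a minimal C sketch under stated assumptions: it targets x86-64 with SSE2 intrinsics from emmintrin.h, the function names sha256_schedule_x4 and sha256_schedule_scalar are hypothetical, and it is not the optimized code posted in [4]. It computes the schedules of four consecutive 64-byte blocks at once, with lane j of each 128-bit register holding W_t of block j, and includes a scalar reference for comparison.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics (baseline on x86-64) */

/* Read a 32-bit big-endian word, as SHA-256 parses message blocks. */
static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Rotate each 32-bit lane right by n bits (SSE2 has no vector rotate). */
static inline __m128i rotr32x4(__m128i v, int n) {
    return _mm_or_si128(_mm_srli_epi32(v, n), _mm_slli_epi32(v, 32 - n));
}

/* sigma0(x) = ROTR7(x) ^ ROTR18(x) ^ SHR3(x), on four lanes at once. */
static inline __m128i sigma0x4(__m128i x) {
    return _mm_xor_si128(_mm_xor_si128(rotr32x4(x, 7), rotr32x4(x, 18)),
                         _mm_srli_epi32(x, 3));
}

/* sigma1(x) = ROTR17(x) ^ ROTR19(x) ^ SHR10(x), on four lanes at once. */
static inline __m128i sigma1x4(__m128i x) {
    return _mm_xor_si128(_mm_xor_si128(rotr32x4(x, 17), rotr32x4(x, 19)),
                         _mm_srli_epi32(x, 10));
}

/* Schedules of four consecutive 64-byte blocks: w[t][j] = W_t of block j. */
void sha256_schedule_x4(const uint8_t *blocks, uint32_t w[64][4]) {
    __m128i v[64];
    for (int t = 0; t < 16; t++)
        /* Transpose on load: word t of blocks 0..3 goes to lanes 0..3. */
        v[t] = _mm_setr_epi32((int)load_be32(blocks + 4 * t),
                              (int)load_be32(blocks + 64 + 4 * t),
                              (int)load_be32(blocks + 128 + 4 * t),
                              (int)load_be32(blocks + 192 + 4 * t));
    /* W_t = sigma1(W_{t-2}) + W_{t-7} + sigma0(W_{t-15}) + W_{t-16} (mod 2^32, per lane). */
    for (int t = 16; t < 64; t++)
        v[t] = _mm_add_epi32(_mm_add_epi32(sigma1x4(v[t - 2]), v[t - 7]),
                             _mm_add_epi32(sigma0x4(v[t - 15]), v[t - 16]));
    /* Lane 0 is stored at the lowest address, so w[t][0] belongs to block 0. */
    for (int t = 0; t < 64; t++)
        _mm_storeu_si128((__m128i *)w[t], v[t]);
}

/* Scalar reference: one block at a time, for comparison. */
static uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

void sha256_schedule_scalar(const uint8_t *block, uint32_t w[64]) {
    for (int t = 0; t < 16; t++) w[t] = load_be32(block + 4 * t);
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = s1 + w[t - 7] + s0 + w[t - 16];
    }
}
```

For a padded "abc" block, W_16 equals W_0 = 0x61626380 and W_17 = 0x000F0000, which gives a quick spot check. In a complete implementation the transposed schedule words would then be consumed by the ordinary sequential compression function, one block at a time; that part is omitted here.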

References

[1] Federal Information Processing Standards Publication 180-2: Secure Hash Standard. http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf
[2] Gueron, S.: Speeding up SHA-1, SHA-256 and SHA-512 on the 2nd Generation Intel® Core™ Processors (to be published; ITNG 2012)
[3] Gueron, S., Johnson, S., Walker, J.: SHA-512/256. In: IEEE Proceedings of the 8th International Conference on Information Technology: New Generations (ITNG 2011) (2011)
[4] Gueron, S., Krasnov, V.: [PATCH] Efficient implementations of SHA256 and SHA512, using the Simultaneous Message Scheduling method. http://rt.openssl.org/Ticket/Display.html?id=2784&user=guest&pass=guest
[5] Intel: Intel Advanced Vector Extensions Programming Reference. http://software.intel.com/file/36945
[6] Intel: Software Development Emulator (SDE). http://software.intel.com/enus/articles/intel-software-development-emulator/
[7] Intel: Intel® Compilers. http://software.intel.com/en-us/articles/intel-compilers/
[8] Intel (M. Buxton): Haswell New Instruction Descriptions Now Available! http://software.intel.com/en-us/blogs/2011/06/13/haswell-new-instruction-descriptions-now-available/
[9] Kounavis, M.E., Kang, X., Grewal, K., Eszenyi, M., Gueron, S., Durham, D.: Encrypting the internet. In: Proceedings of the ACM SIGCOMM 2010 Conference. http://portal.acm.org/citation.cfm?id=1851182.1851200
[10] Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography, 5th edn. CRC Press, Boca Raton (2001)
[11] NIST: Cryptographic Hash Algorithm Competition. http://csrc.nist.gov/groups/ST/hash/sha-3/index.html
[12] NIST: Secure Hash Standard. Draft Federal Information Processing Standards Publication 180-4 (2011)
[13] OpenSSL: The Open Source toolkit for SSL/TLS. http://openssl.org/
[14] SUPERCOP. http://bench.cr.yp.to/supercop.html
[15] YASM: The YASM Modular Assembler Project. http://yasm.tortall.net/

Author information

Authors and Affiliations

Department of Mathematics, University of Haifa, Haifa, Israel
Shay Gueron
Intel Architecture Group, Israel Development Center, Haifa, Israel
Shay Gueron & Vlad Krasnov

Corresponding author

Correspondence to Shay Gueron.

Appendix

1.1 Code snippets

This appendix contains two C code examples. The first one implements 4-SMS for SHA-256 using SSE intrinsics, and the second one implements SHA-512 using AVX2 intrinsics. This code only illustrates the discussed method; the performance-optimized code is written in assembly (Figs. 8, 9).
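
Since the snippets themselves are not reproduced in this text, the following is a hedged sketch of the same idea for SHA-512, whose 80-word schedule uses 64-bit words with sigma0(x) = ROTR1(x) ^ ROTR8(x) ^ SHR7(x) and sigma1(x) = ROTR19(x) ^ ROTR61(x) ^ SHR6(x). It computes the schedules of two consecutive 128-byte blocks in the two 64-bit lanes of a 128-bit register, in line with the "SHA-512 2-SMS" entry in the list of sources. It deliberately substitutes SSE2 for the AVX2 intrinsics of the original snippet so that it builds on any x86-64 compiler without extra flags; with 256-bit AVX2 registers, each register would hold four 64-bit lanes instead of two. The function name sha512_schedule_x2 is hypothetical, and this is not the authors' code.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: 128-bit registers, two 64-bit lanes */

/* Read a 64-bit big-endian word, as SHA-512 parses message blocks. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t x = 0;
    for (int i = 0; i < 8; i++) x = (x << 8) | p[i];
    return x;
}

/* Rotate each 64-bit lane right by n bits (SSE2 has no vector rotate). */
static inline __m128i rotr64x2(__m128i v, int n) {
    return _mm_or_si128(_mm_srli_epi64(v, n), _mm_slli_epi64(v, 64 - n));
}

/* SHA-512 sigma0: ROTR1 ^ ROTR8 ^ SHR7, on both lanes. */
static inline __m128i s512_sigma0x2(__m128i x) {
    return _mm_xor_si128(_mm_xor_si128(rotr64x2(x, 1), rotr64x2(x, 8)),
                         _mm_srli_epi64(x, 7));
}

/* SHA-512 sigma1: ROTR19 ^ ROTR61 ^ SHR6, on both lanes. */
static inline __m128i s512_sigma1x2(__m128i x) {
    return _mm_xor_si128(_mm_xor_si128(rotr64x2(x, 19), rotr64x2(x, 61)),
                         _mm_srli_epi64(x, 6));
}

/* Schedules of two consecutive 128-byte blocks: w[t][j] = W_t of block j. */
void sha512_schedule_x2(const uint8_t *blocks, uint64_t w[80][2]) {
    __m128i v[80];
    for (int t = 0; t < 16; t++)
        /* _mm_set_epi64x(hi, lo): lane 0 (lo) gets block 0, lane 1 gets block 1. */
        v[t] = _mm_set_epi64x((long long)load_be64(blocks + 128 + 8 * t),
                              (long long)load_be64(blocks + 8 * t));
    /* W_t = sigma1(W_{t-2}) + W_{t-7} + sigma0(W_{t-15}) + W_{t-16} (mod 2^64, per lane). */
    for (int t = 16; t < 80; t++)
        v[t] = _mm_add_epi64(_mm_add_epi64(s512_sigma1x2(v[t - 2]), v[t - 7]),
                             _mm_add_epi64(s512_sigma0x2(v[t - 15]), v[t - 16]));
    /* Lane 0 is stored at the lowest address, so w[t][0] belongs to block 0. */
    for (int t = 0; t < 80; t++)
        _mm_storeu_si128((__m128i *)w[t], v[t]);
}
```

As a spot check, a padded "abc" block gives W_16 = 0x6162638000000000 and W_17 = 0x00030000000000C0, while an all-zero block gives an all-zero schedule; the two lanes never interact.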

1.2 Fig. 7—Sources

Figure 7 presents performance numbers for several hash algorithms. To facilitate reproducing the results, we provide the following details.

The source code for Blake, Grøstl, JH, Keccak, and Skein was retrieved from SUPERCOP [14] and re-measured using the methodology described in Sect. 6.

The SUPERCOP version we used was 20120329 (hereafter SUPERCOP). It can be downloaded from http://hyperelliptic.org/ebats/supercop-20120329.tar.bz2. Further details on the sources, including the compilation flags (where relevant), are:

SHA-256 OpenSSL: OpenSSL 1.0.1
SHA-512 OpenSSL: OpenSSL 1.0.1
SHA-256 4-SMS: the code posted in [4], applied to OpenSSL 1.0.1
SHA-512 2-SMS: the code posted in [4], applied to OpenSSL 1.0.1
Skein: SUPERCOP, “sandy”, compiled using: gcc -m64 -march=core2 -msse4.1 -Os -fomit-frame-pointer
Blake256: SUPERCOP, “avxicc”, assembler
Blake512: SUPERCOP, “avxicc”, assembler
Grøstl256: SUPERCOP, “avx”, compiled using: gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer -DTASM
Grøstl512: SUPERCOP, “aesni”, compiled using: gcc -funroll-loops -march=nocona -O3 -fomit-frame-pointer -DTASM
JH256: SUPERCOP, “bitslice_sse2_opt64”, compiled using: icc -O3 -xAVX
Keccak: SUPERCOP, “x86_64_shld”, compiled using: gcc -funroll-loops -O3 -fomit-frame-pointer

Compilers: we used gcc version 4.5.1 and icc version 12.
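
Figure 7 reports results in cycles per byte. The measurement methodology of Sect. 6 is not reproduced in this text, so the harness below is only an assumed, minimal way to obtain such a figure: it times repeated calls with the time-stamp counter (__rdtsc from x86intrin.h, available in GCC and Clang on x86) and keeps the fastest run. The names hash_fn, cycles_per_byte, and the stand-in workload toy_xor_hash are hypothetical; a real measurement would pass the hash implementation under test. On recent Intel processors the TSC ticks at a constant reference rate, so it matches core cycles only when frequency scaling (for example, Turbo Boost) is disabled.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); GCC/Clang on x86 only */

/* Hypothetical signature for the function under test. */
typedef void (*hash_fn)(const uint8_t *msg, size_t len, uint8_t out[8]);

/* Stand-in workload (NOT a hash from the paper): XOR-folds the message into 8 bytes. */
void toy_xor_hash(const uint8_t *msg, size_t len, uint8_t out[8]) {
    for (int i = 0; i < 8; i++) out[i] = 0;
    for (size_t i = 0; i < len; i++) out[i % 8] ^= msg[i];
}

/* Fastest of `reps` (>= 1) timed runs, divided by the message length in bytes. */
double cycles_per_byte(hash_fn f, const uint8_t *msg, size_t len, int reps) {
    uint8_t out[8];
    uint64_t best = UINT64_MAX;
    f(msg, len, out);  /* warm-up run: caches and branch predictors */
    for (int r = 0; r < reps; r++) {
        uint64_t t0 = __rdtsc();
        f(msg, len, out);
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return (double)best / (double)len;
}
```

As a sanity check on the units, 10.18 cycles/byte for an 8 KB (8192-byte) message corresponds to about 10.18 × 8192 ≈ 83,400 cycles per message.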

About this article

Cite this article

Gueron, S., Krasnov, V. Parallelizing message schedules to accelerate the computations of hash functions. J Cryptogr Eng 2, 241–253 (2012). https://doi.org/10.1007/s13389-012-0037-z

Received: 20 February 2012
Accepted: 19 August 2012
Published: 26 September 2012
Issue Date: November 2012
DOI: https://doi.org/10.1007/s13389-012-0037-z
