Parallel SHA-256 on SW26010 many-core processor for hashing of multiple messages

The Journal of Supercomputing

Abstract

To explore whether new parallelism techniques can further improve the performance of cryptographic hash functions, we conducted our study on the SW26010, the special-architecture processor of Sunway TaihuLight, one of the world’s fastest supercomputers. Secure Hash Algorithms (SHAs) are important for secure transmission, with SHA-256 remaining a secure and highly efficient SHA design. We propose SW-SHA-256, a parallel SHA-256 implementation for hashing multiple messages on the SW26010. Our work explores parallel schemes at the instruction and thread levels. At the instruction level, we load multiple messages into vector registers so that they are hashed simultaneously. Assembly-level optimizations such as dual issue are applied to a pipeline that is distinct from that of a general-purpose processor. At the thread level, an optimized DMA transmission strategy and a double-buffer technique reduce the cost of moving data from memory to cache. As a result, we obtain 5.87 cycles per byte on a single core, an 8.18× speedup over the C code in OpenSSL v3.0.0. Moreover, our implementation achieves a throughput of 60.21 GB/s on one SW26010 processor and is highly scalable.
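The instruction-level scheme described in the abstract, where each vector register holds one word from several messages so that all of them advance through SHA-256 in lockstep, can be illustrated with a minimal Python sketch. This is not the authors' SW26010 assembly: plain lists stand in for vector registers, the helper names (`sha256_lanes`, `rotr`, `add`, and so on) are hypothetical, and the sketch assumes all messages have the same length so that every lane processes the same number of blocks. The constants are derived from the first 64 primes as specified in FIPS 180-4.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF

def _primes(n):
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out

def _icbrt(n):
    # Integer cube root by binary search (exact for big integers).
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = _primes(64)
# First 32 fractional bits of the square roots / cube roots of the primes.
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [_icbrt(p << 96) & MASK for p in PRIMES]

# Lane-wise helpers: each "vector" is a list with one 32-bit word per message.
def rotr(v, r):
    return [((x >> r) | (x << (32 - r))) & MASK for x in v]

def shr(v, r):
    return [x >> r for x in v]

def xor3(a, b, c):
    return [x ^ y ^ z for x, y, z in zip(a, b, c)]

def add(*vs):
    return [sum(t) & MASK for t in zip(*vs)]

def pad(msg):
    bitlen = len(msg) * 8
    msg += b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    return msg + bitlen.to_bytes(8, "big")

def sha256_lanes(messages):
    # Assumption: equal-length messages, so all lanes stay in lockstep.
    assert len({len(m) for m in messages}) == 1
    padded = [pad(m) for m in messages]
    n = len(messages)
    state = [[h] * n for h in H0]
    for off in range(0, len(padded[0]), 64):
        # "Transpose" the input: vector w[t] holds word t of every message.
        w = [[int.from_bytes(p[off + 4 * t: off + 4 * t + 4], "big")
              for p in padded] for t in range(16)]
        for t in range(16, 64):
            s0 = xor3(rotr(w[t - 15], 7), rotr(w[t - 15], 18), shr(w[t - 15], 3))
            s1 = xor3(rotr(w[t - 2], 17), rotr(w[t - 2], 19), shr(w[t - 2], 10))
            w.append(add(w[t - 16], s0, w[t - 7], s1))
        a, b, c, d, e, f, g, h = state
        for t in range(64):
            big_s1 = xor3(rotr(e, 6), rotr(e, 11), rotr(e, 25))
            ch = [(x & y) ^ (~x & z) for x, y, z in zip(e, f, g)]
            t1 = add(h, big_s1, ch, [K[t]] * n, w[t])
            big_s0 = xor3(rotr(a, 2), rotr(a, 13), rotr(a, 22))
            maj = [(x & y) ^ (x & z) ^ (y & z) for x, y, z in zip(a, b, c)]
            t2 = add(big_s0, maj)
            h, g, f, e, d, c, b, a = g, f, e, add(d, t1), c, b, a, add(t1, t2)
        state = [add(s, v) for s, v in zip(state, (a, b, c, d, e, f, g, h))]
    # Un-transpose: collect the eight state words of each lane into a digest.
    return [b"".join(state[i][lane].to_bytes(4, "big") for i in range(8))
            for lane in range(n)]
```

Calling `sha256_lanes([m0, m1, m2, m3])` returns one 32-byte digest per message, and each should match `hashlib.sha256(m).digest()`. On real SIMD hardware every list operation above would map to a single vector instruction, which is where the per-message speedup comes from.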

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This research has been supported by the China National Key R&D Program (Grant No. 2018YFB1700405).

Author information

Corresponding author

Correspondence to Heng Chen.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Z., Dong, X., Kang, Y. et al. Parallel SHA-256 on SW26010 many-core processor for hashing of multiple messages. J Supercomput 79, 2332–2355 (2023). https://doi.org/10.1007/s11227-022-04750-7

  • DOI: https://doi.org/10.1007/s11227-022-04750-7
