Abstract
To explore whether new parallelism techniques can provide additional performance improvements for cryptographic hash functions, we conducted our study on the SW26010, the special-architecture processor of Sunway TaihuLight, one of the world's fastest supercomputers. Secure Hash Algorithms (SHAs) are important for secure transmission, and SHA-256 remains secure and among the most efficient SHA designs. We propose SW-SHA-256, a parallel SHA-256 implementation for hashing multiple messages on the SW26010. Our work explores parallel schemes at the instruction level and the thread level. At the instruction level, we load multiple messages into vector registers so that they are hashed simultaneously. We also apply assembly-level optimizations such as dual issue, for a pipeline that differs from that of a general-purpose processor. At the thread level, an optimized DMA transmission strategy and a double-buffering technique reduce the cost of moving data from memory to cache. As a result, we obtain 5.87 cycles per byte on a single core, an 8.18× speedup over the C code in OpenSSL v3.0.0. Moreover, our implementation achieves a throughput of 60.21 GB/s on one SW26010 processor and is highly scalable.
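The instruction-level scheme, hashing several messages at once with one message per vector lane, can be illustrated with a minimal Python sketch. This is not the SW-SHA-256 code: the names `vmap` and `sha256_multi` are hypothetical, each "vector register" is modelled as a plain list with one 32-bit word per lane, and the sketch assumes all messages have the same length so that every lane processes the same number of blocks. The constants are derived from the prime roots as defined in FIPS 180-4 rather than typed in.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def _icbrt(x):
    # Integer cube root by binary search (exact, no floating point).
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= x else (lo, mid - 1)
    return lo

# FIPS 180-4: fractional bits of square/cube roots of the first primes.
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def vmap(f, *vs):
    # Apply f lane by lane, standing in for one SIMD instruction.
    return [f(*lane) & MASK for lane in zip(*vs)]

def _sched(w2, w7, w15, w16):
    s1 = rotr(w2, 17) ^ rotr(w2, 19) ^ (w2 >> 10)
    s0 = rotr(w15, 7) ^ rotr(w15, 18) ^ (w15 >> 3)
    return s1 + w7 + s0 + w16

def _t1(h, e, f, g, k, w):
    return h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + k + w

def _t2(a, b, c):
    return (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))

def _add(x, y):
    return x + y

def pad(msg):
    # Standard SHA-256 padding: 0x80, zeros, 64-bit big-endian bit length.
    return msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", 8 * len(msg))

def sha256_multi(messages):
    """Hash equal-length messages in lockstep, one message per lane."""
    if not messages:
        return []
    if len({len(m) for m in messages}) > 1:
        raise ValueError("this sketch requires equal-length messages")
    padded = [pad(m) for m in messages]
    lanes = len(messages)
    # Transposed layout: state[i] holds state word i of every lane.
    state = [[h0] * lanes for h0 in H0]
    for off in range(0, len(padded[0]), 64):
        blocks = (struct.unpack(">16I", p[off:off + 64]) for p in padded)
        w = [list(words) for words in zip(*blocks)]  # w[t]: word t of all lanes
        for t in range(16, 64):
            w.append(vmap(_sched, w[t - 2], w[t - 7], w[t - 15], w[t - 16]))
        a, b, c, d, e, f, g, h = state
        for t in range(64):
            t1 = vmap(_t1, h, e, f, g, [K[t]] * lanes, w[t])  # K[t] broadcast
            t2 = vmap(_t2, a, b, c)
            h, g, f, e = g, f, e, vmap(_add, d, t1)
            d, c, b, a = c, b, a, vmap(_add, t1, t2)
        state = [vmap(_add, s, v) for s, v in zip(state, (a, b, c, d, e, f, g, h))]
    return [struct.pack(">8I", *lane) for lane in zip(*state)]

if __name__ == "__main__":
    msgs = [bytes([i]) * 100 for i in range(8)]
    assert sha256_multi(msgs) == [hashlib.sha256(m).digest() for m in msgs]
```

Every step of the compression function is applied to all lanes at once, which is why the messages must advance block by block together; the lane count here is arbitrary and says nothing about the vector width or instructions used on the SW26010.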










Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This research has been supported by the China National Key R&D Program (Grant No. 2018YFB1700405).
Ethics declarations
Conflicts of interest
The authors have no conflicts of interest to declare.
About this article
Cite this article
Wang, Z., Dong, X., Kang, Y. et al. Parallel SHA-256 on SW26010 many-core processor for hashing of multiple messages. J Supercomput 79, 2332–2355 (2023). https://doi.org/10.1007/s11227-022-04750-7
DOI: https://doi.org/10.1007/s11227-022-04750-7