Abstract
Data have always been among the most valuable assets of enterprises and research institutions, and their confidentiality, especially that of the input and output data of applications running on remote supercomputers, should be protected as far as possible. Because such data are large, however, encrypting and decrypting them takes a considerable amount of time. ChaCha20 and the Advanced Encryption Standard (AES) are the only symmetric ciphers supported by TLS v1.3. ChaCha20 is a high-speed stream cipher that has emerged in recent years and has attracted growing attention for its security and efficiency. To make en-/decryption of large-scale data more efficient, we implement parallel ChaCha20, a parallel version of the ChaCha20 stream cipher optimized for the SW26010 heterogeneous many-core processor of the Sunway TaihuLight supercomputer. We apply several optimization methods supported by SW26010, such as Direct Memory Access (DMA) and Single Instruction Multiple Data (SIMD), and propose an optimization scheme that adapts dynamically to the size of the input data. Experimental results show that parallel ChaCha20 reaches a maximum throughput of 32.43 GB/s on a single SW26010 processor, 2.4 times that of the best AES implementation on Sunway known to us. Moreover, parallel ChaCha20 scales well, reaching a maximum throughput of 8296.43 GB/s on 1024 core groups.
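The abstract does not say why ChaCha20 lends itself to parallelization. The minimal C sketch below of the ChaCha20 block function, following RFC 8439, may help: each 64-byte keystream block depends only on the key, a 32-bit block counter, and the nonce, so any block can be computed without the others. The function names `quarter_round`, `chacha20_block`, and `chacha20_xor` are hypothetical. This is a sequential reference sketch under those assumptions, not the authors' SW26010 implementation, and it uses neither DMA nor SIMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

/* ChaCha quarter round (RFC 8439, Section 2.1): add, XOR, rotate. */
void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Read a 32-bit little-endian word from a byte array. */
static uint32_t load32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One 64-byte keystream block, returned as 16 words. The result depends
   only on (key, counter, nonce), never on other blocks. */
void chacha20_block(const uint8_t key[32], uint32_t counter,
                    const uint8_t nonce[12], uint32_t out[16]) {
    uint32_t s[16], x[16];
    s[0] = 0x61707865; s[1] = 0x3320646e;   /* "expand 32-byte k" */
    s[2] = 0x79622d32; s[3] = 0x6b206574;
    for (int i = 0; i < 8; i++) s[4 + i] = load32_le(key + 4 * i);
    s[12] = counter;
    for (int i = 0; i < 3; i++) s[13 + i] = load32_le(nonce + 4 * i);
    memcpy(x, s, sizeof s);
    for (int r = 0; r < 10; r++) {              /* 10 double rounds = 20 rounds */
        quarter_round(&x[0], &x[4], &x[8],  &x[12]);   /* column rounds */
        quarter_round(&x[1], &x[5], &x[9],  &x[13]);
        quarter_round(&x[2], &x[6], &x[10], &x[14]);
        quarter_round(&x[3], &x[7], &x[11], &x[15]);
        quarter_round(&x[0], &x[5], &x[10], &x[15]);   /* diagonal rounds */
        quarter_round(&x[1], &x[6], &x[11], &x[12]);
        quarter_round(&x[2], &x[7], &x[8],  &x[13]);
        quarter_round(&x[3], &x[4], &x[9],  &x[14]);
    }
    for (int i = 0; i < 16; i++) out[i] = x[i] + s[i];  /* feed-forward */
}

/* XOR len bytes of `in` with the keystream; block b uses counter + b.
   Iterations share no state, so disjoint block ranges could be handed to
   different cores; this sketch simply runs them in order. */
void chacha20_xor(const uint8_t key[32], uint32_t counter,
                  const uint8_t nonce[12],
                  const uint8_t *in, uint8_t *out, size_t len) {
    size_t nblocks = (len + 63) / 64;
    /* The 32-bit block counter must not wrap under one nonce. */
    assert(nblocks == 0 ||
           (uint64_t)(nblocks - 1) <= (uint64_t)(UINT32_MAX - counter));
    for (size_t b = 0; b < nblocks; b++) {
        uint32_t ks[16];
        uint8_t ksb[64];
        chacha20_block(key, counter + (uint32_t)b, nonce, ks);
        for (int i = 0; i < 16; i++) {          /* serialize little-endian */
            ksb[4 * i]     = (uint8_t)ks[i];
            ksb[4 * i + 1] = (uint8_t)(ks[i] >> 8);
            ksb[4 * i + 2] = (uint8_t)(ks[i] >> 16);
            ksb[4 * i + 3] = (uint8_t)(ks[i] >> 24);
        }
        size_t off = b * 64;
        size_t n = (len - off < 64) ? len - off : 64;
        for (size_t j = 0; j < n; j++) out[off + j] = in[off + j] ^ ksb[j];
    }
}
```

Because encryption and decryption are the same XOR with the keystream, calling `chacha20_xor` a second time with the same key, nonce, and starting counter restores the original input.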
Acknowledgements
This research has been supported by the China National Key R&D Program during the 13th Five-year Plan Period (Grant No. 2018YFB1700405).
About this article
Cite this article
Cai, W., Chen, H., Wang, Z. et al. Implementation and optimization of ChaCha20 stream cipher on Sunway TaihuLight supercomputer. J Supercomput 78, 4199–4216 (2022). https://doi.org/10.1007/s11227-021-04023-9
DOI: https://doi.org/10.1007/s11227-021-04023-9