Implementation and optimization of ChaCha20 stream cipher on sunway taihuLight supercomputer

The Journal of Supercomputing

Abstract

Data are among the most valuable assets of enterprises and research institutions, and their confidentiality should be protected as far as possible, especially for the input and output data of applications running on remote supercomputers. However, because such data are large, encrypting and decrypting them takes considerable time. ChaCha20 and the Advanced Encryption Standard (AES) are the only ciphers supported by TLS v1.3. ChaCha20 is a high-speed stream cipher that has emerged in recent years and has attracted growing attention for its security and efficiency. To make large-scale data encryption and decryption more efficient, we implement a parallel version of the ChaCha20 stream cipher, parallel ChaCha20, optimized for the SW26010 heterogeneous multi-core processor of the Sunway TaihuLight supercomputer. We apply several optimization methods supported by the SW26010, including Direct Memory Access (DMA) and Single Instruction Multiple Data (SIMD), and propose an optimization scheme that adapts dynamically to the size of the input data. Experimental results show that parallel ChaCha20 reaches a maximum throughput of 32.43 GB/s on a single SW26010 processor, which is 2.4 times that of the best AES implementation on Sunway known to us. Moreover, parallel ChaCha20 scales well, reaching a maximum throughput of 8296.43 GB/s on 1024 core groups.
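
Why ChaCha20 lends itself to this kind of parallelization can be sketched in a few lines. Each 64-byte keystream block depends only on the key, the nonce, and a 32-bit block counter, so disjoint ranges of the input can be processed independently. The sketch below follows the block function as specified in RFC 8439; it is not the authors' SW26010 implementation, and the helper names (`xor_chunk`, `chacha20_parallel`), the `workers` and `chunk_blocks` parameters, and the use of a thread pool as a stand-in for compute cores are all assumptions made for illustration.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))

def quarter_round(s, a, b, c, d):
    """ChaCha quarter round on state words a, b, c, d (in place)."""
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 7)

def chacha20_block(key, counter, nonce):
    """Return one 64-byte keystream block (32-byte key, 12-byte nonce)."""
    init = (list(struct.unpack("<4I", b"expand 32-byte k"))
            + list(struct.unpack("<8I", key))
            + [counter & MASK32]
            + list(struct.unpack("<3I", nonce)))
    s = init[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(s, 0, 4, 8, 12)   # column rounds
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15)  # diagonal rounds
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, init)))

def xor_chunk(key, nonce, first_counter, data):
    """Encrypt or decrypt a chunk whose first block uses first_counter."""
    out = bytearray()
    for i in range(0, len(data), 64):
        keystream = chacha20_block(key, first_counter + i // 64, nonce)
        out.extend(p ^ k for p, k in zip(data[i:i + 64], keystream))
    return bytes(out)

def chacha20_parallel(key, nonce, data, counter=1, workers=4, chunk_blocks=16):
    """Hypothetical helper: split data into chunks of whole blocks and
    process each chunk independently, each from its own counter offset."""
    size = chunk_blocks * 64
    jobs = [(counter + (off // 64), data[off:off + size])
            for off in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda job: xor_chunk(key, nonce, job[0], job[1]), jobs)
    return b"".join(parts)
```

Because encryption and decryption are the same XOR with the keystream, calling `chacha20_parallel` on its own output with the same key, nonce, and counter returns the original data. In CPython the thread pool gives no real speedup; it only illustrates the decomposition by counter range that a many-core implementation could exploit.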

Acknowledgements

This research has been supported by the China National Key R&D Program during the 13th Five-year Plan Period (Grant No. 2018YFB1700405).

Author information

Corresponding author

Correspondence to Weilin Cai.

Cite this article

Cai, W., Chen, H., Wang, Z. et al. Implementation and optimization of ChaCha20 stream cipher on sunway taihuLight supercomputer. J Supercomput 78, 4199–4216 (2022). https://doi.org/10.1007/s11227-021-04023-9

  • DOI: https://doi.org/10.1007/s11227-021-04023-9
