Fast implementation of block ciphers and PRNGs in Maxwell GPU architecture

Cluster Computing

Abstract

GPUs are widely used in applications that require huge computational power. In this paper, we contribute to the cryptography and high performance computing research communities by presenting techniques to accelerate symmetric block ciphers (AES-128, CAST-128, Camellia, SEED, IDEA, Blowfish and Threefish) on the NVIDIA GTX 980 with the Maxwell architecture. The proposed techniques consider various aspects of block cipher implementation on GPUs, including the placement of encryption keys and T-boxes in memory, thread block size, cipher operating mode, parallel granularity and data copy between CPU and GPU. We propose a new method that stores the encryption keys in registers, which have high access speed, and exchanges them with other threads by using the warp shuffle operation of the GPU. The block ciphers implemented in this paper operate in CTR mode and achieve high encryption speeds of 149 Gbps (AES-128), 143 Gbps (CAST-128), 124 Gbps (Camellia), 112 Gbps (SEED), 149 Gbps (IDEA), 111 Gbps (Blowfish) and 197 Gbps (Threefish). To the best of our knowledge, this is the first implementation of block ciphers that exploits warp shuffle, an advanced feature of NVIDIA GPUs. In addition, a block cipher can be used as a pseudorandom number generator (PRNG) when it operates in counter (CTR) mode, but it is usually slower than other PRNGs that use lighter operations. Hence, we attempt to modify IDEA and Blowfish in order to achieve faster pseudorandom number generation. The modified IDEA and Blowfish pass all tests in the NIST Statistical Test Suite and TestU01 SmallCrush, but not the more stringent TestU01 batteries (Crush and BigCrush).
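The use of a block cipher in CTR mode as a keystream or PRNG source can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU implementation: XTEA is used only as a compact stand-in that fits in a few lines (it is not one of the ciphers evaluated in the paper), and the function names and the nonce/counter layout are assumptions made for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9  # XTEA round constant


def xtea_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 64-bit block with a 128-bit key (stand-in cipher)."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    total = 0
    for _ in range(32):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK32
        total = (total + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK32
    return struct.pack(">2I", v0, v1)


def keystream_block(key: bytes, nonce: int, counter: int) -> bytes:
    """Keystream block number `counter`; depends on no other block."""
    # Assumed layout: 32-bit nonce followed by a 32-bit block counter.
    counter_block = struct.pack(">2I", nonce & MASK32, counter & MASK32)
    return xtea_encrypt_block(counter_block, key)


def ctr_keystream(key: bytes, nonce: int, n_blocks: int) -> bytes:
    """Concatenate n_blocks keystream blocks; usable as PRNG output."""
    return b"".join(keystream_block(key, nonce, i) for i in range(n_blocks))


def ctr_xor(key: bytes, nonce: int, data: bytes) -> bytes:
    """Encrypt or decrypt: CTR mode is the same XOR in both directions."""
    n_blocks = (len(data) + 7) // 8
    stream = ctr_keystream(key, nonce, n_blocks)
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    key = bytes(range(16))
    message = b"block ciphers in CTR mode"
    ciphertext = ctr_xor(key, 7, message)
    print(ctr_xor(key, 7, ciphertext) == message)
```

Because `keystream_block` needs only the key, the nonce and its own counter value, every block can be computed independently. This independence is what allows CTR mode to be spread across many GPU threads, for example with one thread per counter value.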

Acknowledgments

This work was partially supported by the Universiti Tunku Abdul Rahman Research Fund (UTARRF) under Grant IPSR/RMC/UTARRF/2012-C2/L04. We would also like to thank all members of the Accelerative Technology Lab, MIMOS, Malaysia, for their great support. This research is also partially supported by the Ministry of Science, Technology and Innovation (MOSTI), Malaysia, under Grant 01-02-11-SF0202.

Corresponding author

Correspondence to Wai-Kong Lee.

Cite this article

Lee, WK., Cheong, HS., Phan, R.CW. et al. Fast implementation of block ciphers and PRNGs in Maxwell GPU architecture. Cluster Comput 19, 335–347 (2016). https://doi.org/10.1007/s10586-016-0536-2

  • DOI: https://doi.org/10.1007/s10586-016-0536-2
