Fast implementation of block ciphers and PRNGs in Maxwell GPU architecture

Lee, Wai-Kong; Cheong, Hon-Sang; Phan, Raphael C.-W.; Goi, Bok-Min

doi:10.1007/s10586-016-0536-2

Published: 22 January 2016

Cluster Computing, volume 19, pages 335–347 (2016)

Wai-Kong Lee ORCID: orcid.org/0000-0003-4659-8979¹,
Hon-Sang Cheong¹,
Raphael C.-W. Phan² &
Bok-Min Goi³

Abstract

GPUs are widely used in applications that require large amounts of computational power. In this paper, we contribute to the cryptography and high-performance computing research communities by presenting techniques to accelerate symmetric block ciphers (AES-128, CAST-128, Camellia, SEED, IDEA, Blowfish and Threefish) on the NVIDIA GTX 980, which uses the Maxwell architecture. The proposed techniques consider various aspects of block cipher implementation on a GPU, including the placement of encryption keys and T-box in memory, thread block size, cipher operating mode, parallel granularity and data copy between CPU and GPU. We propose a new method that stores the encryption keys in registers, which have high access speed, and exchanges them among threads using the GPU's warp shuffle operation. The block ciphers implemented in this paper operate in counter (CTR) mode and achieve encryption speeds of 149 Gbps (AES-128), 143 Gbps (CAST-128), 124 Gbps (Camellia), 112 Gbps (SEED), 149 Gbps (IDEA), 111 Gbps (Blowfish) and 197 Gbps (Threefish). To the best of our knowledge, this is the first implementation of block ciphers that exploits warp shuffle, an advanced feature of NVIDIA GPUs. A block cipher operating in CTR mode can also be used as a pseudorandom number generator (PRNG), but it is usually slower than PRNGs built from lighter operations. Hence, we modify IDEA and Blowfish to achieve faster random number generation. The modified IDEA and Blowfish pass all tests in the NIST Statistical Test Suite and TestU01 SmallCrush, but not the more stringent TestU01 batteries (Crush and BigCrush).
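The idea of keeping round keys in per-thread registers and sharing them through warp shuffle, combined with CTR mode, can be illustrated with a small host-side model. The sketch below is written in Python rather than CUDA and is not the authors' implementation: the 32-lane `shfl` helper only imitates a shuffle broadcast (the kind of operation CUDA exposes as `__shfl` or `__shfl_sync`), and `toy_round` is a hypothetical placeholder mixing step, not a round of any of the ciphers named above. All function names are assumptions made for illustration.

```python
WARP_SIZE = 32          # lanes (threads) per warp on NVIDIA GPUs
MASK32 = 0xFFFFFFFF     # keep arithmetic within 32-bit words


def shfl(regs, src_lane):
    """Model a shuffle broadcast: every lane reads the value held by src_lane."""
    return [regs[src_lane % WARP_SIZE]] * WARP_SIZE


def toy_round(state, key_word):
    """Placeholder mixing step (hypothetical, NOT a real cipher round).

    Each step is invertible on 32-bit words, so distinct inputs stay distinct.
    """
    state = (state + key_word) & MASK32
    state ^= state >> 15
    return (state * 0x9E3779B1) & MASK32


def warp_ctr_keystream(round_keys, base_counter):
    """Produce one keystream word per lane for counters base_counter + lane."""
    if len(round_keys) > WARP_SIZE:
        raise ValueError("this sketch stores at most one round key per lane")
    # Lane i keeps round key i in its own "register"; spare lanes hold 0.
    key_regs = list(round_keys) + [0] * (WARP_SIZE - len(round_keys))
    # CTR mode: each lane processes its own counter value independently.
    state = [(base_counter + lane) & MASK32 for lane in range(WARP_SIZE)]
    for r in range(len(round_keys)):
        # All lanes fetch round key r from lane r instead of from memory.
        k = shfl(key_regs, r)
        state = [toy_round(s, kw) for s, kw in zip(state, k)]
    return state


def ctr_xor(words, round_keys, base_counter=0):
    """Encrypt or decrypt 32-bit words by XOR with the CTR keystream."""
    out = []
    for off in range(0, len(words), WARP_SIZE):
        ks = warp_ctr_keystream(round_keys, base_counter + off)
        out.extend(w ^ k for w, k in zip(words[off:off + WARP_SIZE], ks))
    return out


if __name__ == "__main__":
    keys = [0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D]
    plaintext = list(range(70))
    ciphertext = ctr_xor(plaintext, keys, base_counter=1000)
    # In CTR mode, applying the same keystream again recovers the plaintext.
    print(ctr_xor(ciphertext, keys, base_counter=1000) == plaintext)
```

Because CTR mode only ever runs the block function in the forward direction on independent counter values, every lane can work without waiting on its neighbours, and the raw output of `warp_ctr_keystream` can be read directly as a pseudorandom stream. The model says nothing about throughput; it only shows the data flow.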

Acknowledgments

This work was partially supported by the Universiti Tunku Abdul Rahman Research Fund (UTARRF) under Grant IPSR/RMC/UTARRF/2012-C2/L04. We would also like to thank all members of the Accelerative Technology Lab, MIMOS, Malaysia for their great support. This research was also partially supported by the Ministry of Science, Technology and Innovation (MOSTI), Malaysia under Grant 01-02-11-SF0202.

Author information

Authors and Affiliations

Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia
Wai-Kong Lee & Hon-Sang Cheong
Faculty of Engineering, Multimedia University, Cyberjaya, Malaysia
Raphael C.-W. Phan
Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, Sungai Long, Malaysia
Bok-Min Goi

Corresponding author

Correspondence to Wai-Kong Lee.

About this article

Cite this article

Lee, WK., Cheong, HS., Phan, R.CW. et al. Fast implementation of block ciphers and PRNGs in Maxwell GPU architecture. Cluster Comput 19, 335–347 (2016). https://doi.org/10.1007/s10586-016-0536-2

Received: 31 October 2015
Revised: 06 January 2016
Accepted: 07 January 2016
Published: 22 January 2016
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10586-016-0536-2
