Abstract
The RSA private operation tends to be expensive because its computational complexity is cubic in the bit-size of the private key. As a consequence, considerable effort has been put into optimizing this operation. In this work, we present a parallel implementation of the RSA private operation using the Single Instruction Multiple Thread (SIMT) threading model of Graphics Processing Unit (GPU) platforms. The underlying modular arithmetic is performed by means of the Residue Number System (RNS) representation. By combining these two approaches, we present a GPU software library that achieves high-speed timings for the RSA private operation when using 1024-, 2048- and 3072-bit secret keys.
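To make the RNS representation mentioned above concrete, the following is a minimal Python sketch rather than the library's GPU code. The base `BASE` and the helper names `to_rns`, `rns_mul` and `from_rns` are hypothetical and chosen only for illustration. An integer is stored as its residues modulo a set of pairwise-coprime moduli, multiplication acts on each residue independently, and the Chinese Remainder Theorem recovers the result.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for illustration.
# A realistic RNS base for RSA would use many machine-word-sized moduli.
BASE = (251, 253, 255, 256)


def to_rns(x, base=BASE):
    """Represent x by its residues modulo each element of the base."""
    return tuple(x % m for m in base)


def rns_mul(a, b, base=BASE):
    """Multiply channel by channel.

    No carries cross between channels, so on a SIMT platform each
    channel could in principle be handled by its own thread.
    """
    return tuple((x * y) % m for x, y, m in zip(a, b, base))


def from_rns(residues, base=BASE):
    """Recover the integer in [0, M) via the Chinese Remainder Theorem."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m                       # product of the other moduli
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m): inverse mod m
    return total % M


a, b = 12345, 54321
product = from_rns(rns_mul(to_rns(a), to_rns(b)))
```

The value recovered is the product modulo the base product M, so it equals `a * b` only while `a * b < M`. Reducing modulo an RSA modulus inside the RNS needs additional machinery, such as an RNS variant of Montgomery multiplication, which this sketch omits.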
Notes
1. Thread divergence occurs when threads do not execute the same instruction at the same time. It is an important limiting factor in exploiting the parallelism of a program and must therefore be avoided as much as possible.
2. Henceforth, we assume that the cost of one integer squaring is the same as that of one integer multiplication.
3. We stress that the fixed-window method requires the precomputation of up to \(2^w\) values.
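The notes above can be tied together with a small Python sketch of left-to-right fixed-window exponentiation. This is an illustrative assumption, not the authors' implementation: the function name `fixed_window_pow` and the default window width `w=4` are hypothetical. The sketch builds the table of \(2^w\) powers and counts squarings and multiplications as one unit of cost each, as note 2 assumes.

```python
def fixed_window_pow(base, exponent, modulus, w=4):
    """Left-to-right fixed-window exponentiation.

    Returns (result, mults), where mults counts modular multiplications
    with each squaring counted as one multiplication.
    """
    mask = (1 << w) - 1

    # Precompute base^0 .. base^(2^w - 1): a table of 2^w values.
    table = [1 % modulus]
    mults = 0
    for _ in range(mask):
        table.append((table[-1] * base) % modulus)
        mults += 1

    # Split the exponent into w-bit digits, most significant first.
    digits = []
    e = exponent
    while e > 0:
        digits.append(e & mask)
        e >>= w
    digits.reverse()

    result = 1 % modulus
    for d in digits:
        # w squarings shift the accumulated exponent left by w bits.
        for _ in range(w):
            result = (result * result) % modulus
            mults += 1
        # Multiply by the table entry even when d == 0 (table[0] is 1),
        # so the operation sequence does not depend on the digit value.
        result = (result * table[d]) % modulus
        mults += 1
    return result, mults


value, cost = fixed_window_pow(7, 65537, 1000003)
```

Multiplying unconditionally by `table[d]` is a design choice of this sketch: every digit triggers the same sequence of operations, which is the kind of uniform control flow that avoids the thread divergence described in note 1. For an exponent with k digits of w bits, the sketch performs \((2^w - 1) + k(w + 1)\) counted multiplications.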
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cruz-Cortés, N., Ochoa-Jiménez, E., Rivera-Zamarripa, L., Rodríguez-Henríquez, F. (2017). A GPU Parallel Implementation of the RSA Private Operation. In: Barrios Hernández, C., Gitler, I., Klapp, J. (eds) High Performance Computing. CARLA 2016. Communications in Computer and Information Science, vol 697. Springer, Cham. https://doi.org/10.1007/978-3-319-57972-6_14
DOI: https://doi.org/10.1007/978-3-319-57972-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57971-9
Online ISBN: 978-3-319-57972-6
eBook Packages: Computer Science (R0)