A GPU Parallel Implementation of the RSA Private Operation

Cruz-Cortés, Nareli; Ochoa-Jiménez, Eduardo; Rivera-Zamarripa, Luis; Rodríguez-Henríquez, Francisco

doi:10.1007/978-3-319-57972-6_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 697)

Included in the following conference series:

Latin American High Performance Computing Conference

Abstract

The RSA private operation tends to be expensive to implement because its computational complexity is cubic in the bit-size of the private key. As a consequence, considerable effort has been devoted to optimizing this operation. In this work, we present a parallel implementation of the RSA private operation using the Single Instruction Multiple Thread (SIMT) threading model of Graphics Processing Unit (GPU) platforms. The underlying modular arithmetic is performed using the Residue Number System (RNS) representation. By combining these two approaches, we obtain a GPU software library that achieves high-speed timings for the RSA private operation with 1024-, 2048- and 3072-bit secret keys.
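To illustrate the RNS idea, here is a minimal Python sketch. It is not the authors' GPU code. An integer is represented by its residues modulo a base of pairwise coprime moduli. Multiplication is then carried out independently per modulus, and the Chinese Remainder Theorem recovers the integer. The four-modulus base `BASE` and the helper names `to_rns`, `rns_mul` and `from_rns` are hypothetical choices made for illustration. The RNS modular reduction that an actual RSA implementation needs is not shown.

```python
from math import gcd, prod

# Hypothetical toy RNS base: four pairwise coprime moduli.
# Practical implementations use many machine-word-sized moduli.
BASE = (251, 253, 255, 256)

# Dynamic range: every integer in [0, M) has a unique residue vector.
M = prod(BASE)

# Sanity check that the moduli really are pairwise coprime.
assert all(gcd(a, b) == 1 for i, a in enumerate(BASE) for b in BASE[i + 1:])

def to_rns(x, base=BASE):
    """Return the residues of x modulo each base element."""
    return tuple(x % m for m in base)

def rns_mul(a, b, base=BASE):
    """Component-wise product; each channel is independent (no carries)."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, base))

def from_rns(r, base=BASE):
    """Reconstruct x in [0, M) with the Chinese Remainder Theorem."""
    big_m = prod(base)
    x = 0
    for ri, mi in zip(r, base):
        mi_hat = big_m // mi
        # pow(..., -1, mi) computes the modular inverse (Python 3.8+).
        x += ri * mi_hat * pow(mi_hat, -1, mi)
    return x % big_m

x, y = 123456, 7890
z = from_rns(rns_mul(to_rns(x), to_rns(y)))
print(z == x * y)  # True, because x * y < M
```

Each residue channel is processed without reference to the others, so every modulus can be assigned to its own thread. This independence is what makes RNS a natural fit for the SIMT model. Reconstruction is exact only while the true result stays below the dynamic range \(M\), the product of the moduli.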

Notes

1. Thread divergence occurs when threads of the same warp do not execute the same instruction at the same time. It is an important limiting factor in exploiting the parallelism of a program and should therefore be avoided as much as possible.
2. Henceforth, we assume that the cost of one integer squaring is the same as that of one integer multiplication.
3. We stress that the fixed-window method requires the precomputation of up to \(2^w\) values, where \(w\) is the window size.
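The fixed-window method and its cost model can be illustrated with a minimal sequential Python sketch. This is a plain version for illustration, not the paper's GPU kernel. It performs left-to-right fixed-window modular exponentiation with a table of \(2^w\) precomputed powers and counts the squarings and multiplications executed in the main loop. Under the assumption that a squaring costs the same as a multiplication, the sum of the two counts approximates the cost. The function name `fixed_window_pow`, the default window `w=4` and the textbook toy RSA key are illustrative assumptions.

```python
def fixed_window_pow(base, exp, mod, w=4):
    """Left-to-right fixed-window computation of base**exp % mod.

    Returns (result, squarings, multiplications), counting only the
    main loop (the 2**w - 1 table-building multiplications are excluded).
    """
    if mod == 1:
        return 0, 0, 0
    # Precompute table[i] = base**i mod mod for 0 <= i < 2**w.
    table = [1] * (1 << w)
    for i in range(1, 1 << w):
        table[i] = (table[i - 1] * base) % mod

    # Split the exponent into w-bit digits, most significant first.
    digits = []
    e = exp
    while e:
        digits.append(e & ((1 << w) - 1))
        e >>= w
    digits.reverse()

    result, squarings, mults = 1, 0, 0
    for k, digit in enumerate(digits):
        if k > 0:
            # Shift the partial exponent left by one window: w squarings.
            for _ in range(w):
                result = (result * result) % mod
                squarings += 1
        if digit:
            # Multiply in the precomputed power for this window.
            result = (result * table[digit]) % mod
            mults += 1
    return result, squarings, mults

# Toy RSA key (textbook example): p = 61, q = 53, n = 3233, e = 17, d = 2753.
n, e, d = 3233, 17, 2753
c = pow(65, e, n)                 # public operation on message 65
m, sq, mu = fixed_window_pow(c, d, n)   # private operation
print(m, sq, mu)                  # 65 8 3
```

With a window of \(w\) bits, the loop performs \(w\) squarings per digit (except the first) and at most one multiplication per digit. A larger window therefore saves multiplications at the price of a larger precomputed table.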

Author information

Authors and Affiliations

Centro de Investigación en Computación del Instituto Politécnico Nacional, Mexico City, Mexico
Nareli Cruz-Cortés & Luis Rivera-Zamarripa
Computer Science Department, CINVESTAV, Mexico City, Mexico
Eduardo Ochoa-Jiménez & Francisco Rodríguez-Henríquez

Corresponding author

Correspondence to Nareli Cruz-Cortés.

Editor information

Editors and Affiliations

Universidad Industrial de Santander, Bucaramanga, Colombia
Carlos Jaime Barrios Hernández
Centro de Investigación y de Estudios Avanzados, CINVESTAV-IPN, Ciudad de México, Mexico
Isidoro Gitler
Instituto Nacional de Investigaciones Nucleares, La Marquesa, Estado de México, Mexico
Jaime Klapp

About this paper

Cite this paper

Cruz-Cortés, N., Ochoa-Jiménez, E., Rivera-Zamarripa, L., Rodríguez-Henríquez, F. (2017). A GPU Parallel Implementation of the RSA Private Operation. In: Barrios Hernández, C., Gitler, I., Klapp, J. (eds) High Performance Computing. CARLA 2016. Communications in Computer and Information Science, vol 697. Springer, Cham. https://doi.org/10.1007/978-3-319-57972-6_14

DOI: https://doi.org/10.1007/978-3-319-57972-6_14
Published: 29 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57971-9
Online ISBN: 978-3-319-57972-6
eBook Packages: Computer Science (R0)