Abstract
GPU-based acceleration of computations with dense matrices and blocks over large prime finite fields is studied. Particular attention is paid to the following algorithms:
- multiplication of rectangular \(N \times K\) blocks with \(N \gg K\);
- multiplication of \(N \times K\) blocks by square \(K \times K\) matrices;
- LU-decomposition of matrices.
Several approaches for optimal use of GPU resources are proposed.
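For orientation, the three operations listed above can be written as a plain CPU reference in Python, whose built-in integers handle the large moduli directly. This is a minimal sketch and not the GPU implementation studied in the paper. It rests on several assumptions: the first operation is taken to be the product \(X^T Y\) of two \(N \times K\) blocks (giving a \(K \times K\) matrix), the modulus `P` is a stand-in Mersenne prime of 521 bits, and the function names `block_gram`, `block_times_matrix` and `lu_decompose` are hypothetical.

```python
# CPU reference sketch of the three operations over GF(p).
# Assumptions: P is only a stand-in modulus; all function names are illustrative.
P = 2**521 - 1  # a Mersenne prime, close in size to the 2^512 case


def block_gram(X, Y, p=P):
    """Return X^T Y mod p for two N x K blocks (the result is K x K)."""
    K = len(X[0])
    return [[sum(x[i] * y[j] for x, y in zip(X, Y)) % p for j in range(K)]
            for i in range(K)]


def block_times_matrix(X, M, p=P):
    """Return X M mod p for an N x K block X and a K x K matrix M."""
    K = len(M)
    return [[sum(row[t] * M[t][j] for t in range(K)) % p for j in range(K)]
            for row in X]


def lu_decompose(A, p=P):
    """Return (perm, L, U) such that L U = A[perm] mod p.

    Rows are swapped whenever the current pivot is zero modulo p.
    """
    n = len(A)
    U = [[a % p for a in row] for row in A]
    L = [[0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        pivot = next((r for r in range(k, n) if U[r][k] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular modulo p")
        if pivot != k:
            U[k], U[pivot] = U[pivot], U[k]
            L[k], L[pivot] = L[pivot], L[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        inv = pow(U[k][k], -1, p)  # modular inverse, Python 3.8+
        L[k][k] = 1
        for r in range(k + 1, n):
            f = U[r][k] * inv % p
            L[r][k] = f
            for c in range(k, n):
                U[r][c] = (U[r][c] - f * U[k][c]) % p
    return perm, L, U
```

Each entry of the \(K \times K\) product requires \(N\) multiplications of multi-hundred-bit integers, so for \(N \gg K\) the cost is dominated by big-integer arithmetic; a reference of this kind is mainly useful for checking the output of an accelerated version on small inputs.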
An efficiency analysis of the implemented algorithms is provided for prime finite fields with about \(2^{512}\), \(2^{768}\), and \(2^{1024}\) elements and for GPUs of different computational performance and architecture generations. Numerical experiments confirm the efficiency of the proposed solutions.
The numerical results show that GPU usage accelerates block operations and extends the range of almost linear parallel scalability of the INM RAS implementation of the Lanczos method. Moreover, a sparse system of size about 2 million, with an average of 82 nonzero elements per row, over a field with about \(2^{512}\) elements, will be solved 2 times faster on 128 nodes of the Lomonosov supercomputer when GPUs are used.
Acknowledgments
The work was supported by the Russian Science Foundation, grant 14-11-00806.
© 2017 Springer International Publishing AG
Cite this paper
Zamarashkin, N., Zheltkov, D. (2017). GPU Acceleration of Dense Matrix and Block Operations for Lanczos Method for Systems over Large Prime Finite Field. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2017. Communications in Computer and Information Science, vol 793. Springer, Cham. https://doi.org/10.1007/978-3-319-71255-0_2
DOI: https://doi.org/10.1007/978-3-319-71255-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71254-3
Online ISBN: 978-3-319-71255-0
eBook Packages: Computer Science, Computer Science (R0)