Abstract
The Preconditioned Conjugate Gradient (PCG) method is widely used for solving linear systems of equations with sparse matrices. A recent variant of PCG, Pipelined PCG (PIPECG), eliminates the dependencies between the computations of the PCG algorithm so that the non-dependent computations can be overlapped with communication. In this paper, we develop three methods for efficient execution of the Pipelined PCG algorithm on GPU-accelerated heterogeneous architectures. The first two methods achieve task parallelism through asynchronous execution of different tasks on a multi-core CPU and a GPU. The third method achieves data parallelism by decomposing the workload between the multi-core CPU and the GPU based on a performance model. We performed experiments on both K40 and V100 GPU systems. Our methods give up to 8x speedup, and on average 3x speedup, over the PCG CPU implementations of the Paralution and PETSc libraries. They also give up to 5x speedup, and on average 1.45x speedup, over the PCG GPU implementations of the same libraries. The third method also provides an efficient solution for problems that do not fit into GPU memory, and gives up to 6.8x speedup for such problems.
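To make the dependency structure concrete, the following is a minimal serial sketch of the pipelined recurrences introduced by Ghysels and Vanroose, written in plain Python. It is not the implementation described in this paper: the function name `pipecg`, the dense list-of-lists matrix, the Jacobi (diagonal) preconditioner passed as `inv_diag`, and the tolerance defaults are all assumptions chosen for illustration. The point it shows is that the preconditioner application and the matrix-vector product in each iteration do not depend on the results of that iteration's dot products, so on a parallel machine they could be overlapped with the reduction.

```python
def pipecg(A, b, inv_diag, tol=1e-10, max_iter=200):
    """Illustrative pipelined PCG for a small dense SPD matrix A.

    inv_diag holds 1/A[i][i] (a Jacobi preconditioner, assumed here).
    Returns (x, iterations_used).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def precond(v):
        return [inv_diag[i] * v[i] for i in range(n)]

    def dot(a, c):
        return sum(ai * ci for ai, ci in zip(a, c))

    def axpy(a, v, y):
        # Returns y + a * v as a new list.
        return [yi + a * vi for vi, yi in zip(v, y)]

    x = [0.0] * n
    r = list(b)          # r = b - A x with x = 0
    u = precond(r)       # u = M^-1 r
    w = matvec(u)        # w = A u
    z, q, s, p = ([0.0] * n for _ in range(4))
    gamma_old = alpha = 1.0

    for it in range(max_iter):
        # The dot products: in a distributed run these form one reduction.
        gamma, delta, rr = dot(r, u), dot(w, u), dot(r, r)

        # These two steps use only w, not gamma or delta, so they are
        # the work that can overlap with the reduction above.
        m = precond(w)
        nv = matvec(m)

        if rr ** 0.5 <= tol:
            return x, it

        if it == 0:
            beta, alpha = 0.0, gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)

        # Recurrences that replace the explicit products of standard PCG.
        z = axpy(beta, z, nv)    # z = n + beta * z
        q = axpy(beta, q, m)     # q = m + beta * q
        s = axpy(beta, s, w)     # s = w + beta * s
        p = axpy(beta, p, u)     # p = u + beta * p
        x = axpy(alpha, p, x)    # x = x + alpha * p
        r = axpy(-alpha, s, r)   # r = r - alpha * s
        u = axpy(-alpha, q, u)   # u = u - alpha * q
        w = axpy(-alpha, z, w)   # w = w - alpha * z
        gamma_old = gamma

    return x, max_iter


if __name__ == "__main__":
    A_demo = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b_demo = [6.0, 10.0, 8.0]
    x_demo, iters = pipecg(A_demo, b_demo, [1 / 4.0, 1 / 3.0, 1 / 2.0])
    print(x_demo, iters)
```

In this sketch everything runs sequentially, so nothing is actually overlapped. The extra vectors `z`, `q`, `s` and `w` are the cost the pipelined formulation pays, in memory and vector updates, for removing the dependency between the reduction and the matrix-vector product.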
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Tiwari, M., Vadhiyar, S. (2022). Strategies for Efficient Execution of Pipelined Conjugate Gradient Method on GPU Systems. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_6
DOI: https://doi.org/10.1007/978-3-031-23220-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23219-0
Online ISBN: 978-3-031-23220-6
eBook Packages: Computer Science, Computer Science (R0)