Strategies for Efficient Execution of Pipelined Conjugate Gradient Method on GPU Systems

Tiwari, Manasi; Vadhiyar, Sathish

doi:10.1007/978-3-031-23220-6_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13387)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

The Preconditioned Conjugate Gradient (PCG) method is widely used for solving linear systems of equations with sparse matrices. A recent variant, Pipelined PCG (PIPECG), removes the dependencies between the computations of the PCG algorithm so that the independent computations can be overlapped with communication. In this paper, we develop three methods for efficient execution of the Pipelined PCG algorithm on GPU-accelerated heterogeneous architectures. The first two methods achieve task parallelism by executing different tasks asynchronously on a multi-core CPU and a GPU. The third method achieves data parallelism by decomposing the workload between the multi-core CPU and the GPU based on a performance model. In experiments on K40 and V100 GPU systems, our methods give up to 8x speedup, and 3x on average, over the PCG CPU implementations in the Paralution and PETSc libraries. They also give up to 5x speedup, and 1.45x on average, over the PCG GPU implementations in the same libraries. The third method also provides an efficient way to solve problems that do not fit in GPU memory, giving up to 6.8x speedup for such problems.
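
To make the restructuring described in the abstract concrete, the following is a minimal serial sketch of the pipelined PCG recurrences as they are commonly stated in the literature. It is not the authors' GPU implementation. The Jacobi (diagonal) preconditioner, the dense list-of-lists matrix, and the names `pipecg`, `matvec`, and `dot` are assumptions made for illustration only. The point to notice is that the preconditioner application and the matrix-vector product inside the loop do not depend on the two dot products computed just before them, which is what allows a parallel implementation to overlap them with the reduction.

```python
def matvec(A, x):
    """Dense matrix-vector product; a stand-in for a sparse SpMV kernel."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Dot product; in a distributed setting this is the global reduction."""
    return sum(a * b for a, b in zip(x, y))


def pipecg(A, b, tol=1e-10, max_iter=100):
    """Pipelined PCG sketch with an assumed Jacobi preconditioner.

    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # assumed preconditioner

    def precond(v):
        return [d * vi for d, vi in zip(inv_diag, v)]

    x = [0.0] * n
    r = list(b)            # r0 = b - A*x0 with x0 = 0
    u = precond(r)         # u0 = M^{-1} r0
    w = matvec(A, u)       # w0 = A u0
    z = q = s = p = [0.0] * n
    gamma_old = alpha = 1.0
    iters = 0

    for i in range(max_iter):
        # The two reductions of the iteration.
        gamma = dot(r, u)
        delta = dot(w, u)
        # These two steps do not use gamma or delta, so a parallel
        # implementation can overlap them with the reductions above.
        m = precond(w)
        nv = matvec(A, m)

        if gamma <= tol * tol:   # gamma approximates r^T M^{-1} r
            break
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)

        # Vector recurrences replace the explicit products of standard PCG.
        z = [a + beta * c for a, c in zip(nv, z)]
        q = [a + beta * c for a, c in zip(m, q)]
        s = [a + beta * c for a, c in zip(w, s)]
        p = [a + beta * c for a, c in zip(u, p)]
        x = [a + alpha * c for a, c in zip(x, p)]
        r = [a - alpha * c for a, c in zip(r, s)]
        u = [a - alpha * c for a, c in zip(u, q)]
        w = [a - alpha * c for a, c in zip(w, z)]
        gamma_old = gamma
        iters = i + 1

    return x, iters


# Small symmetric positive definite example (illustrative data).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = pipecg(A, b)
```

The exact solution of this small system is (2/9, 1/9, 13/9), which gives a convenient check on the computed `x`. In this serial sketch nothing actually runs concurrently; the overlap only becomes real when the reductions and the SpMV are issued asynchronously, for example on different devices or streams.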

Author information

Authors and Affiliations

Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
Manasi Tiwari & Sathish Vadhiyar

Corresponding author

Correspondence to Manasi Tiwari.

Editor information

Editors and Affiliations

University of Tennessee, Knoxville, TN, USA
Hartwig Anzt
University of New Mexico, Albuquerque, NM, USA
Amanda Bienz
University of Tennessee, Knoxville, TN, USA
Piotr Luszczek
Université Paris-Saclay, Orsay, France
Marc Baboulin

Cite this paper

Tiwari, M., Vadhiyar, S. (2022). Strategies for Efficient Execution of Pipelined Conjugate Gradient Method on GPU Systems. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_6

DOI: https://doi.org/10.1007/978-3-031-23220-6_6
Published: 04 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23219-0
Online ISBN: 978-3-031-23220-6
eBook Packages: Computer Science, Computer Science (R0)
