Strategies for Efficient Execution of Pipelined Conjugate Gradient Method on GPU Systems

  • Conference paper
High Performance Computing. ISC High Performance 2022 International Workshops (ISC High Performance 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13387))

Abstract

The Preconditioned Conjugate Gradient (PCG) method is widely used for solving sparse linear systems of equations. A recent variant, Pipelined PCG (PIPECG), eliminates the dependencies between the computations of the PCG algorithm so that the independent computations can be overlapped with communication. In this paper, we develop three methods for efficient execution of the PIPECG algorithm on GPU-accelerated heterogeneous architectures. The first two methods achieve task parallelism through asynchronous execution of different tasks on a multi-core CPU and a GPU. The third method achieves data parallelism by decomposing the workload between the multi-core CPU and the GPU based on a performance model. We performed experiments on both K40 and V100 GPU systems. Our methods give up to 8x speedup, and 3x on average, over the PCG CPU implementations in the Paralution and PETSc libraries. They also give up to 5x speedup, and 1.45x on average, over the PCG GPU implementations in the same libraries. The third method also provides an efficient solution for problems that do not fit in GPU memory, giving up to 6.8x speedup for such problems.
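
To make the restructuring concrete, the following is a minimal, sequential sketch of a pipelined PCG iteration. It assumes the recurrences published by Ghysels and Vanroose and a simple Jacobi (diagonal) preconditioner; it is not the authors' implementation, and the CPU-GPU scheduling described in the abstract is not modeled. The helper names (`matvec`, `dot`, `axpy`, `pipecg`) and the dense list-of-lists matrix are illustrative assumptions.

```python
def matvec(A, v):
    """Dense matrix-vector product (stand-in for a sparse SpMV kernel)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, x, y):
    """Return alpha * x + y as a new list."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def pipecg(A, b, tol=1e-10, max_iter=200):
    """Pipelined PCG sketch with a Jacobi preconditioner (assumed here)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]

    def precond(v):
        return [d * vi for d, vi in zip(inv_diag, v)]

    x = [0.0] * n
    r = list(b)          # r0 = b - A x0 with x0 = 0
    u = precond(r)       # u0 = M^-1 r0
    w = matvec(A, u)     # w0 = A u0
    z = q = s = p = [0.0] * n
    gamma_old = alpha_old = 1.0
    b_norm = dot(b, b) ** 0.5

    for i in range(max_iter):
        # The three reductions are grouped: in a distributed or
        # heterogeneous setting they form a single synchronization point.
        gamma = dot(r, u)
        delta = dot(w, u)
        r_norm = dot(r, r) ** 0.5
        if r_norm <= tol * b_norm:
            return x, i

        # These two kernels do not depend on the reductions above, which is
        # what lets a pipelined implementation overlap them with the
        # reduction. Here they simply run one after the other.
        m = precond(w)
        nv = matvec(A, m)

        if i == 0:
            beta = 0.0
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)

        # Recurrence updates for the auxiliary and solution vectors.
        z = axpy(beta, z, nv)
        q = axpy(beta, q, m)
        s = axpy(beta, s, w)
        p = axpy(beta, p, u)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        u = axpy(-alpha, q, u)
        w = axpy(-alpha, z, w)
        gamma_old, alpha_old = gamma, alpha

    return x, max_iter


if __name__ == "__main__":
    # Small symmetric positive definite tridiagonal test system.
    size = 8
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(size)] for i in range(size)]
    b = matvec(A, [1.0] * size)
    solution, iterations = pipecg(A, b)
    print(iterations, solution)
```

Compared with standard PCG, each iteration carries extra vector updates (`z`, `q`, `s`, `w`) in exchange for removing the dependency between the dot products and the preconditioner and matrix-vector products.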



Author information

Correspondence to Manasi Tiwari.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Tiwari, M., Vadhiyar, S. (2022). Strategies for Efficient Execution of Pipelined Conjugate Gradient Method on GPU Systems. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_6

  • DOI: https://doi.org/10.1007/978-3-031-23220-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23219-0

  • Online ISBN: 978-3-031-23220-6

  • eBook Packages: Computer Science (R0)
