Communication Overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures

Tiwari, Manasi; Vadhiyar, Sathish

doi:10.1007/978-3-031-06156-1_45

Manasi Tiwari¹⁸ &
Sathish Vadhiyar¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13098))

Included in the following conference series:

European Conference on Parallel Processing

679 Accesses

Abstract

Preconditioned Conjugate Gradient (PCG) method has been one of the widely used methods for solving linear systems of equations for sparse problems. Pipelined PCG (PIPECG) attempts to eliminate the dependencies in the computations in the PCG algorithm and overlap non-dependent computations by reorganizing the traditional PCG code and using non-blocking allreduces . We have developed a novel pipelined PCG algorithm called PIPECG-OATI (One Allreduce per Two Iterations) which reduces the number of non-blocking allreduces to one per two iterations and provides large overlap of global communication and computations at higher number of cores in distributed memory CPU systems. PIPECG-OATI gives up to 3\(\times \) speedup over PCG and 1.73\(\times \) speedup over PIPECG at large number of cores.

For GPU accelerated heterogeneous architectures, we have developed three methods for efficient execution of the PIPECG algorithm. These methods achieve task and data parallelism. Our methods give considerable performance improvements over PCG CPU and GPU implementations of Paralution and PETSc libraries.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Available as KSPPIPECG2. URL: https://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/KSP/KSPPIPECG2.html.

References

Balay, S., Gropp, W.D., McInnes, L.C., Smith, B.F.: Efficient management of parallelism in object oriented numerical software libraries. In: Modern Software Tools in Scientific Computing, pp. 163–202. Birkhäuser Press (1997)
Google Scholar
Ghysels, P., Vanroose, W.: Hiding global synchronization latency in the preconditioned conjugate gradient algorithm. Parallel Comput. 40(7), 224–238 (2014)
Article MathSciNet Google Scholar
Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stan. 49, 409–436 (1952)
Article MathSciNet Google Scholar
Labs, P.: Paralution v1.1.0 (2020). http://www.paralution.com/
Tiwari, M., Vadhiyar, S.: Pipelined preconditioned conjugate gradient methods for distributed memory systems. In: 27th IEEE International Conference on High Performance Computing, Data, and Analytics 2020
Google Scholar
Tiwari, M., Vadhiyar, S.: Efficient executions of pipelined conjugate gradient method on heterogeneous architectures (2021). arXiv.org

Download references

Author information

Authors and Affiliations

Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
Manasi Tiwari & Sathish Vadhiyar

Authors

Manasi Tiwari
View author publications
You can also search for this author in PubMed Google Scholar
Sathish Vadhiyar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Manasi Tiwari .

Editor information

Editors and Affiliations

University of Lisbon, Lisbon, Portugal
Ricardo Chaves
Department of Computer Engineering, CiTIUS, University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora B. Heras
University of Lisbon, Lisbon, Portugal
Aleksandar Ilic
Koç University, Istanbul, Turkey
Didem Unat
Barcelona Supercomputing Center, Barcelona, Spain
Rosa M. Badia
University of Stirling, Stirling, UK
Andrea Bracciali
Louisiana State University, Baton Rouge, USA
Patrick Diehl
Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA
Anshu Dubey
Ajou University, Suwon, Korea (Republic of)
Oh Sangyoon
Tennessee Technological University, Cookeville, TN, USA
Stephen L. Scott
University of Pisa, Pisa, Italy
Laura Ricci

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tiwari, M., Vadhiyar, S. (2022). Communication Overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_45

Download citation

DOI: https://doi.org/10.1007/978-3-031-06156-1_45
Published: 09 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06155-4
Online ISBN: 978-3-031-06156-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Communication Overlapping Pipelined Conjugate Gradients for Distributed Memory Systems and Heterogeneous Architectures