Optimal size of the block in block GMRES on GPUs: computational model and experiments

Boman, Erik Gunnar; Higgins, Andrew James; Szyld, Daniel B.

doi:10.1007/s11075-022-01439-z

Journal Article · December 13, 2022 · Numerical Algorithms

DOI: https://doi.org/10.1007/s11075-022-01439-z · OSTI ID: 2311786

Boman, Erik Gunnar ^[1]; Higgins, Andrew James ^[2]; Szyld, Daniel B. ^[2]

^[1] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research
^[2] Temple Univ., Philadelphia, PA (United States)

The block version of GMRES (BGMRES) is most advantageous over its single right-hand side (RHS) counterpart when the cost of communication is high while the cost of floating point operations is not. This is the case on modern graphics processing units (GPUs), while it is generally not the case on traditional central processing units (CPUs). In this paper, experiments on both GPUs and CPUs compare the performance of BGMRES against GMRES as the number of RHS increases, with a particular focus on GPU performance. The experiments indicate that there are many cases in which BGMRES is slower than GMRES on CPUs, but faster on GPUs. Furthermore, when varying the number of RHS on the GPU, there is an optimal number of RHS at which BGMRES is clearly most advantageous over GMRES. A computational model for the GPU is developed using hardware-specific parameters, providing insight into how the qualitative behavior of BGMRES changes as the number of RHS increases. This model also helps explain the phenomena observed in the experiments.
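The trade-off between communication and floating point cost described in the abstract can be illustrated with a generic roofline-style estimate. The sketch below is not the computational model from the paper; it is a minimal illustration under stated assumptions: CSR-like storage, each matrix entry read once per product regardless of the number of RHS, perfect reuse of the vector block, and hypothetical device parameters. It covers only the sparse matrix times block product, so it does not capture orthogonalization costs and therefore does not reproduce the optimal block size reported in the experiments.

```python
def spmm_time(nnz, n, s, peak_flops, bandwidth,
              bytes_per_value=8, bytes_per_index=4):
    """Roofline-style time estimate (seconds) for one product of a sparse
    n-by-n matrix with nnz nonzeros and a dense n-by-s block of vectors.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # One multiply and one add per nonzero per right-hand side.
    flops = 2.0 * nnz * s
    # Matrix values and column indices are read once, independent of s
    # (row pointers are ignored for simplicity).
    matrix_bytes = nnz * (bytes_per_value + bytes_per_index)
    # The input block is read once and the output block written once
    # (assumes perfect caching of the vector block).
    vector_bytes = 2.0 * n * s * bytes_per_value
    compute_time = flops / peak_flops
    memory_time = (matrix_bytes + vector_bytes) / bandwidth
    # The slower of the two resources bounds the kernel time.
    return max(compute_time, memory_time)


def time_per_rhs(nnz, n, s, peak_flops, bandwidth):
    """Estimated product time divided by the number of right-hand sides."""
    return spmm_time(nnz, n, s, peak_flops, bandwidth) / s


if __name__ == "__main__":
    # Hypothetical device: 7 Tflop/s peak, 900 GB/s memory bandwidth.
    for s in (1, 2, 4, 8, 16):
        t = time_per_rhs(10_000_000, 1_000_000, s, 7e12, 9e11)
        print(f"s = {s:2d}: estimated time per RHS = {t:.3e} s")
```

Under these assumptions the product is bound by memory traffic for small s, so the estimated time per RHS falls as s grows because the matrix traffic is shared across the block. On a device where floating point throughput is the tighter limit, the same formula gives no such benefit.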

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: NA0003525

OSTI ID: 2311786

Report Number(s): SAND-2023-10797J

Journal Information: Numerical Algorithms, Vol. 92; ISSN 1017-1398

Publisher: Springer

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
Krylov subspace methods
GMRES
Multiple right hand sides
Block GMRES
GPUs