OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimal size of the block in block GMRES on GPUs: computational model and experiments

Journal Article · Numerical Algorithms
 [1]; [2]; [2]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research
  2. Temple Univ., Philadelphia, PA (United States)

The block version of GMRES (BGMRES) is most advantageous over its single right-hand side (RHS) counterpart when communication is expensive while floating-point operations are cheap. This is the case on modern graphics processing units (GPUs), but generally not on traditional central processing units (CPUs). This paper presents experiments on both GPUs and CPUs that compare the performance of BGMRES against GMRES as the number of RHS increases, with a particular focus on GPU performance. The experiments indicate that there are many cases in which BGMRES is slower than GMRES on CPUs but faster on GPUs. Furthermore, when the number of RHS is varied on the GPU, there is an optimal number of RHS at which BGMRES is clearly most advantageous over GMRES. A computational model for the GPU is developed using hardware-specific parameters. It provides insight into how the qualitative behavior of BGMRES changes as the number of RHS increases, and it helps explain the phenomena observed in the experiments.
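
The trade-off described in the abstract can be illustrated with a rough roofline-style estimate. The sketch below is not the model developed in the paper: the function names, the cost formulas, and the hardware numbers (`BANDWIDTH`, `PEAK_FLOPS`) are all assumptions chosen for illustration. It treats memory traffic as the communication cost, assumes BGMRES and GMRES take the same number of iterations, and ignores kernel launch latency and the small dense QR updates.

```python
# Illustrative assumptions only; none of these values come from the paper.
BANDWIDTH = 900e9     # bytes per second of device memory bandwidth (assumed)
PEAK_FLOPS = 7e12     # double-precision flops per second (assumed)
N, NNZ, M = 1_000_000, 10_000_000, 25   # rows, nonzeros, basis blocks (assumed)
VAL, IDX = 8, 4       # bytes per matrix value and per column index


def roofline(bytes_moved, flops):
    """A kernel takes as long as its slower resource: memory or arithmetic."""
    return max(bytes_moved / BANDWIDTH, flops / PEAK_FLOPS)


def spmm_time(s):
    """Sparse matrix times an N-by-s block: the matrix is read once for all s columns."""
    bytes_moved = NNZ * (VAL + IDX) + 2 * N * s * VAL
    return roofline(bytes_moved, 2 * NNZ * s)


def ortho_time(k, s):
    """One Gram-Schmidt pass of s new vectors against k basis vectors."""
    bytes_moved = (2 * k + 2 * s) * N * VAL   # basis read twice, block read and written
    return roofline(bytes_moved, 4 * N * k * s)


def speedup(s):
    """Estimated time of s single-RHS iterations divided by one block iteration."""
    single = spmm_time(1) + ortho_time(M, 1)
    block = spmm_time(s) + ortho_time(M * s, s)   # block basis holds M*s vectors
    return s * single / block


if __name__ == "__main__":
    for s in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"s = {s:3d}  estimated speedup = {speedup(s):.2f}")
```

Under these assumptions the estimate rises while the block orthogonalization stays memory-bound, because the matrix read is shared across all columns, and falls once the orthogonalization flops, which grow quadratically with the block size, become the bottleneck. Where the peak falls depends entirely on the assumed parameters.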

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
NA0003525
OSTI ID:
2311786
Report Number(s):
SAND-2023-10797J
Journal Information:
Numerical Algorithms, Vol. 92; ISSN 1017-1398
Publisher:
Springer
Country of Publication:
United States
Language:
English
