research-article

High-performance sparse matrix-vector multiplication on GPUs for structured grid computations

Authors: Justin Holewinski, P. Sadayappan

GPGPU-5: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units

Pages 47 - 56

https://doi.org/10.1145/2159430.2159436

Published: 03 March 2012

Abstract

In this paper, we address efficient sparse matrix-vector multiplication for matrices arising from structured grid problems with high degrees of freedom at each grid node. Sparse matrix-vector multiplication is a critical step in the iterative solution of sparse linear systems of equations arising in the solution of partial differential equations using uniform grids for discretization. With uniform grids, the resulting linear system Ax = b has a matrix A that is sparse with a very regular structure. The specific focus of this paper is on sparse matrices that have a block structure due to the large number of unknowns at each grid point. Sparse matrix storage formats such as Compressed Sparse Row (CSR) and Diagonal format (DIA) are not the most effective for such matrices.
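As a point of reference for the DIA baseline mentioned above (not the format proposed in this paper), the following is a minimal sketch of y = Ax with A held in diagonal storage. It assumes a 1D uniform grid with the 3-point stencil [-1, 2, -1] and one unknown per grid point; the function names `dia_spmv` and `laplacian_1d_dia` are illustrative and do not come from the paper.

```python
def dia_spmv(offsets, data, x):
    """Compute y = A @ x for a square matrix A stored in DIA format.

    offsets[k] is the offset of the k-th stored diagonal (0 = main,
    positive = above, negative = below). data[k][i] holds A[i][i + offsets[k]];
    entries that fall outside the matrix are padding and are skipped.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


def laplacian_1d_dia(n):
    """DIA storage of the assumed 1D 3-point stencil [-1, 2, -1] on n points."""
    offsets = [-1, 0, 1]
    data = [[-1.0] * n, [2.0] * n, [-1.0] * n]
    return offsets, data


offsets, data = laplacian_1d_dia(5)
print(dia_spmv(offsets, data, [1.0, 2.0, 3.0, 4.0, 5.0]))
# [0.0, 0.0, 0.0, 0.0, 6.0]
```

Because every stored diagonal has the full matrix length, DIA needs no column indices, which is why it suits matrices with a few densely populated diagonals.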

In this work, we present a new sparse matrix storage format that takes advantage of the diagonal structure of matrices for stencil operations on structured grids. Unlike existing diagonal formats such as DIA, the new format is specifically optimized for the case of higher degrees of freedom, where DIA is forced to explicitly store many zero elements of the sparse matrix. We develop efficient sparse matrix-vector multiplication for structured grid computations on GPU architectures using CUDA [25].
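To make the zero-padding cost of DIA concrete, the sketch below counts how many entries DIA would store for a block-structured matrix and compares that with the number of true nonzeros. The setup is an assumption made for illustration only: a 1D grid, a 3-point stencil, and a dense dof x dof coupling block between neighbouring grid points. The abstract does not specify the coupling pattern, and `block_stencil_pattern` and `dia_storage` are hypothetical helper names.

```python
def block_stencil_pattern(n_points, dof):
    """Nonzero positions (row, col) for a 1D 3-point stencil in which each
    pair of neighbouring grid points is coupled by a dense dof x dof block
    (an assumed coupling, chosen for illustration)."""
    pattern = set()
    for p in range(n_points):
        for q in (p - 1, p, p + 1):
            if 0 <= q < n_points:
                for a in range(dof):
                    for b in range(dof):
                        pattern.add((p * dof + a, q * dof + b))
    return pattern


def dia_storage(pattern, n_rows):
    """Return (stored_entries, nonzeros) if the pattern is held in DIA.

    DIA keeps one array of n_rows entries per occupied diagonal, so every
    position on an occupied diagonal is stored whether it is zero or not.
    """
    occupied = {col - row for row, col in pattern}
    return len(occupied) * n_rows, len(pattern)


n_points = 64
for dof in (1, 2, 4, 8):
    stored, nnz = dia_storage(block_stencil_pattern(n_points, dof), n_points * dof)
    print(dof, stored, nnz, round(stored / nnz, 2))
```

In this toy setting the ratio of stored entries to nonzeros rises as dof grows, because the outermost diagonals of each block contain only a few nonzeros per block yet are stored at full length.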

References

[1]

Badcock, K., Richards, B., and Woodgate, M. Elements of computational fluid dynamics on block structured grids using implicit solvers. Progress in Aerospace Sciences 36, 5--6 (2000), 351--392.

[2]

Balay, S., Brown, J., Buschelman, K., Eijkhout, V., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes, L. C., Smith, B. F., and Zhang, H. PETSc users manual. Tech. Rep. ANL-95/11 - Revision 3.2, Argonne National Laboratory, 2011.

[3]

Baskaran, M. M., and Bordawekar, R. Optimizing sparse matrix-vector multiplication on GPUs. Tech. Rep. RC24704 (W0812--047), IBM T. J. Watson Research Center, April 2009.

[4]

Bell, N., and Garland, M. Efficient sparse matrix-vector multiplication on CUDA. Tech. Rep. NVR-2008-004, NVIDIA Corporation, December 2008.

[5]

Bell, N., and Garland, M. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (New York, NY, USA, 2009), SC '09, ACM, pp. 18:1--18:11.

[6]

Bell, N., and Garland, M. Cusp: Generic parallel algorithms for sparse matrix and graph computations, 2010. Version 0.1.0.

[7]

Bolz, J., Farmer, I., Grinspun, E., and Schröder, P. Sparse matrix solvers on the GPU: conjugate gradients and multigrid. In ACM SIGGRAPH 2003 Papers (New York, NY, USA, 2003), SIGGRAPH '03, ACM, pp. 917--924.

[8]

Choi, J. W., Singh, A., and Vuduc, R. W. Model-driven autotuning of sparse matrix-vector multiply on GPUs. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2010), PPoPP '10, ACM, pp. 115--126.

[9]

D'Azevedo, E., Fahey, M., and Mills, R. Vectorized sparse matrix multiply for compressed row storage format. In Computational Science - ICCS 2005, V. Sunderam, G. van Albada, P. Sloot, and J. Dongarra, Eds., vol. 3514 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2005, pp. 785--789. doi:10.1007/11428831_13.

[10]

Ekambaram, A., and Montagne, E. An alternative compressed storage format for sparse matrices. In Computer and Information Sciences - ISCIS 2003, A. Yazici and C. Sener, Eds., vol. 2869 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2003, pp. 196--203. doi:10.1007/978-3-540-39737-3_25.

[11]

Garland, M. Sparse matrix computations on manycore GPU's. In Proceedings of the 45th annual Design Automation Conference (New York, NY, USA, 2008), DAC '08, ACM, pp. 2--6.

[12]

Geveler, M., Ribbrock, D., Göddeke, D., Zajac, P., and Turek, S. Efficient finite element geometric multigrid solvers for unstructured grids on GPUs. PARENG (April 2011).

[13]

Geveler, M., Ribbrock, D., Göddeke, D., Zajac, P., and Turek, S. Towards a complete FEM-based simulation toolkit on GPUs: Geometric multigrid solvers. ParCFD (May 2011).

[14]

Goodale, T., Allen, G., Lanfermann, G., Massó, J., Radke, T., Seidel, E., and Shalf, J. The Cactus framework and toolkit: Design and applications. In Vector and Parallel Processing -- VECPAR'2002, 5th International Conference, Lecture Notes in Computer Science (Berlin, 2003), Springer.

[15]

Grimes, R., Kincaid, D., and Young, D. ITPACK 2.0 user's guide, August 1979.

[16]

Im, E.-J., and Yelick, K. Optimizing sparse matrix computations for register reuse in Sparsity. In Computational Science - ICCS 2001, V. Alexandrov, J. Dongarra, B. Juliano, R. Renner, and C. Tan, Eds., vol. 2073 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2001, pp. 127--136. doi:10.1007/3-540-45545-0_22.

[17]

Im, E.-J., Yelick, K., and Vuduc, R. Sparsity: Optimization framework for sparse matrix kernels. International Journal of High Performance Computing Applications 18, 1 (2004), 135--158.

[18]

Khronos OpenCL Working Group. The OpenCL specification - version 1.2.

[19]

Krüger, J., and Westermann, R. Linear algebra operators for GPU implementation of numerical algorithms. In ACM SIGGRAPH 2005 Courses (New York, NY, USA, 2005), SIGGRAPH '05, ACM.

[20]

Los Alamos National Laboratory. PFLOTRAN. http://ees.lanl.gov/source/orgs/ees/pflotran/index.shtml.

[21]

Monakov, A., Lokhmotov, A., and Avetisyan, A. Automatically tuning sparse matrix-vector multiplication for GPU architectures. In High Performance Embedded Architectures and Compilers, Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds., vol. 5952 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2010, pp. 111--125. doi:10.1007/978-3-642-11515-8_10.

[22]

Monorchio, A., and Mittra, R. Time-domain (FE/FDTD) technique for solving complex electromagnetic problems. IEEE Microwave and Guided Wave Letters 8, 2 (Feb. 1998), 93--95.

[23]

Montagne, E., and Ekambaram, A. An optimal storage format for sparse matrices. Information Processing Letters 90, 2 (2004), 87--92.

[24]

Nikolova, N., Tam, H., and Bakr, M. Sensitivity analysis with the FDTD method on structured grids. IEEE Transactions on Microwave Theory and Techniques 52, 4 (Apr. 2004), 1207--1216.

[25]

NVIDIA Corporation. CUDA C programming guide - version 4.0.

[26]

NVIDIA Corporation. OpenCL programming guide for the CUDA architecture.

[27]

Saad, Y. Sparskit: A basic toolkit for sparse matrix computations - version 2.

[28]

Saad, Y. Iterative Methods for Sparse Linear Systems. Society for Industrial Mathematics, 2003.

[29]

Vázquez, F., Fernández, J. J., and Garzón, E. M. A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency and Computation: Practice and Experience 23, 8 (2011), 815--826.

Index Terms
1. Computing methodologies
  1. Symbolic and algebraic manipulation
    1. Symbolic and algebraic algorithms
2. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis

Published In

GPGPU-5: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units

March 2012

122 pages

ISBN: 9781450312332

DOI: 10.1145/2159430

Editors: David Kaeli (Northeastern University, Boston, MA), John Cavazos (University of Delaware, Newark, DE), Enqiang Sun

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GPGPU-5

Sponsor:

ACM

GPGPU-5: The 5th Annual Workshop on General Purpose Processing with Graphics Processing Units

March 3, 2012

London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%

Cited By

Hu, J., Luo, H., Jiang, H., Xiao, G., and Li, K. FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs. IEEE Transactions on Parallel and Distributed Systems 35, 12 (2024), 2423--2434. https://doi.org/10.1109/TPDS.2024.3477431

Zhao, Z., Zhang, G., Wu, Y., Hong, R., Yang, Y., and Fu, Y. Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs. The Journal of Supercomputing 80, 10 (2024), 13681--13713. https://dl.acm.org/doi/10.1007/s11227-024-05949-6

Freire, M., Marichal, R., Gonzaga de Oliveira, S., Dufrechou, E., and Ezzatti, P. Enhancing the Sparse Matrix Storage Using Reordering Techniques. In High Performance Computing (2024), pp. 66--76. https://doi.org/10.1007/978-3-031-52186-7_5

Freire, M., Marichal, R., Dufrechou, E., Ezzatti, P., and Pedemonte, M. Trajectory-based Metaheuristics for Improving Sparse Matrix Storage. In 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI) (2023), pp. 1--6. https://doi.org/10.1109/LA-CCI58595.2023.10409303

Mehrabi, A., Lee, D., Chatterjee, N., Sorin, D., Lee, B., and O'Connor, M. Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures. In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (2021), pp. 48--58. https://doi.org/10.1109/ISPASS51385.2021.00016

Gao, J., Xia, Y., Yin, R., and He, G. Adaptive diagonal sparse matrix-vector multiplication on GPU. Journal of Parallel and Distributed Computing (2021). https://doi.org/10.1016/j.jpdc.2021.07.007

Wang, Q., Li, M., Pang, J., and Zhu, D. Research on Performance Optimization for Sparse Matrix-Vector Multiplication in Multi/Many-core Architecture. In 2020 2nd International Conference on Information Technology and Computer Application (ITCA) (2020), pp. 350--362. https://doi.org/10.1109/ITCA52113.2020.00081

Augustine, T., Sarma, J., Pouchet, L., Rodríguez, G., McKinley, K., and Fisher, K. Generating piecewise-regular code from irregular structures. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (2019), pp. 625--639. https://dl.acm.org/doi/10.1145/3314221.3314615

Al-Mouhamed, M., Khan, A., and Mohammad, N. A review of CUDA optimization techniques and tools for structured grid computing. Computing 102, 4 (2019), 977--1003. https://doi.org/10.1007/s00607-019-00744-1

Mossaiby, F., Joulaian, M., and Düster, A. The spectral cell method for wave propagation in heterogeneous materials simulated on multiple GPUs and CPUs. Computational Mechanics 63, 5 (2019), 805--819. https://dl.acm.org/doi/10.1007/s00466-018-1623-4
