DOI: 10.1145/2159430.2159436
research-article

High-performance sparse matrix-vector multiplication on GPUs for structured grid computations

Published: 03 March 2012

Abstract

In this paper, we address efficient sparse matrix-vector multiplication for matrices arising from structured grid problems with high degrees of freedom at each grid node. Sparse matrix-vector multiplication is a critical step in the iterative solution of sparse linear systems of equations arising in the solution of partial differential equations using uniform grids for discretization. With uniform grids, the resulting linear system Ax = b has a matrix A that is sparse with a very regular structure. The specific focus of this paper is on sparse matrices that have a block structure due to the large number of unknowns at each grid point. Sparse matrix storage formats such as Compressed Sparse Row (CSR) and Diagonal format (DIA) are not the most effective for such matrices.
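To make the operation concrete, the following is a minimal sketch of y = Ax with A stored in CSR form. It is written in plain Python rather than CUDA for readability, and it is not the paper's implementation. The 1D three-point stencil test matrix and the names `csr_spmv` and `laplacian_1d_csr` are illustrative assumptions.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def laplacian_1d_csr(n):
    """Hypothetical test matrix: 3-point stencil (-1, 2, -1) on n grid nodes."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:  # drop stencil points that fall off the grid
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


row_ptr, col_idx, values = laplacian_1d_csr(5)
y = csr_spmv(row_ptr, col_idx, values, [1.0] * 5)
```

Each row reads its column indices indirectly through `col_idx`. CSR pays for that indirection even when the sparsity pattern is fully regular, as it is on a uniform grid.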
In this work, we present a new sparse matrix storage format that takes advantage of the diagonal structure of matrices for stencil operations on structured grids. Unlike other formats such as DIA, the new format is specifically optimized for the case of higher degrees of freedom, where DIA is forced to explicitly represent many zero elements in the sparse matrix. We develop efficient sparse matrix-vector multiplication for structured grid computations on GPU architectures using CUDA [25].
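The padding cost of DIA can be illustrated with a small counting sketch. It assumes a 5-point stencil on a 2D grid in which every node-to-node coupling is a dense dof-by-dof block. That model, the grid size, and the function names are assumptions made for illustration and are not taken from the paper. DIA stores one full-length array per occupied diagonal, so its storage is the number of distinct diagonals times the number of rows.

```python
def block_stencil_pattern(nx, ny, dof):
    """Nonzero (row, col) positions for a 5-point stencil on an nx-by-ny grid.

    Assumption: each coupling between two grid nodes is a dense dof x dof block.
    """
    pattern = set()
    for gy in range(ny):
        for gx in range(nx):
            node = gy * nx + gx
            for dx, dy in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                px, py = gx + dx, gy + dy
                if 0 <= px < nx and 0 <= py < ny:
                    nbr = py * nx + px
                    for a in range(dof):
                        for b in range(dof):
                            pattern.add((node * dof + a, nbr * dof + b))
    return pattern


def dia_stored_entries(pattern, n_rows):
    """Entries DIA must store: one slot per row for every occupied diagonal."""
    offsets = {col - row for row, col in pattern}
    return len(offsets) * n_rows


def dia_fill_ratio(nx, ny, dof):
    """Stored DIA entries divided by true nonzeros (1.0 means no padding)."""
    pattern = block_stencil_pattern(nx, ny, dof)
    return dia_stored_entries(pattern, nx * ny * dof) / len(pattern)


ratios = {dof: dia_fill_ratio(4, 4, dof) for dof in (1, 2, 4)}
```

In this toy model the ratio of stored entries to true nonzeros rises as `dof` grows, because each dense block spreads its entries over 2·dof − 1 diagonals. This is the kind of overhead the abstract attributes to DIA.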

References

[1]
Badcock, K., Richards, B., and Woodgate, M. Elements of computational fluid dynamics on block structured grids using implicit solvers. Progress in Aerospace Sciences 36, 5--6 (2000), 351--392.
[2]
Balay, S., Brown, J., Buschelman, K., Eijkhout, V., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes, L. C., Smith, B. F., and Zhang, H. PETSc users manual. Tech. Rep. ANL-95/11 - Revision 3.2, Argonne National Laboratory, 2011.
[3]
Baskaran, M. M., and Bordawekar, R. Optimizing sparse matrix-vector multiplication on GPUs. Tech. Rep. RC24704 (W0812--047), IBM T. J. Watson Research Center, April 2009.
[4]
Bell, N., and Garland, M. Efficient sparse matrix-vector multiplication on CUDA. Tech. Rep. NVR-2008-004, NVIDIA Corporation, December 2008.
[5]
Bell, N., and Garland, M. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (New York, NY, USA, 2009), SC '09, ACM, pp. 18:1--18:11.
[6]
Bell, N., and Garland, M. Cusp: Generic parallel algorithms for sparse matrix and graph computations, 2010. Version 0.1.0.
[7]
Bolz, J., Farmer, I., Grinspun, E., and Schröder, P. Sparse matrix solvers on the GPU: conjugate gradients and multigrid. In ACM SIGGRAPH 2003 Papers (New York, NY, USA, 2003), SIGGRAPH '03, ACM, pp. 917--924.
[8]
Choi, J. W., Singh, A., and Vuduc, R. W. Model-driven autotuning of sparse matrix-vector multiply on GPUs. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (New York, NY, USA, 2010), PPoPP '10, ACM, pp. 115--126.
[9]
D'Azevedo, E., Fahey, M., and Mills, R. Vectorized sparse matrix multiply for compressed row storage format. In Computational Science - ICCS 2005, V. Sunderam, G. van Albada, P. Sloot, and J. Dongarra, Eds., vol. 3514 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2005, pp. 785--789. 10.1007/11428831_13.
[10]
Ekambaram, A., and Montagne, E. An alternative compressed storage format for sparse matrices. In Computer and Information Sciences - ISCIS 2003, A. Yazici and C. Sener, Eds., vol. 2869 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2003, pp. 196--203. 10.1007/978-3-540-39737-3_25.
[11]
Garland, M. Sparse matrix computations on manycore GPU's. In Proceedings of the 45th annual Design Automation Conference (New York, NY, USA, 2008), DAC '08, ACM, pp. 2--6.
[12]
Geveler, M., Ribbrock, D., Göddeke, D., Zajac, P., and Turek, S. Efficient finite element geometric multigrid solvers for unstructured grids on GPUs. PARENG (April 2011).
[13]
Geveler, M., Ribbrock, D., Göddeke, D., Zajac, P., and Turek, S. Towards a complete FEM-based simulation toolkit on GPUs: Geometric multigrid solvers. ParCFD (May 2011).
[14]
Goodale, T., Allen, G., Lanfermann, G., Massó, J., Radke, T., Seidel, E., and Shalf, J. The Cactus framework and toolkit: Design and applications. In Vector and Parallel Processing -- VECPAR'2002, 5th International Conference, Lecture Notes in Computer Science (Berlin, 2003), Springer.
[15]
Grimes, R., Kincaid, D., and Young, D. ITPACK 2.0 user's guide, August 1979.
[16]
Im, E.-J., and Yelick, K. Optimizing sparse matrix computations for register reuse in sparsity. In Computational Science - ICCS 2001, V. Alexandrov, J. Dongarra, B. Juliano, R. Renner, and C. Tan, Eds., vol. 2073 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2001, pp. 127--136. 10.1007/3-540-45545-0_22.
[17]
Im, E.-J., Yelick, K., and Vuduc, R. Sparsity: Optimization framework for sparse matrix kernels. International Journal of High Performance Computing Applications 18, 1 (2004), 135--158.
[18]
Khronos OpenCL Working Group. The OpenCL specification - version 1.2.
[19]
Krüger, J., and Westermann, R. Linear algebra operators for GPU implementation of numerical algorithms. In ACM SIGGRAPH 2005 Courses (New York, NY, USA, 2005), SIGGRAPH '05, ACM.
[20]
Los Alamos National Laboratory. PFLOTRAN. http://ees.lanl.gov/source/orgs/ees/pflotran/index.shtml.
[21]
Monakov, A., Lokhmotov, A., and Avetisyan, A. Automatically tuning sparse matrix-vector multiplication for GPU architectures. In High Performance Embedded Architectures and Compilers, Y. Patt, P. Foglia, E. Duesterwald, P. Faraboschi, and X. Martorell, Eds., vol. 5952 of Lecture Notes in Computer Science. Springer Berlin/Heidelberg, 2010, pp. 111--125. 10.1007/978-3-642-11515-8_10.
[22]
Monorchio, A., and Mittra, R. Time-domain (FE/FDTD) technique for solving complex electromagnetic problems. Microwave and Guided Wave Letters, IEEE 8, 2 (Feb. 1998), 93--95.
[23]
Montagne, E., and Ekambaram, A. An optimal storage format for sparse matrices. Information Processing Letters 90, 2 (2004), 87--92.
[24]
Nikolova, N., Tam, H., and Bakr, M. Sensitivity analysis with the FDTD method on structured grids. Microwave Theory and Techniques, IEEE Transactions on 52, 4 (Apr. 2004), 1207--1216.
[25]
NVIDIA Corporation. CUDA C programming guide - version 4.0.
[26]
NVIDIA Corporation. OpenCL programming guide for the CUDA architecture.
[27]
Saad, Y. Sparskit: A basic toolkit for sparse matrix computations - version 2.
[28]
Saad, Y. Iterative Methods for Sparse Linear Systems. Society for Industrial Mathematics, 2003.
[29]
Vázquez, F., Fernández, J. J., and Garzón, E. M. A new approach for sparse matrix vector product on NVIDIA GPUs. Concurrency and Computation: Practice and Experience 23, 8 (2011), 815--826.

Published In

GPGPU-5: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
March 2012
122 pages
ISBN:9781450312332
DOI:10.1145/2159430

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GPGPU-5

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%

Cited By

  • (2024) FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs. IEEE Transactions on Parallel and Distributed Systems 35:12 (2423-2434). DOI: 10.1109/TPDS.2024.3477431. Online publication date: Dec-2024
  • (2024) Block-wise dynamic mixed-precision for sparse matrix-vector multiplication on GPUs. The Journal of Supercomputing 80:10 (13681-13713). DOI: 10.1007/s11227-024-05949-6. Online publication date: 1-Jul-2024
  • (2024) Enhancing the Sparse Matrix Storage Using Reordering Techniques. High Performance Computing (66-76). DOI: 10.1007/978-3-031-52186-7_5. Online publication date: 28-Jan-2024
  • (2023) Trajectory-based Metaheuristics for Improving Sparse Matrix Storage. 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI) (1-6). DOI: 10.1109/LA-CCI58595.2023.10409303. Online publication date: 29-Oct-2023
  • (2021) Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures. 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (48-58). DOI: 10.1109/ISPASS51385.2021.00016. Online publication date: Mar-2021
  • (2021) Adaptive diagonal sparse matrix-vector multiplication on GPU. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2021.07.007. Online publication date: Jul-2021
  • (2020) Research on Performance Optimization for Sparse Matrix-Vector Multiplication in Multi/Many-core Architecture. 2020 2nd International Conference on Information Technology and Computer Application (ITCA) (350-362). DOI: 10.1109/ITCA52113.2020.00081. Online publication date: Dec-2020
  • (2019) Generating piecewise-regular code from irregular structures. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (625-639). DOI: 10.1145/3314221.3314615. Online publication date: 8-Jun-2019
  • (2019) A review of CUDA optimization techniques and tools for structured grid computing. Computing 102:4 (977-1003). DOI: 10.1007/s00607-019-00744-1. Online publication date: 26-Jul-2019
  • (2019) The spectral cell method for wave propagation in heterogeneous materials simulated on multiple GPUs and CPUs. Computational Mechanics 63:5 (805-819). DOI: 10.1007/s00466-018-1623-4. Online publication date: 1-May-2019
