DOI: 10.1145/2751205.2751244
Research Article

Automatic Selection of Sparse Matrix Representation on GPUs

Published: 08 June 2015

Abstract

Sparse matrix-vector multiplication (SpMV) is a core kernel in numerous applications, ranging from physics simulations and large-scale solvers to data analytics. Many GPU implementations of SpMV have been proposed, targeting different sparse representations and aiming to maximize overall performance. No single sparse matrix representation is uniformly superior; the best-performing representation varies with the sparsity pattern of the matrix.
In this paper, we study the interplay among GPU architecture, sparse matrix representation, and the sparse dataset. We perform an extensive characterization of pertinent sparsity features of around 700 sparse matrices and of their SpMV performance with a number of sparse representations implemented in the NVIDIA CUSP and cuSPARSE libraries. We then use machine learning to build a decision model that automatically selects, from the sparse matrix features, the best representation for a given sparse matrix on a given target platform. Experimental results on three GPUs demonstrate that the approach is highly effective in selecting the best representation.
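
To make the selection pipeline concrete, here is a minimal, hypothetical sketch (not the authors' code) of the idea described in the abstract: compute a few structural sparsity features of a matrix, then train a classifier that maps those features to the representation measured to be fastest. The feature set, label set, and training data below are illustrative placeholders; the paper instead derives labels from measured SpMV performance of roughly 700 matrices across CUSP and cuSPARSE formats on each target GPU.

```python
# Illustrative sketch only, not the authors' released code.
# Assumed stack: scipy for sparse matrices, scikit-learn for the decision model.
import numpy as np
import scipy.sparse as sp
from sklearn.tree import DecisionTreeClassifier

def sparsity_features(A):
    """A few structural features in the spirit of the paper's feature set."""
    A = A.tocsr()
    nnz_per_row = np.diff(A.indptr)
    n_rows, n_cols = A.shape
    return [
        float(n_rows),
        float(n_cols),
        float(A.nnz),
        float(nnz_per_row.mean()),   # average nonzeros per row
        float(nnz_per_row.max()),    # longest row: drives ELL padding cost
        float(nnz_per_row.std()),    # row-length spread: load imbalance
        A.nnz / (n_rows * n_cols),   # overall density
    ]

# Hypothetical training data: each matrix is labeled with whichever format
# (CSR, ELL, COO, HYB, ...) was measured to be fastest on the target GPU.
# The matrices and labels here are made-up placeholders, not measurements.
train = [sp.random(1000, 1000, density=d, format="csr", random_state=i)
         for i, d in enumerate([0.001, 0.005, 0.02, 0.08])]
labels = ["COO", "CSR", "CSR", "HYB"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit([sparsity_features(A) for A in train], labels)

# At run time, features of a new matrix select the representation to use.
A_new = sp.random(4000, 4000, density=0.01, format="csr", random_state=42)
print("predicted format:", clf.predict([sparsity_features(A_new)])[0])
```

A tree-based classifier is one natural choice for such a model, since format selection often reduces to simple thresholds on row-length statistics and density; the paper's actual decision model, feature list, and training procedure are described in the full text.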




Information

    Published In

    ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
    June 2015
    446 pages
    ISBN:9781450335591
    DOI:10.1145/2751205
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 June 2015


    Author Tags

1. GPU
2. machine learning models
3. SpMV

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation

    Conference

ICS'15: 2015 International Conference on Supercomputing
June 8-11, 2015
Newport Beach, California, USA

    Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)
Overall acceptance rate: 629 of 2,180 submissions (29%)


Article Metrics

• Downloads (last 12 months): 80
• Downloads (last 6 weeks): 9

Reflects downloads up to 05 Mar 2025

Cited By
• (2025) RASSM: Residue-based Acceleration of Single Sparse Matrix Computation via Adaptive Tiling. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 907-923. DOI: 10.1145/3669940.3707219. Online publication date: 30-Mar-2025.
• (2024) Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs. The International Journal of High Performance Computing Applications, 38(3), 245-259. DOI: 10.1177/10943420241231928. Online publication date: 5-Feb-2024.
• (2024) CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. Proceedings of the 53rd International Conference on Parallel Processing, 640-649. DOI: 10.1145/3673038.3673042. Online publication date: 12-Aug-2024.
• (2024) Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs. ACM Transactions on Architecture and Code Optimization, 21(3), 1-27. DOI: 10.1145/3664924. Online publication date: 15-May-2024.
• (2024) Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication. ACM Transactions on Architecture and Code Optimization, 21(2), 1-24. DOI: 10.1145/3653020. Online publication date: 21-May-2024.
• (2024) Scalable Dynamic Embedding Size Search for Streaming Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1941-1950. DOI: 10.1145/3627673.3679638. Online publication date: 21-Oct-2024.
• (2024) HAM-SpMSpV: An Optimized Parallel Algorithm for Masked Sparse Matrix-Sparse Vector Multiplications on Multi-core CPUs. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 160-173. DOI: 10.1145/3625549.3658680. Online publication date: 3-Jun-2024.
• (2024) Budgeted Embedding Table For Recommender Systems. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 557-566. DOI: 10.1145/3616855.3635778. Online publication date: 4-Mar-2024.
• (2024) DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV. IEEE Transactions on Parallel and Distributed Systems, 35(12), 2624-2639. DOI: 10.1109/TPDS.2024.3488053. Online publication date: Dec-2024.
• (2024) FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs. IEEE Transactions on Parallel and Distributed Systems, 35(12), 2423-2434. DOI: 10.1109/TPDS.2024.3477431. Online publication date: Dec-2024.
