DOI: 10.1145/2751205.2751244
Research Article

Automatic Selection of Sparse Matrix Representation on GPUs

Published: 08 June 2015

Abstract

Sparse matrix-vector multiplication (SpMV) is a core kernel in numerous applications, ranging from physics simulations and large-scale solvers to data analytics. Many GPU implementations of SpMV have been proposed, targeting different sparse representations and aiming to maximize overall performance. No single sparse matrix representation is uniformly superior; the best-performing representation varies with the sparsity pattern of the matrix.
In this paper, we study the interplay among GPU architecture, sparse matrix representation, and the sparse dataset. We perform an extensive characterization of pertinent sparsity features of around 700 sparse matrices and of their SpMV performance with a number of sparse representations implemented in the NVIDIA CUSP and cuSPARSE libraries. We then use machine learning to build a decision model that automatically selects, from the sparse matrix features, the best representation for a given sparse matrix on a given target platform. Experimental results on three GPUs demonstrate that the approach is highly effective in selecting the best representation.
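
To make the selection pipeline concrete, here is a minimal, hypothetical sketch (not the authors' code) of the idea described in the abstract: compute a few structural sparsity features of a matrix, then train a classifier that maps those features to the representation measured to be fastest. The feature set, label set, and training data below are illustrative placeholders; the paper instead derives labels from measured SpMV performance of roughly 700 matrices across CUSP and cuSPARSE formats on each target GPU.

```python
# Illustrative sketch only, not the authors' released code.
# Assumed stack: scipy for sparse matrices, scikit-learn for the decision model.
import numpy as np
import scipy.sparse as sp
from sklearn.tree import DecisionTreeClassifier

def sparsity_features(A):
    """A few structural features in the spirit of the paper's feature set."""
    A = A.tocsr()
    nnz_per_row = np.diff(A.indptr)
    n_rows, n_cols = A.shape
    return [
        float(n_rows),
        float(n_cols),
        float(A.nnz),
        float(nnz_per_row.mean()),   # average nonzeros per row
        float(nnz_per_row.max()),    # longest row: drives ELL padding cost
        float(nnz_per_row.std()),    # row-length spread: load imbalance
        A.nnz / (n_rows * n_cols),   # overall density
    ]

# Hypothetical training data: each matrix is labeled with whichever format
# (CSR, ELL, COO, HYB, ...) was measured to be fastest on the target GPU.
# The matrices and labels here are made-up placeholders, not measurements.
train = [sp.random(1000, 1000, density=d, format="csr", random_state=i)
         for i, d in enumerate([0.001, 0.005, 0.02, 0.08])]
labels = ["COO", "CSR", "CSR", "HYB"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit([sparsity_features(A) for A in train], labels)

# At run time, features of a new matrix select the representation to use.
A_new = sp.random(4000, 4000, density=0.01, format="csr", random_state=42)
print("predicted format:", clf.predict([sparsity_features(A_new)])[0])
```

A tree-based classifier is one natural choice for such a model, since format selection often reduces to simple thresholds on row-length statistics and density; the paper's actual decision model, feature list, and training procedure are described in the full text.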




Information

    Published In

    ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
    June 2015
    446 pages
    ISBN:9781450335591
    DOI:10.1145/2751205
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 June 2015


    Author Tags

1. GPU
2. machine learning models
3. SpMV

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation

    Conference

ICS'15: 2015 International Conference on Supercomputing
June 8-11, 2015
Newport Beach, California, USA

    Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%)
Overall acceptance rate: 629 of 2,180 submissions (29%)


Article Metrics

• Downloads (last 12 months): 80
• Downloads (last 6 weeks): 9

Reflects downloads up to 05 Mar 2025

Cited By
• (2025) RASSM: Residue-based Acceleration of Single Sparse Matrix Computation via Adaptive Tiling. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 907-923. DOI: 10.1145/3669940.3707219. Online publication date: 30-Mar-2025.
• (2024) Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs. The International Journal of High Performance Computing Applications, 38(3), 245-259. DOI: 10.1177/10943420241231928. Online publication date: 5-Feb-2024.
• (2024) CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. Proceedings of the 53rd International Conference on Parallel Processing, 640-649. DOI: 10.1145/3673038.3673042. Online publication date: 12-Aug-2024.
• (2024) Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs. ACM Transactions on Architecture and Code Optimization, 21(3), 1-27. DOI: 10.1145/3664924. Online publication date: 15-May-2024.
• (2024) Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication. ACM Transactions on Architecture and Code Optimization, 21(2), 1-24. DOI: 10.1145/3653020. Online publication date: 21-May-2024.
• (2024) Scalable Dynamic Embedding Size Search for Streaming Recommendation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 1941-1950. DOI: 10.1145/3627673.3679638. Online publication date: 21-Oct-2024.
• (2024) HAM-SpMSpV: An Optimized Parallel Algorithm for Masked Sparse Matrix-Sparse Vector Multiplications on Multi-core CPUs. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 160-173. DOI: 10.1145/3625549.3658680. Online publication date: 3-Jun-2024.
• (2024) Budgeted Embedding Table For Recommender Systems. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 557-566. DOI: 10.1145/3616855.3635778. Online publication date: 4-Mar-2024.
• (2024) DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV. IEEE Transactions on Parallel and Distributed Systems, 35(12), 2624-2639. DOI: 10.1109/TPDS.2024.3488053. Online publication date: Dec-2024.
• (2024) FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs. IEEE Transactions on Parallel and Distributed Systems, 35(12), 2423-2434. DOI: 10.1109/TPDS.2024.3477431. Online publication date: Dec-2024.
