research-article

Bridging the gap between deep learning and sparse matrix format selection

Authors:

Xipeng Shen

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 94 - 108

https://doi.org/10.1145/3178487.3178495

Published: 10 February 2018

Abstract

This work presents a systematic exploration of the promise and special challenges of deep learning for sparse matrix format selection, the problem of determining the best storage format for a matrix to maximize the performance of Sparse Matrix-Vector Multiplication (SpMV). It describes how to effectively bridge the gap between deep learning and the special needs of this pillar HPC problem through a set of techniques for matrix representations, deep learning structure, and cross-architecture model migration. The new solution cuts format selection errors by two thirds and improves SpMV performance by 1.73x on average over the state of the art.
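
To make the format selection problem concrete, the following is a minimal sketch, not the authors' implementation, of the same matrix stored in two common formats (COO and CSR) with an SpMV routine for each. The function names and the small example matrix are assumptions chosen for illustration. Both formats yield the same product; what differs in practice is memory layout and access pattern, which is why the fastest format depends on the matrix and the hardware.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.

def spmv_coo(rows, cols, vals, x, n_rows):
    """y = A @ x with A in coordinate (COO) format: one (row, col, value) per nonzero."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO to compressed sparse row (CSR) via a counting pass."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:                      # count nonzeros per row
        row_ptr[r + 1] += 1
    for i in range(n_rows):             # prefix sum gives row start offsets
        row_ptr[i + 1] += row_ptr[i]
    next_slot = row_ptr[:-1].copy()     # next free position within each row
    col_idx = [0] * len(vals)
    csr_vals = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        csr_vals[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, csr_vals


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x with A in CSR format: nonzeros of row r live in row_ptr[r]:row_ptr[r+1]."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[k] * x[col_idx[k]]
    return y


# Example matrix (an assumption for illustration):
# [[4, 0, 0],
#  [0, 0, 2],
#  [1, 0, 3]]
rows = [0, 1, 2, 2]
cols = [0, 2, 0, 2]
vals = [4.0, 2.0, 1.0, 3.0]
x = [1.0, 2.0, 3.0]

y_coo = spmv_coo(rows, cols, vals, x, n_rows=3)
row_ptr, col_idx, csr_vals = coo_to_csr(rows, cols, vals, n_rows=3)
y_csr = spmv_csr(row_ptr, col_idx, csr_vals, x)
print(y_coo, y_csr)
```

A format selector in the sense of the abstract would predict, from the matrix alone, which such routine runs fastest on a given machine; the sketch above shows only the formats themselves, not the prediction step.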

Supplementary Material

Artifacts Available (dnnspmv.zip)

Sparse matrix storage format selection for SpMV

24.45 KB

Index Terms

Bridging the gap between deep learning and sparse matrix format selection
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
  2. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices

Published In

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2018

442 pages

ISBN:9781450349826

DOI:10.1145/3178487

General Chair: Andreas Krall, Vienna University of Technology, Austria
Program Chair: Thomas R. Gross, ETH Zürich, Switzerland

ACM SIGPLAN Notices Volume 53, Issue 1
PPoPP '18
January 2018
426 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3200691
Editor: Matthew Fluet, Rochester Institute of Technology

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Qualifiers

Research-article

Funding Sources

IBM Ph.D. Fellowship Award
DOE Early Career Award
National Science Foundation (NSF)

Conference

PPoPP '18

PPoPP '18: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 24 - 28, 2018

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
