DOI: 10.1145/3178487.3178495

Bridging the gap between deep learning and sparse matrix format selection

Published: 10 February 2018

Abstract

This work presents a systematic exploration of the promise and special challenges of deep learning for sparse matrix format selection: the problem of determining the best storage format for a matrix to maximize the performance of sparse matrix-vector multiplication (SpMV). It describes how to bridge the gap between deep learning and the special needs of this pillar HPC problem through a set of techniques for matrix representation, deep learning structure, and cross-architecture model migration. The new solution cuts format selection errors by two thirds and improves SpMV performance by 1.73x on average over the state of the art.
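
To make "storage format" concrete, the following is a minimal sketch of SpMV over two widely used formats, COO (coordinate triples) and CSR (compressed sparse row). The abstract does not name the candidate formats, so the choice of COO and CSR here is an assumption, and every function name is hypothetical rather than taken from the paper's artifact. The sketch only shows that the same matrix can be stored and multiplied in different layouts; it does not reproduce the paper's deep-learning selector.

```python
# Illustrative only: two sparse storage formats and SpMV (y = A x) on each.
# Function names are hypothetical; they are not from the paper's artifact.

def dense_to_coo(dense):
    """Collect the nonzeros of a dense matrix as row-sorted COO triples."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_to_csr(n_rows, rows, cols, vals):
    """Convert row-sorted COO triples to CSR (row_ptr, col_idx, vals)."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1          # count nonzeros per row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives row start offsets
    return row_ptr, list(cols), list(vals)


def spmv_coo(n_rows, rows, cols, vals, x):
    """SpMV over COO: one scattered update per stored nonzero."""
    y = [0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def spmv_csr(row_ptr, col_idx, vals, x):
    """SpMV over CSR: one contiguous slice of nonzeros per row."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 5, 0],
]
x = [1, 2, 3, 4]

rows, cols, vals = dense_to_coo(A)
row_ptr, col_idx, csr_vals = coo_to_csr(len(A), rows, cols, vals)

y_coo = spmv_coo(len(A), rows, cols, vals, x)
y_csr = spmv_csr(row_ptr, col_idx, csr_vals, x)
```

Both layouts produce the same product; they differ in memory footprint and access pattern, which is why the fastest format depends on the matrix's sparsity structure and on the target hardware.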

Supplementary Material

Artifacts Available (dnnspmv.zip)
Sparse matrix storage format selection for SpMV

Published In

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2018
442 pages
ISBN: 9781450349826
DOI: 10.1145/3178487
  • ACM SIGPLAN Notices, Volume 53, Issue 1
    PPoPP '18
    January 2018
    426 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3200691
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. SpMV
  2. convolutional neural network
  3. deep learning
  4. format selection
  5. sparse matrix

Qualifiers

  • Research-article

Conference

PPoPP '18

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Cited By

  • (2025) Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering. Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (16-28). DOI: 10.1145/3710848.3710881. Online publication date: 28-Feb-2025
  • (2025) RASSM: Residue-based Acceleration of Single Sparse Matrix Computation via Adaptive Tiling. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (907-923). DOI: 10.1145/3669940.3707219. Online publication date: 30-Mar-2025
  • (2024) Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems. ACM Transactions on Architecture and Code Optimization 21:4 (1-24). DOI: 10.1145/3676847. Online publication date: 8-Jul-2024
  • (2024) CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. Proceedings of the 53rd International Conference on Parallel Processing (640-649). DOI: 10.1145/3673038.3673042. Online publication date: 12-Aug-2024
  • (2024) DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV. IEEE Transactions on Parallel and Distributed Systems 35:12 (2624-2639). DOI: 10.1109/TPDS.2024.3488053. Online publication date: Dec-2024
  • (2024) Sparse Matrix Reordering Method Selection with Parallel Computing and Deep Learning. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10651141. Online publication date: 30-Jun-2024
  • (2024) Revisiting thread configuration of SpMV kernels on GPU. Journal of Parallel and Distributed Computing 185:C. DOI: 10.1016/j.jpdc.2023.104799. Online publication date: 4-Mar-2024
  • (2023) Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs. ACM Transactions on Architecture and Code Optimization 20:3 (1-26). DOI: 10.1145/3600092. Online publication date: 19-Jul-2023
  • (2023) DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). DOI: 10.1145/3581784.3607051. Online publication date: 12-Nov-2023
  • (2023) BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs. Proceedings of the 37th International Conference on Supercomputing (264-276). DOI: 10.1145/3577193.3593725. Online publication date: 21-Jun-2023
