skip to main content
10.1145/3572848.3577506acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article
Public Access

WISE: Predicting the Performance of Sparse Matrix Vector Multiplication with Machine Learning

Published: 21 February 2023 Publication History

Abstract

Sparse Matrix-Vector Multiplication (SpMV) is an essential sparse kernel. Numerous methods have been developed to accelerate SpMV. However, no single method consistently gives the highest performance across a wide range of matrices. For this reason, a performance prediction model is needed to predict the best SpMV method for a given sparse matrix. Unfortunately, predicting SpMV's performance is challenging due to the diversity of factors that impact it.
In this work, we develop a machine learning framework called WISE that accurately predicts the magnitude of the speedups of different SpMV methods over a baseline method for a given sparse matrix. WISE relies on a novel feature set that summarizes a matrix's size, skew, and locality traits. WISE can then select the best SpMV method for each specific matrix. With a set of nearly 1,500 matrices, we show that using WISE delivers an average speedup of 2.4× over using Intel's MKL in a 24-core server.

References

[1]
Walid Abu-Sufah and Asma Abdel Karim. 2013. Auto-tuning of Sparse Matrix-Vector Multiplication on Graphics Processors. In Supercomputing, Julian Martin Kunkel, Thomas Ludwig, and Hans Werner Meuer (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 151--164.
[2]
R. C. Agarwal, F. G. Gustavson, and M. Zubair. 1992. A high performance algorithm using pre-processing for the sparse matrix-vector multiplication. In Supercomputing '92:Proceedings of the 1992 ACM/IEEE Conference on Supercomputing. 32--41.
[3]
J. Arai, H. Shiokawa, T. Yamamuro, M. Onizuka, and S. Iwamura. 2016. Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 22--31.
[4]
S. Beamer, K. Asanovic, and D. Patterson. 2017. Reducing Pagerank Communication via Propagation Blocking. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 820--831.
[5]
Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP Benchmark Suite. CoRR abs/1508.03619 (2015). arXiv:1508.03619 http://arxiv.org/abs/1508.03619
[6]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered Label Propagation: A Multiresolution Coordinate-free Ordering for Compressing Social Networks. In Proceedings of the 20th International Conference on World Wide Web (Hyderabad, India) (WWW '11). ACM, New York, NY, USA, 587--596.
[7]
Sergey Brin and Lawrence Page. 1998. The Anatomy of a Large-scale Hypertextual Web Search Engine. Comput. Netw. ISDN Syst. 30, 1--7 (April 1998), 107--117.
[8]
Daniele Buono, Fabrizio Petrini, Fabio Checconi, Xing Liu, Xinyu Que, Chris Long, and Tai-Ching Tuan. 2016. Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics. In Proceedings of the 2016 International Conference on Supercomputing (Istanbul, Turkey) (ICS '16). ACM, New York, NY, USA, Article 37, 12 pages.
[9]
Alfredo Buttari, Victor Eijkhout, Julien Langou, and Salvatore Filippone. 2007. Performance Optimization and Modeling of Blocked Sparse Kernels. Int. J. High Perform. Comput. Appl. 21, 4 (Nov. 2007), 467--484.
[10]
U. V. Catalyurek and C. Aykanat. 1999. Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Transactions on Parallel and Distributed Systems 10, 7 (July 1999), 673--693.
[11]
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A Recursive Model for Graph Mining. In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM). 442--446.
[12]
Linchuan Chen, Peng Jiang, and Gagan Agrawal. 2016. Exploiting Recent SIMD Architectural Advances for Irregular Applications. In Proceedings of the 2016 International Symposium on Code Generation and Optimization (Barcelona, Spain) (CGO '16). ACM, New York, NY, USA, 47--58.
[13]
Intel Corp. 2015. Intel Math Kernel Library Inspector-executor Sparse BLAS Routines. https://software.intel.com/en-us/articles/intel-math-kernel-library-inspector-executor-sparse-blas-routines.
[14]
Intel Corp. 2015. Sparse BLAS CSR Matrix Storage Format. https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/appendix-a-linear-solvers-basics/sparse-matrix-storage-formats/sparse-blas-csr-matrix-storage-format.
[15]
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (Dec. 2011), 25 pages.
[16]
Daniel Funke, Sebastian Lamm, Ulrich Meyer, Peter Sanders, Manuel Penschuck, Christian Schulz, Darren Strash, and Moritz von Looz. 2019. Communication-free Massively Distributed Graph Generation. J. Parallel and Distrib. Comput. 131, C (2019).
[17]
Eun-Jin Im, Katherine Yelick, and Richard Vuduc. 2004. Sparsity: Optimization Framework for Sparse Matrix Kernels. The International Journal of High Performance Computing Applications 18, 1 (2004), 135--158. arXiv:https://doi.org/10.1177/1094342004041296
[18]
Z. Jia, L. Wang, J. Zhan, L. Zhang, and C. Luo. 2013. Characterizing data analysis workloads in data centers. In 2013 IEEE International Symposium on Workload Characterization (IISWC). 66--76.
[19]
Vladimir Kiriansky, Yunming Zhang, and Saman Amarasinghe. 2016. Optimizing Indirect Memory References with Milk. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (Haifa, Israel) (PACT '16). ACM, New York, NY, USA, 299--312.
[20]
Jon M. Kleinberg. 1999. Authoritative Sources in a Hyperlinked Environment. J. ACM 46, 5 (Sept. 1999), 604--632.
[21]
Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, and Alan R. Bishop. 2014. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units. SIAM Journal on Scientific Computing 36, 5 (Jan 2014), C401--C423.
[22]
Jérôme Kunegis and Julia Preusse. 2012. Fairness on the Web: Alternatives to the Power Law. In Proceedings of the 4th Annual ACM Web Science Conference (Evanston, Illinois) (WebSci '12). Association for Computing Machinery, New York, NY, USA, 175--184.
[23]
Jiajia Li, Guangming Tan, Mingyu Chen, and Ninghui Sun. 2013. SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (Seattle, Washington, USA) (PLDI '13). Association for Computing Machinery, New York, NY, USA, 117--126.
[24]
Weifeng Liu and Brian Vinter. 2015. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. In Proceedings of the 29th ACM on International Conference on Supercomputing (Newport Beach, California, USA) (ICS '15). ACM, New York, NY, USA, 339--350.
[25]
Xing Liu, Mikhail Smelyanskiy, Edmond Chow, and Pradeep Dubey. 2013. Efficient Sparse Matrix-vector Multiplication on x86-based Many-core Processors. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (Eugene, Oregon, USA) (ICS '13). ACM, New York, NY, USA, 273--282.
[26]
Ke Meng, Jiajia Li, Guangming Tan, and Ninghui Sun. 2019. A Pattern Based Algorithmic Autotuner for Graph Processing on GPUs. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (Washington, District of Columbia) (PPoPP '19). Association for Computing Machinery, New York, NY, USA, 201--213.
[27]
Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, and Christian Bizer. 2014. Graph Structure in the Web --- Revisited: A Trick of the Heavy Tail. In Proceedings of the 23rd International Conference on World Wide Web (Seoul, Korea) (WWW '14 Companion). ACM, New York, NY, USA, 427--432.
[28]
Alexander Monakov, Anton Lokhmotov, and Arutyun Avetisyan. 2010. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. In High Performance Embedded Architectures and Compilers, Yale N. Patt, Pierfrancesco Foglia, Evelyn Duesterwald, Paolo Faraboschi, and Xavier Martorell (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 111--125.
[29]
Richard C Murphy, Kyle B Wheeler, Brian W Barrett, and James A Ang. 2010. Introducing the Graph 500. Cray Users Group (CUG) 19 (2010), 45--74.
[30]
Naser Sedaghati, Te Mu, Louis-Noel Pouchet, Srinivasan Parthasarathy, and P. Sadayappan. 2015. Automatic Selection of Sparse Matrix Representation on GPUs. In Proceedings of the 29th ACM on International Conference on Supercomputing (Newport Beach, California, USA) (ICS '15). Association for Computing Machinery, New York, NY, USA, 99--108.
[31]
Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An Approach for Code Generation in the Sparse Polyhedral Framework. Parallel Comput. 53, C (April 2016), 32--57.
[32]
W. T. Tang, R. Zhao, M. Lu, Y. Liang, H. P. Huyng, X. Li, and R. S. M. Goh. 2015. Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi. In 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 136--145.
[33]
Anand Venkat, Manu Shantharam, Mary Hall, and Michelle Mills Strout. 2014. Non-Affine Extensions to Polyhedral Code Generation. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (Orlando, FL, USA) (CGO'14). Association for Computing Machinery, New York, NY, USA, 185--194.
[34]
R. Vuduc, J. W. Demmel, K. A. Yelick, S. Kamil, R. Nishtala, and B. Lee. 2002. Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply. In SC '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing. 26--26.
[35]
Hao Wang, Liang Geng, Rubao Lee, Kaixi Hou, Yanfeng Zhang, and Xiaodong Zhang. 2019. SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (Washington, District of Columbia) (PPoPP '19). Association for Computing Machinery, New York, NY, USA, 38--52.
[36]
L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang, Y. He, W. Gao, Z. Jia, Y. Shi, S. Zhang, C. Zheng, G. Lu, K. Zhan, X. Li, and B. Qiu. 2014. BigDataBench: A big data benchmark suite from internet services. In 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA). 488--499.
[37]
Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup Graph Processing by Graph Ordering. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD '16). ACM, New York, NY, USA, 1813--1828.
[38]
Biwei Xie, Jianfeng Zhan, Xu Liu, Wanling Gao, Zhen Jia, Xiwen He, and Lixin Zhang. 2018. CVR: Efficient Vectorization of SpMV on x86 Processors. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (Vienna, Austria) (CGO 2018). ACM, New York, NY, USA, 149--162.
[39]
Zhen Xie, Guangming Tan, Weifeng Liu, and Ninghui Sun. 2019. IA-SpGEMM: An Input-Aware Auto-Tuning Framework for Parallel Sparse Matrix-Matrix Multiplication. In Proceedings of the ACM International Conference on Supercomputing (Phoenix, Arizona) (ICS '19). Association for Computing Machinery, New York, NY, USA, 94--105.
[40]
Serif Yesil, Azin Heidarshenas, Adam Morrison, and Josep Torrellas. 2020. Speeding Up SpMV for Power-Law Graph Analytics by Enhancing Locality and Vectorization. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[41]
Buse Yilmaz, Bariş Aktemur, María J. Garzarán, Sam Kamin, and Furkan Kiraç. 2016. Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication. ACM Trans. Archit. Code Optim. 13, 1, Article 5 (March 2016), 26 pages.
[42]
Y. Zhang, V. Kiriansky, C. Mendis, S. Amarasinghe, and M. Zaharia. 2017. Making caches work for graph analytics. In 2017 IEEE International Conference on Big Data (Big Data). 293--302.
[43]
Yue Zhao, Jiajia Li, Chunhua Liao, and Xipeng Shen. 2018. Bridging the Gap between Deep Learning and Sparse Matrix Format Selection. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Vienna, Austria) (PPoPP '18). Association for Computing Machinery, New York, NY, USA, 94--108.
[44]
Yue Zhao, Weijie Zhou, Xipeng Shen, and Graham Yiu. 2018. Overhead-Conscious Format Selection for SpMV-Based Applications. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 950--959.
[45]
Weijie Zhou, Yue Zhao, Xipeng Shen, and Wang Chen. 2020. Enabling Runtime SpMV Format Selection through an Overhead Conscious Method. IEEE Transactions on Parallel and Distributed Systems 31, 1 (2020), 80--93.

Cited By

View all
  • (2024)Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and VectorizationProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673061(1145-1155)Online publication date: 12-Aug-2024
  • (2024)CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPUProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673042(640-649)Online publication date: 12-Aug-2024
  • (2024)POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU ClustersProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638488(469-471)Online publication date: 2-Mar-2024
  • Show More Cited By

Index Terms

  1. WISE: Predicting the Performance of Sparse Matrix Vector Multiplication with Machine Learning

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
    February 2023
    480 pages
    ISBN:9798400700156
    DOI:10.1145/3572848
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 February 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. SpMV
    2. machine learning
    3. sparse matrix

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    PPoPP '23

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%

    Upcoming Conference

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)446
    • Downloads (Last 6 weeks)38
    Reflects downloads up to 14 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and VectorizationProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673061(1145-1155)Online publication date: 12-Aug-2024
    • (2024)CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPUProceedings of the 53rd International Conference on Parallel Processing10.1145/3673038.3673042(640-649)Online publication date: 12-Aug-2024
    • (2024)POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU ClustersProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638488(469-471)Online publication date: 2-Mar-2024
    • (2024)Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMMProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3620665.3640427(1200-1217)Online publication date: 27-Apr-2024
    • (2024)Seer: Predictive Runtime Kernel Selection for Irregular ProblemsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444812(133-142)Online publication date: 2-Mar-2024
    • (2024)Review on Recent Matrix Multiplication Optimization Using Deep LearningIntelligent and Fuzzy Systems10.1007/978-3-031-70018-7_41(359-371)Online publication date: 1-Sep-2024
    • (2023)Large-Scale Simulation of Structural Dynamics Computing on GPU ClustersProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607082(1-14)Online publication date: 12-Nov-2023
    • (2023)DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector MultiplicationProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607051(1-14)Online publication date: 12-Nov-2023
    • (2023)SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMMProceedings of the 50th Annual International Symposium on Computer Architecture10.1145/3579371.3589054(1-15)Online publication date: 17-Jun-2023
    • (2023)HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00025(209-220)Online publication date: 31-Oct-2023
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media