
Performance Prediction for Sparse Matrix Vector Multiplication Using Structure-Dependent Features

  • Conference paper
Euro-Par 2023: Parallel Processing Workshops (Euro-Par 2023)

Abstract

Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing, with numerous applications in scientific computing, machine learning, and many other fields. Consequently, it has been studied extensively over the past decades, and many approaches to optimizing and predicting SpMV performance have been proposed. Unlike dense matrix operations, whose performance is mostly determined by the floating-point capabilities of the system, the performance of sparse kernels such as SpMV can vary widely with the combination of algorithm, matrix, and hardware architecture. While these performance differences can be puzzling, it is widely understood that the reuse of data elements is a crucial determinant of SpMV performance. This in turn means that simply reordering a matrix with a method such as the Cuthill-McKee algorithm can have a massive impact on SpMV performance.
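The data-reuse issue can be seen in the kernel itself. The sketch below is a minimal pure-Python SpMV over the common CSR (compressed sparse row) layout, not code from the paper: the indirect accesses `x[col_idx[k]]` are what tie performance to the matrix's nonzero structure, since scattered column indices defeat cache reuse of `x`.

```python
# Minimal CSR sparse matrix-vector multiply (y = A @ x).
# Illustrative sketch only; the paper's benchmarked kernels are not shown here.

def spmv_csr(row_ptr, col_idx, values, x):
    """y[i] = sum of values[k] * x[col_idx[k]] for k in row i.

    The irregular gather from x via col_idx is the reason SpMV
    performance depends on the matrix's nonzero structure: a good
    reordering keeps those indices close together, improving locality.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example:  [[4, 1, 0],
#                [1, 3, 0],
#                [0, 0, 2]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 0, 1, 2]
values  = [4.0, 1.0, 1.0, 3.0, 2.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [5.0, 4.0, 2.0]
```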

However, performing such a reordering is costly, and not all sparse matrices benefit from it. It would therefore be desirable to predict whether reordering a matrix will improve SpMV performance. Most existing systems for SpMV performance prediction are unsuitable for this purpose, since they rely only on order-invariant features such as the relative lengths of the rows. Consequently, they cannot distinguish a matrix from its reordered counterpart and are incapable of predicting the performance of a reordered matrix.
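To illustrate the distinction, the toy sketch below contrasts an order-invariant feature (the multiset of row lengths) with an order-dependent one (bandwidth, the maximum distance of a nonzero from the diagonal). The tiny matrix and feature choices are illustrative placeholders, not the paper's actual feature set: under a symmetric permutation the bandwidth changes, while the row-length distribution does not.

```python
# Sketch: order-dependent vs. order-invariant matrix features.
# Illustrative only; not the feature set used in the paper.

def bandwidth(nnz):
    """Maximum |i - j| over nonzero positions (i, j): order-dependent."""
    return max(abs(i - j) for i, j in nnz)

def row_lengths(nnz, n):
    """Sorted multiset of row lengths: invariant under symmetric permutation."""
    lengths = [0] * n
    for i, _ in nnz:
        lengths[i] += 1
    return sorted(lengths)

def permute(nnz, p):
    """Apply a symmetric permutation: position (i, j) -> (p[i], p[j])."""
    return [(p[i], p[j]) for i, j in nnz]

nnz = [(0, 0), (0, 3), (1, 1), (2, 2), (3, 0), (3, 3)]
p = [0, 2, 3, 1]              # relabel rows and columns
permuted = permute(nnz, p)

print(bandwidth(nnz), bandwidth(permuted))              # 3 1  (changes)
print(row_lengths(nnz, 4) == row_lengths(permuted, 4))  # True (invariant)
```

A predictor trained only on features like `row_lengths` would return the same answer for both orderings, which is exactly why it cannot assess the benefit of reordering.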

In this work, we present a machine learning system based on order-dependent features that is capable of such predictions. We perform an experimental evaluation on large instances across multiple modern CPU architectures, showing that our system predicts the benefit of reordering with 94% accuracy.
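The prediction task can be framed as binary classification. The sketch below is a hypothetical illustration using a scikit-learn random forest (both cited in the references), with entirely synthetic feature values and labels standing in for structure-dependent features; it is not the authors' actual pipeline or feature set.

```python
# Hypothetical sketch of the prediction task: given structure-dependent
# features of a matrix, predict whether Cuthill-McKee reordering will
# speed up SpMV. Features, values, and labels are synthetic placeholders.
from sklearn.ensemble import RandomForestClassifier

# Each row: [bandwidth / n, mean nnz distance from diagonal, rows]
X_train = [
    [0.9, 400.0, 10_000],   # far from banded -> reordering helped
    [0.8, 350.0, 50_000],
    [0.1,  10.0, 20_000],   # already near-banded -> no benefit
    [0.2,  15.0, 80_000],
]
y_train = [1, 1, 0, 0]      # 1 = reordering improved SpMV time

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

preds = clf.predict([[0.85, 380.0, 30_000], [0.15, 12.0, 40_000]])
print(preds)
```

In practice such a model would be trained on measured SpMV timings before and after reordering, so that the label encodes an observed speedup rather than a hand-assigned one.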

This work was supported by the European High-Performance Computing Joint Undertaking under grant agreement No. 956213. The research presented in this paper has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.


References

  1. Azad, A., Jacquelin, M., Buluç, A., Ng, E.G.: The reverse Cuthill-McKee algorithm in distributed-memory. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 22–31. IEEE (2017)

  2. Bell, N., Garland, M.: Efficient sparse matrix-vector multiplication on CUDA. Technical report, Citeseer (2008)

  3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  4. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)

  5. Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th National Conference. ACM ’69, pp. 157–172. Association for Computing Machinery, New York, NY, USA (1969). https://doi.org/10.1145/800195.805928

  6. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011)

  7. Dhandhania, S., Deodhar, A., Pogorelov, K., Biswas, S., Langguth, J.: Explaining the performance of supervised and semi-supervised methods for automated sparse matrix format selection. In: 50th International Conference on Parallel Processing Workshop, pp. 1–10 (2021)

  8. Dongarra, J., Luszczek, P., Heroux, M.: HPCG technical specification. Sandia National Laboratories, Sandia Report SAND2013-8752 (2013)

  9. Kunegis, J.: KONECT - The Koblenz network collection. In: Proceedings of the International Conference on World Wide Web Companion, pp. 1343–1350 (2013). http://dl.acm.org/citation.cfm?id=2488173

  10. Langguth, J., Arevalo, H., Hustad, K.G., Cai, X.: Towards detailed real-time simulations of cardiac arrhythmia. In: 2019 Computing in Cardiology (CinC), p. 1. IEEE (2019)

  11. Langguth, J., Sourouri, M., Lines, G.T., Baden, S.B., Cai, X.: Scalable heterogeneous CPU-GPU computations for unstructured tetrahedral meshes. IEEE Micro 35(4), 6–15 (2015)

  12. Langguth, J., Wu, N., Chai, J., Cai, X.: Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes. J. Parallel Distrib. Comput. 76, 120–131 (2015). https://doi.org/10.1016/j.jpdc.2014.10.005

  13. Markov, Z., Russell, I.: An introduction to the WEKA data mining system. ACM SIGCSE Bull. 38(3), 367–368 (2006)

  14. Merrill, D., Garland, M.: Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2016)

  15. Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the graph 500. Cray Users Group (CUG) 19, 45–74 (2010)

  16. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)

  17. Papadimitriou, C.H.: The NP-completeness of the bandwidth minimization problem. Computing 16(3), 263–270 (1976)

  18. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  19. Pichel, J.C., Singh, D.E., Carretero, J.: Reordering algorithms for increasing locality on multicore processors. In: 2008 10th IEEE International Conference on High Performance Computing and Communications, pp. 123–130. IEEE (2008)

  20. Sanders, P., Schulz, C.: Think locally, act globally: highly balanced graph partitioning. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 164–175. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38527-8_16

  21. Toledo, S.: Improving the memory-system performance of sparse-matrix vector multiplication. IBM J. Res. Dev. 41(6), 711–725 (1997)

  22. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Philip, S.Y.: A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32(1), 4–24 (2020)

  23. Zhao, Y., Li, J., Liao, C., Shen, X.: Bridging the gap between deep learning and sparse matrix format selection. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 94–108 (2018)

  24. Zhao, Y., Zhou, W., Shen, X., Yiu, G.: Overhead-conscious format selection for SpMV-based applications. In: IEEE International Parallel and Distributed Processing Symposium, pp. 950–959, May 2018

Author information

Correspondence to Johannes Langguth.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pogorelov, K., Trotter, J., Langguth, J. (2024). Performance Prediction for Sparse Matrix Vector Multiplication Using Structure-Dependent Features. In: Zeinalipour, D., et al. Euro-Par 2023: Parallel Processing Workshops. Euro-Par 2023. Lecture Notes in Computer Science, vol 14351. Springer, Cham. https://doi.org/10.1007/978-3-031-50684-0_11

  • DOI: https://doi.org/10.1007/978-3-031-50684-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50683-3

  • Online ISBN: 978-3-031-50684-0

  • eBook Packages: Computer Science (R0)
