Abstract
Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing, with numerous applications in scientific computing, machine learning, and many other fields. Consequently, it has been studied extensively over the past decades, and many ideas for optimizing and predicting SpMV performance have been proposed. Unlike dense matrix operations, whose performance is mostly determined by the floating-point capabilities of the system, the performance of sparse kernels such as SpMV can vary widely with each combination of algorithm, matrix, and hardware architecture. While these performance differences can be puzzling, it is widely understood that the reuse of data elements is a crucial determinant of SpMV performance. This in turn means that simply reordering a matrix using methods such as the Cuthill-McKee algorithm can have a massive impact on SpMV performance.
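The reordering effect described above can be illustrated with a minimal sketch using SciPy's implementation of reverse Cuthill-McKee (RCM). This is not the paper's code; the matrix is a randomly generated placeholder, and bandwidth reduction is used as a simple proxy for the locality improvement that can benefit SpMV.

```python
# Sketch: apply reverse Cuthill-McKee (RCM) to a sparse matrix and
# compare the bandwidth before and after reordering.
import numpy as np
from scipy.sparse import random as sparse_random, csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(m: csr_matrix) -> int:
    """Maximum |i - j| over all stored nonzeros."""
    coo = m.tocoo()
    if coo.nnz == 0:
        return 0
    return int(np.max(np.abs(coo.row - coo.col)))

rng = np.random.default_rng(0)
a = sparse_random(200, 200, density=0.02, random_state=rng, format="csr")
a = (a + a.T).tocsr()  # symmetrize: RCM expects a symmetric pattern

perm = reverse_cuthill_mckee(a, symmetric_mode=True)
a_rcm = a[perm][:, perm]  # permute rows and columns consistently

print("bandwidth before:", bandwidth(a), "after RCM:", bandwidth(a_rcm))
```

For a matrix with nonzeros clustered near the diagonal after reordering, consecutive rows of the permuted matrix touch nearby entries of the input vector, which improves cache reuse during SpMV.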
However, performing such a reordering is costly, and not all sparse matrices benefit from it. It would therefore be desirable to predict whether reordering a matrix will improve SpMV performance. Most existing systems for SpMV performance prediction are unsuitable for this purpose because they rely only on order-invariant features, such as the relative lengths of the rows, and are therefore incapable of predicting the performance of a reordered matrix.
In this work, we present a machine learning system based on order-dependent features that is capable of such predictions. An experimental evaluation on large instances across multiple modern CPU architectures shows that our system predicts reordering benefit with 94% accuracy.
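The overall shape of such a pipeline can be sketched as follows. The features and labels here are illustrative placeholders, not the features or training data used in the paper: real labels would come from measured SpMV runtimes before and after reordering, and the paper's actual feature set is more elaborate. The sketch only shows how order-dependent quantities (bandwidth, mean off-diagonal distance) can feed a standard classifier.

```python
# Hypothetical sketch: compute simple order-dependent features of a CSR
# matrix and train a classifier to predict reordering benefit.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier

def order_dependent_features(m):
    """Toy feature vector; sensitive to the ordering of rows/columns."""
    coo = m.tocoo()
    dist = np.abs(coo.row - coo.col)  # distance of nonzeros from diagonal
    return [
        int(dist.max()) if coo.nnz else 0,     # bandwidth
        float(dist.mean()) if coo.nnz else 0,  # mean off-diagonal distance
        coo.nnz / (m.shape[0] * m.shape[1]),   # density
    ]

rng = np.random.default_rng(1)
mats = [sparse_random(100, 100, density=d, random_state=rng, format="csr")
        for d in rng.uniform(0.005, 0.05, size=40)]
X = np.array([order_dependent_features(m) for m in mats])
# Placeholder labels: pretend matrices with large mean off-diagonal
# distance benefit from reordering (stand-in for measured runtimes).
y = (X[:, 1] > np.median(X[:, 1])).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Because the features depend on the current ordering, the same classifier can be evaluated on a reordered matrix, which is exactly what order-invariant feature sets cannot do.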
This work was supported by the European High-Performance Computing Joint Undertaking under grant agreement No. 956213. The research presented in this paper has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Pogorelov, K., Trotter, J., Langguth, J. (2024). Performance Prediction for Sparse Matrix Vector Multiplication Using Structure-Dependent Features. In: Zeinalipour, D., et al. Euro-Par 2023: Parallel Processing Workshops. Euro-Par 2023. Lecture Notes in Computer Science, vol 14351. Springer, Cham. https://doi.org/10.1007/978-3-031-50684-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50683-3
Online ISBN: 978-3-031-50684-0