Abstract
Sparse matrix-vector multiplication (SpMV) is one of the most important kernels in high-performance computing, with numerous applications in scientific computing, machine learning, and many other fields. Consequently, it has been studied extensively over the past decades, and many ideas for optimizing and predicting SpMV performance have been proposed. Unlike dense matrix operations, whose performance is mostly determined by the floating-point capabilities of the system, the performance of sparse kernels such as SpMV can vary widely with each combination of algorithm, matrix, and hardware architecture. While these performance differences can be puzzling, it is widely understood that the reuse of data elements is a crucial determinant of SpMV performance. This in turn means that simply reordering a matrix using methods such as the Cuthill-McKee algorithm can have a massive impact on SpMV performance.
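The reordering effect described above can be illustrated with a minimal sketch using SciPy's implementation of reverse Cuthill-McKee (RCM). This is not the paper's code; the matrix is a randomly generated placeholder, and bandwidth reduction is used as a simple proxy for the locality improvement that can benefit SpMV.

```python
# Sketch: apply reverse Cuthill-McKee (RCM) to a sparse matrix and
# compare the bandwidth before and after reordering.
import numpy as np
from scipy.sparse import random as sparse_random, csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(m: csr_matrix) -> int:
    """Maximum |i - j| over all stored nonzeros."""
    coo = m.tocoo()
    if coo.nnz == 0:
        return 0
    return int(np.max(np.abs(coo.row - coo.col)))

rng = np.random.default_rng(0)
a = sparse_random(200, 200, density=0.02, random_state=rng, format="csr")
a = (a + a.T).tocsr()  # symmetrize: RCM expects a symmetric pattern

perm = reverse_cuthill_mckee(a, symmetric_mode=True)
a_rcm = a[perm][:, perm]  # permute rows and columns consistently

print("bandwidth before:", bandwidth(a), "after RCM:", bandwidth(a_rcm))
```

For a matrix with nonzeros clustered near the diagonal after reordering, consecutive rows of the permuted matrix touch nearby entries of the input vector, which improves cache reuse during SpMV.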
However, performing such a reordering is costly, and not all sparse matrices benefit from it. It would therefore be desirable to predict whether reordering a matrix will improve SpMV performance. Most existing systems for SpMV performance prediction are unsuitable for this purpose because they rely only on order-invariant features, such as the relative lengths of the rows, and are therefore incapable of predicting the performance of a reordered matrix.
In this work, we present a machine learning system based on order-dependent features that is capable of such predictions. An experimental evaluation on large instances across multiple modern CPU architectures shows that our system predicts reordering benefit with 94% accuracy.
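The overall shape of such a pipeline can be sketched as follows. The features and labels here are illustrative placeholders, not the features or training data used in the paper: real labels would come from measured SpMV runtimes before and after reordering, and the paper's actual feature set is more elaborate. The sketch only shows how order-dependent quantities (bandwidth, mean off-diagonal distance) can feed a standard classifier.

```python
# Hypothetical sketch: compute simple order-dependent features of a CSR
# matrix and train a classifier to predict reordering benefit.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier

def order_dependent_features(m):
    """Toy feature vector; sensitive to the ordering of rows/columns."""
    coo = m.tocoo()
    dist = np.abs(coo.row - coo.col)  # distance of nonzeros from diagonal
    return [
        int(dist.max()) if coo.nnz else 0,     # bandwidth
        float(dist.mean()) if coo.nnz else 0,  # mean off-diagonal distance
        coo.nnz / (m.shape[0] * m.shape[1]),   # density
    ]

rng = np.random.default_rng(1)
mats = [sparse_random(100, 100, density=d, random_state=rng, format="csr")
        for d in rng.uniform(0.005, 0.05, size=40)]
X = np.array([order_dependent_features(m) for m in mats])
# Placeholder labels: pretend matrices with large mean off-diagonal
# distance benefit from reordering (stand-in for measured runtimes).
y = (X[:, 1] > np.median(X[:, 1])).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Because the features depend on the current ordering, the same classifier can be evaluated on a reordered matrix, which is exactly what order-invariant feature sets cannot do.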
This work was supported by the European High-Performance Computing Joint Undertaking under grant agreement No. 956213. The research presented in this paper has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Pogorelov, K., Trotter, J., Langguth, J. (2024). Performance Prediction for Sparse Matrix Vector Multiplication Using Structure-Dependent Features. In: Zeinalipour, D., et al. Euro-Par 2023: Parallel Processing Workshops. Euro-Par 2023. Lecture Notes in Computer Science, vol 14351. Springer, Cham. https://doi.org/10.1007/978-3-031-50684-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50683-3
Online ISBN: 978-3-031-50684-0