Abstract
Sparse matrix-vector multiplication (SpMV) is widely used in scientific computing and often accounts for a significant portion of the overall computational cost, so improving its performance is crucial. However, sparse matrices exhibit a sporadic, irregular distribution of non-zero elements, which causes workload imbalance among threads and makes vectorization difficult. To address these issues, many efforts have optimized SpMV around the hardware characteristics of particular computing platforms. In this paper, we optimize CSR-based SpMV on Pezy-SC3s, a novel MIMD computing platform; we focus on the CSR format because it is the most widely used and is supported by various high-performance sparse computing libraries. Guided by the hardware characteristics of Pezy-SC3s, we tackle the poor data locality, workload imbalance, and vectorization challenges of CSR-based SpMV by chunking the matrix, applying the Atomic Cache for workload scheduling, and employing SIMD instructions in the SpMV kernel. As the first study of SpMV optimization on Pezy-SC3s, we evaluate our work against both plain CSR-based SpMV and the SpMV provided by NVIDIA's cuSPARSE. In experiments on 2092 matrices from the SuiteSparse collection, our optimization achieves a maximum speedup of 17.63x and an average speedup of 1.56x over CSR-based SpMV, and reaches an average bandwidth utilization of 35.22% on large-scale matrices (\(nnz \ge 10^{6}\)), compared with 36.17% obtained using cuSPARSE. These results demonstrate that our optimization effectively harnesses the hardware resources of Pezy-SC3s, improving the performance of CSR-based SpMV.
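As background for the baseline the paper optimizes, a minimal sketch of CSR-based SpMV (computing \(y = Ax\)) is shown below. It assumes the standard three-array CSR layout (row pointers, column indices, non-zero values); the function name and variables are illustrative and not taken from the paper's implementation.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Multiply a CSR-format sparse matrix by a dense vector x.

    row_ptr: length n_rows + 1; row i's non-zeros occupy
             indices row_ptr[i] .. row_ptr[i+1]-1 of col_idx/vals.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x. The indirect
        # access x[col_idx[k]] is the source of the poor data locality
        # and irregular memory traffic discussed in the abstract.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Example: the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] times x = [1,1,1]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Because each row may contain a very different number of non-zeros, assigning one row per thread in this scheme leads directly to the workload imbalance that the paper's chunking and Atomic Cache scheduling are designed to address.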
Acknowledgements
This research was funded by R&D project 2023YFA1011704. We thank the ICA3PP 2023 reviewers for their valuable revision comments. In future work, we will continue to explore more efficient SpMV implementations on Pezy-SC3s and other platforms.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Guo, J., Liu, J., Wang, Q., Zhu, X. (2024). Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14488. Springer, Singapore. https://doi.org/10.1007/978-981-97-0801-7_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0800-0
Online ISBN: 978-981-97-0801-7