Abstract
Sparse matrix-vector multiplication (SpMV) is widely used in scientific computing and often accounts for a significant portion of the overall computational cost, so improving its performance is crucial. However, sparse matrices exhibit a sporadic, irregular distribution of non-zero elements, which causes workload imbalance among threads and makes vectorization difficult. To address these issues, many efforts have optimized SpMV around the hardware characteristics of particular computing platforms. In this paper, we optimize CSR-based SpMV on Pezy-SC3s, a novel MIMD computing platform; we focus on the CSR format because it is the most widely used and is supported by various high-performance sparse computing libraries. Guided by the hardware characteristics of Pezy-SC3s, we tackle the poor data locality, workload imbalance, and vectorization challenges of CSR-based SpMV by chunking the matrix, applying the Atomic Cache for workload scheduling, and employing SIMD instructions in the SpMV kernel. As the first study of SpMV optimization on Pezy-SC3s, we evaluate our work against both plain CSR-based SpMV and the SpMV provided by NVIDIA's cuSPARSE. In experiments on 2092 matrices from the SuiteSparse collection, our optimization achieves a maximum speedup of 17.63x and an average speedup of 1.56x over CSR-based SpMV, and reaches an average bandwidth utilization of 35.22% on large-scale matrices (\(nnz \ge 10^{6}\)), compared with 36.17% obtained using cuSPARSE. These results demonstrate that our optimization effectively harnesses the hardware resources of Pezy-SC3s, improving the performance of CSR-based SpMV.
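As background for the baseline the paper optimizes, a minimal sketch of CSR-based SpMV (computing \(y = Ax\)) is shown below. It assumes the standard three-array CSR layout (row pointers, column indices, non-zero values); the function name and variables are illustrative and not taken from the paper's implementation.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Multiply a CSR-format sparse matrix by a dense vector x.

    row_ptr: length n_rows + 1; row i's non-zeros occupy
             indices row_ptr[i] .. row_ptr[i+1]-1 of col_idx/vals.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x. The indirect
        # access x[col_idx[k]] is the source of the poor data locality
        # and irregular memory traffic discussed in the abstract.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Example: the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] times x = [1,1,1]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Because each row may contain a very different number of non-zeros, assigning one row per thread in this scheme leads directly to the workload imbalance that the paper's chunking and Atomic Cache scheduling are designed to address.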
Acknowledgements
This research was funded by R&D project 2023YFA1011704. We thank the ICA3PP 2023 reviewers for their valuable revision comments. In future work, we will continue to explore more efficient SpMV implementations on Pezy-SC3s and other platforms.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Guo, J., Liu, J., Wang, Q., Zhu, X. (2024). Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14488. Springer, Singapore. https://doi.org/10.1007/978-981-97-0801-7_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0800-0
Online ISBN: 978-981-97-0801-7