Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Abstract

Sparse matrix-vector multiplication (SpMV) is used extensively in scientific computing and often accounts for a significant portion of the overall computational cost, so improving its performance is crucial. However, the non-zero elements of sparse matrices are distributed sparsely and irregularly, which causes workload imbalance among threads and complicates vectorization. To address these issues, numerous efforts have optimized SpMV around the hardware characteristics of particular computing platforms. In this paper, we present an optimization of CSR-based SpMV on Pezy-SC3s, a novel MIMD computing platform; we target the CSR format because it is the most widely used and is supported by various high-performance sparse computing libraries. Guided by the hardware characteristics of Pezy-SC3s, we tackle the poor data locality, workload imbalance, and vectorization challenges of CSR-based SpMV by chunking the matrix, applying the Atomic Cache for workload scheduling, and using SIMD instructions in the SpMV kernel. As the first study to investigate SpMV optimization on Pezy-SC3s, we evaluate our work against the baseline CSR-based SpMV and against the SpMV provided by Nvidia's CuSparse. In experiments on 2092 matrices from SuiteSparse, our optimization achieves a maximum speedup of 17.63x and an average speedup of 1.56x over CSR-based SpMV, and an average bandwidth utilization of 35.22% on large-scale matrices (nnz ≥ 10^6), compared with 36.17% for CuSparse. These results demonstrate that our optimization effectively harnesses the hardware resources of Pezy-SC3s and improves the performance of CSR-based SpMV.
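The CSR-based baseline the abstract refers to can be illustrated with a minimal scalar kernel. This is our own sketch for orientation, not the paper's Pezy-SC3s implementation; the matrix chunking, Atomic Cache scheduling, and SIMD optimizations described above are deliberately not shown.

```python
# Illustrative CSR-based SpMV kernel (plain scalar sketch; the paper's
# optimized version adds chunking, Atomic Cache scheduling, and SIMD).
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x where A is stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the non-zeros of row i;
    col_idx[k] and vals[k] give the column index and value of non-zero k.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):  # each row is independent -> the natural parallel unit
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # irregular, gather-style access to x
        y[i] = acc
    return y

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
print(spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2],
               [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The gather into x through col_idx and the varying number of non-zeros per row are precisely the data-locality and load-balance problems that the optimizations above target.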



Acknowledgements

This research was funded by the R&D project 2023YFA1011704. We thank the ICA3PP 2023 reviewers for their valuable revision comments. In future work, we will continue to explore more efficient SpMV implementations on Pezy-SC3s and other platforms.

Author information

Corresponding author

Correspondence to Jie Liu.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Guo, J., Liu, J., Wang, Q., Zhu, X. (2024). Optimizing CSR-Based SpMV on a New MIMD Architecture Pezy-SC3s. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14488. Springer, Singapore. https://doi.org/10.1007/978-981-97-0801-7_2

  • DOI: https://doi.org/10.1007/978-981-97-0801-7_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0800-0

  • Online ISBN: 978-981-97-0801-7

  • eBook Packages: Computer Science, Computer Science (R0)
