DOI: 10.1145/3606043.3606057
research-article

GTLB: A Load-Balanced SpMV Computation Method on GPU

Published: 16 November 2023

ABSTRACT

Sparse matrix-vector multiplication (SpMV) is widely used in scientific computing, and optimizing its performance benefits the many applications built on it. With the development of GPU hardware, GPUs can deliver tens of times the SpMV throughput of CPUs, but on platforms with thousands of computing cores, load balancing is crucial to realizing that gain. This study proposes a two-level average block partitioning strategy based on the CSR format that balances tasks at both the block level and the thread level. It also designs a parallel merge scheme that combines partial results across threads, warps, and blocks, further improving the parallelism of the SpMV computation.
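
The full text develops the two-level partitioning and the hierarchical merge; as background, the CUDA sketch below illustrates the basic load-balancing idea the abstract alludes to. Rather than assigning one row per thread, which leaves threads idle whenever row lengths are skewed, each thread takes an equal contiguous slice of the nonzeros, locates its starting row by binary search over the CSR row pointer, and flushes per-row partial sums with atomicAdd whenever it crosses a row boundary. This is a minimal illustration under stated assumptions, not the paper's GTLB implementation: GTLB merges partial results across threads, warps, and blocks rather than relying on atomics, and the names used here (spmv_nnz_balanced, find_row) are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Locate the row that owns nonzero index k: the largest r with row_ptr[r] <= k.
__device__ int find_row(const int* row_ptr, int n_rows, int k) {
    int lo = 0, hi = n_rows - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (row_ptr[mid] <= k) lo = mid;
        else                   hi = mid - 1;
    }
    return lo;
}

// Nonzero-balanced CSR SpMV sketch (NOT the paper's GTLB kernel): every
// thread processes an (almost) equal, contiguous slice of the nonzeros, so
// one long row cannot stall a single thread while the others idle.
// Requires y to be zero-initialized before launch. atomicAdd on double
// needs compute capability 6.0+ (compile with nvcc -arch=sm_60 or newer).
__global__ void spmv_nnz_balanced(int n_rows, int nnz,
                                  const int* row_ptr, const int* col_idx,
                                  const double* vals, const double* x,
                                  double* y) {
    int tid       = blockIdx.x * blockDim.x + threadIdx.x;
    int n_threads = gridDim.x * blockDim.x;
    int per   = (nnz + n_threads - 1) / n_threads;  // slice size (ceiling)
    int begin = tid * per;
    int end   = min(begin + per, nnz);
    if (begin >= end) return;

    int row = find_row(row_ptr, n_rows, begin);
    double sum = 0.0;
    for (int k = begin; k < end; ++k) {
        sum += vals[k] * x[col_idx[k]];
        if (k + 1 == row_ptr[row + 1]) {            // crossed a row boundary
            atomicAdd(&y[row], sum);                // another thread may share this row
            sum = 0.0;
            ++row;
            while (row < n_rows && row_ptr[row + 1] == row_ptr[row])
                ++row;                              // skip empty rows
        }
    }
    if (sum != 0.0) atomicAdd(&y[row], sum);        // flush a partially covered row
}

int main() {
    // Toy 4x4 CSR matrix with a deliberately skewed row-length profile:
    // row 0 holds four of the seven nonzeros, row 2 is empty.
    const int n_rows = 4, nnz = 7;
    int    h_row_ptr[] = {0, 4, 5, 5, 7};
    int    h_col_idx[] = {0, 1, 2, 3, 1, 0, 3};
    double h_vals[]    = {1, 2, 3, 4, 5, 6, 7};
    double h_x[]       = {1, 1, 1, 1};
    double h_y[n_rows] = {0};                       // y must start at zero

    int *d_row_ptr, *d_col_idx; double *d_vals, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, sizeof h_row_ptr);
    cudaMalloc(&d_col_idx, sizeof h_col_idx);
    cudaMalloc(&d_vals,    sizeof h_vals);
    cudaMalloc(&d_x,       sizeof h_x);
    cudaMalloc(&d_y,       sizeof h_y);
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof h_row_ptr, cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof h_col_idx, cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals,    h_vals,    sizeof h_vals,    cudaMemcpyHostToDevice);
    cudaMemcpy(d_x,       h_x,       sizeof h_x,       cudaMemcpyHostToDevice);
    cudaMemcpy(d_y,       h_y,       sizeof h_y,       cudaMemcpyHostToDevice);

    // Four threads, seven nonzeros: each thread gets at most two nonzeros,
    // regardless of how the rows are shaped.
    spmv_nnz_balanced<<<1, 4>>>(n_rows, nnz, d_row_ptr, d_col_idx, d_vals, d_x, d_y);
    cudaMemcpy(h_y, d_y, sizeof h_y, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n_rows; ++i) printf("y[%d] = %.1f\n", i, h_y[i]);
    return 0;
}
```

With all-ones x, this toy matrix should print y = [10.0, 5.0, 0.0, 13.0]. The atomicAdd flush is the simplest way to let two threads safely share a row; the parallel merge scheme described in the abstract serves the same purpose without the atomic traffic.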

Published in

      HP3C '23: Proceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications
      June 2023
      354 pages
ISBN: 9781450399883
DOI: 10.1145/3606043

      Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

