ABSTRACT
Sparse matrix-vector multiplication (SpMV) is widely used in scientific computing, so optimizing its performance yields significant benefits. With the development of GPU hardware, SpMV on GPUs can deliver performance tens of times higher than on CPUs for suitable computational tasks, making GPU-side optimization especially rewarding. On platforms with thousands of computing cores, load balancing is crucial to realizing these gains. This study proposes a two-level average block partitioning strategy based on the CSR format that achieves balanced task partitioning at both the block and thread levels. The study also designs a parallel merge scheme for combining partial results among threads, warps, and blocks, further improving the parallelism of SpMV computation.
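The nonzero-balanced partitioning described above can be illustrated with a minimal host-side sketch. This is our own illustration, not the paper's implementation: the function name and the Python formulation are assumptions, and the paper's actual GPU kernels are not reproduced here. The idea is to split the CSR nonzeros into equal-sized chunks and binary-search the `row_ptr` array to find the row in which each chunk begins.

```python
import bisect

def partition_nnz(row_ptr, num_parts):
    """Split the nonzeros of a CSR matrix into num_parts chunks of
    near-equal size. For each chunk boundary, return the nonzero
    offset where the chunk starts and the row containing that
    nonzero (found by binary search on row_ptr). Hypothetical
    sketch of nnz-balanced partitioning, not the paper's kernel."""
    nnz = row_ptr[-1]
    bounds = []
    for p in range(num_parts + 1):
        # First nonzero index assigned to chunk p.
        target = p * nnz // num_parts
        # Rightmost row whose starting offset is <= target.
        row = bisect.bisect_right(row_ptr, target) - 1
        bounds.append((target, row))
    return bounds

# Example: 4-row matrix with row lengths 2, 0, 3, 3 (8 nonzeros).
# Two chunks of 4 nonzeros each; the second chunk starts inside row 2.
print(partition_nnz([0, 2, 2, 5, 8], 2))
```

Because chunks are sized by nonzero count rather than by row, long rows are split across workers and empty rows cost nothing, which is what makes this style of partitioning load-balanced on a GPU.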
Index Terms
- GTLB: A Load-Balanced SpMV Computation Method on GPU