Abstract
Sparse Matrix-Matrix Multiplication (SpMM) is widely used across many fields, most notably in the recently popular GNN frameworks, and researchers have designed many GPU kernels to accelerate it. Existing methods mostly adopt a row splitting strategy to obtain better parallelism and memory access efficiency. However, due to irregularities of sparse matrices, such as short rows with few non-zero elements, current methods suffer from underutilization of GPU thread resources. In this paper, we rearrange the distribution of non-zero elements in the sparse matrix and design an SpMM kernel based on a row group splitting strategy. In contrast to previous methods, which assign a "row" task unit to a warp for processing, we combine short rows of the sparse matrix into "row groups" as the task unit, which allocates a more appropriate number of non-zero-element tasks to the GPU resources. This method reduces thread divergence within a warp and improves load balancing among warps. Our experimental data come from the SNAP Matrix Collection. The results show that our kernel is faster than cuSPARSE and GE-SpMM, with average speedups of 1.61x and 1.42x respectively.
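The core idea above, packing consecutive short rows into "row groups" so each warp receives a comparable number of non-zero elements, can be sketched as a preprocessing pass over a CSR row-pointer array. The function name `build_row_groups` and the `target_nnz` parameter are illustrative assumptions, not the paper's actual implementation; this is a minimal host-side sketch of a row-group splitting strategy, not the authors' kernel.

```python
def build_row_groups(row_ptr, target_nnz):
    """Partition consecutive CSR rows into groups of roughly
    `target_nnz` non-zeros each (hypothetical sketch of row
    group splitting; the paper's scheme may differ in detail).

    row_ptr    -- CSR row-pointer array of length num_rows + 1
    target_nnz -- desired non-zeros per group (e.g. per warp)
    Returns a list of (row_start, row_end) half-open ranges.
    """
    num_rows = len(row_ptr) - 1
    groups = []
    start, acc = 0, 0
    for r in range(num_rows):
        acc += row_ptr[r + 1] - row_ptr[r]  # nnz in row r
        if acc >= target_nnz:
            # Close the current group once it reaches the target,
            # so short rows are merged into one task unit.
            groups.append((start, r + 1))
            start, acc = r + 1, 0
    if start < num_rows:  # leftover short rows form a final group
        groups.append((start, num_rows))
    return groups

# Rows with nnz [1, 1, 1, 4, 1]: the three leading/trailing short
# rows get merged instead of each occupying a warp on its own.
print(build_row_groups([0, 1, 2, 3, 7, 8], target_nnz=2))
```

Each returned range would then be mapped to one warp, so a warp processing several merged short rows does roughly as much work as a warp processing one dense row, which is the load-balancing effect the abstract describes.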
References
Wang, M., Zheng, D., Ye, Z., et al.: Deep graph library: a graph-centric, highly performant package for graph neural networks. In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)
Hu, Y., Ye, Z., et al.: FeatGraph: a flexible and efficient backend for graph neural network systems. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC (2020)
Merrill, D., Garland, M.: Merge-based parallel sparse matrix-vector multiplication. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, USA (2016)
Winter, M., Mlakar, D., et al.: Adaptive sparse matrix-matrix multiplication on the GPU. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming - PPoPP 2019, pp. 68–81. ACM Press, New York (2019)
The API reference guide for cuSPARSE, the CUDA sparse matrix library (v11.4 ed.). http://docs.nvidia.com/cuda/cusparse/index.html
Yang, C., Buluç, A., Owens, J.D.: Design principles for sparse matrix multiplication on the GPU. In: Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, 27–31 August 2018, Proceedings, pp. 672–687 (2018)
Huang, G., Dai, G., Wang, Y., Yang, H.: GE-SpMM: general purpose sparse matrix-matrix multiplication on GPUs for graph neural networks. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–12 (2020)
Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38, 1–25 (2011)
Acknowledgments
This work is supported financially by the National Natural Science Foundation of China (61672438), Natural Science Foundation of Sichuan, China (2022NSFSC0894, 2022NSFSC0940, 23NSFJQ0112), Special Project of China Association of Higher Education (21SZYB16).
Copyright information
© 2022 IFIP International Federation for Information Processing
Cite this paper
Guo, M. et al. (2022). Rgs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Row Group Splitting Strategy on the GPU. In: Liu, S., Wei, X. (eds) Network and Parallel Computing. NPC 2022. Lecture Notes in Computer Science, vol 13615. Springer, Cham. https://doi.org/10.1007/978-3-031-21395-3_6
Print ISBN: 978-3-031-21394-6
Online ISBN: 978-3-031-21395-3
eBook Packages: Computer Science (R0)