Rgs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Row Group Splitting Strategy on the GPU

  • Conference paper
  • In: Network and Parallel Computing (NPC 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13615)

Abstract

The Sparse Matrix-Matrix Multiplication (SpMM) operation is widely used in many fields, notably in the recently popular GNN frameworks. Researchers have designed many GPU kernels to accelerate SpMM. Existing methods mostly adopt a row splitting strategy to obtain better parallelism and memory access efficiency. However, due to irregularities of sparse matrices, such as short rows with few non-zero elements, these methods suffer from underutilization of GPU thread resources. In this paper, we rearrange the distribution of non-zero elements in the sparse matrix and design an SpMM kernel based on a row group splitting strategy. In contrast to previous methods, which assign a "row" as the task unit for a warp, we combine short rows of the sparse matrix into "row groups" as the task unit, allocating more appropriately sized non-zero workloads to GPU resources. This method reduces thread divergence within a warp and improves load balancing among warps. Our experimental data come from the SNAP Matrix Collection. The results show that our kernel is faster than cuSPARSE and GE-SpMM, with average speedups of 1.61× and 1.42×, respectively.
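The row-group idea described in the abstract can be sketched as a host-side preprocessing pass over a CSR row pointer array: long rows keep their own task unit, while consecutive short rows are packed together until a group carries roughly one warp's worth of non-zeros. The function name, the `target_nnz` threshold, and the packing policy below are illustrative assumptions for exposition, not the authors' actual implementation.

```python
def build_row_groups(row_ptr, target_nnz=32):
    """Return a list of (start_row, end_row) half-open row ranges.

    Rows with >= target_nnz non-zeros become their own group;
    consecutive short rows are packed into one group until the
    accumulated non-zero count reaches target_nnz.
    """
    groups = []
    n_rows = len(row_ptr) - 1
    start = 0   # first row of the group being built
    acc = 0     # non-zeros accumulated in the current group
    for r in range(n_rows):
        nnz = row_ptr[r + 1] - row_ptr[r]
        if nnz >= target_nnz:
            if acc > 0:                  # flush pending short rows
                groups.append((start, r))
            groups.append((r, r + 1))    # long row: its own group
            start, acc = r + 1, 0
        else:
            acc += nnz
            if acc >= target_nnz:        # group is full
                groups.append((start, r + 1))
                start, acc = r + 1, 0
    if acc > 0:                          # flush the trailing group
        groups.append((start, n_rows))
    return groups

# Example: CSR row pointer for 6 rows with nnz = [2, 1, 40, 3, 2, 1].
# The two leading short rows merge, the long row stands alone, and the
# three trailing short rows merge.
print(build_row_groups([0, 2, 3, 43, 46, 48, 49]))
# → [(0, 2), (2, 3), (3, 6)]
```

On the GPU, each such group would then be assigned to one warp, so a warp processing several merged short rows does comparable work to a warp processing one long row, which is the load-balancing effect the paper targets.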


References

  1. Wang, M., Zheng, D., Ye, Z., et al.: Deep graph library: a graph-centric, highly performant package for graph neural networks. In: ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)


  2. Hu, Y., Ye, Z., et al.: FeatGraph: a flexible and efficient backend for graph neural network systems. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC (2020)


  3. Merrill, D., Garland, M.: Merge-based parallel sparse matrix-vector multiplication. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Lake City, UT, USA (2016)


  4. Winter, M., Mlakar, D., et al.: Adaptive sparse matrix-matrix multiplication on the GPU. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming - PPoPP 2019, pp. 68–81. ACM Press, New York (2019)


  5. The API reference guide for cuSPARSE, the CUDA sparse matrix library (v11.4 ed.). http://docs.nvidia.com/cuda/cusparse/index.html

  6. Yang, C., Buluç, A., Owens, J.D.: Design principles for sparse matrix multiplication on the GPU. In: Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, 27–31 August 2018, Proceedings, pp. 672–687 (2018)


  7. Huang, G., Dai, G., Wang, Y., Yang, H.: GE-SpMM: general purpose sparse matrix-matrix multiplication on GPUs for graph neural networks. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1–12 (2020)


  8. Davis, T.A., Hu, Y.: The university of Florida sparse matrix collection. ACM Trans. Math. Softw. 38, 1–25 (2011)



Acknowledgments

This work is supported financially by the National Natural Science Foundation of China (61672438), Natural Science Foundation of Sichuan, China (2022NSFSC0894, 2022NSFSC0940, 23NSFJQ0112), Special Project of China Association of Higher Education (21SZYB16).

Author information

Corresponding author

Correspondence to Yaobin Wang.


Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper


Cite this paper

Guo, M. et al. (2022). Rgs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Row Group Splitting Strategy on the GPU. In: Liu, S., Wei, X. (eds) Network and Parallel Computing. NPC 2022. Lecture Notes in Computer Science, vol 13615. Springer, Cham. https://doi.org/10.1007/978-3-031-21395-3_6


  • DOI: https://doi.org/10.1007/978-3-031-21395-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21394-6

  • Online ISBN: 978-3-031-21395-3

  • eBook Packages: Computer Science, Computer Science (R0)
