Abstract
The non-uniformity of sparse matrices causes SPMV (sparse matrix–vector multiplication) to suffer from redundant computation, redundant storage, load imbalance, and low GPU utilization. To address these problems, this study proposes a new matrix compression method based on CSR and COO: the PBC algorithm. The method accounts for load balancing during SPMV by partitioning the matrix into blocks in row-major order so as to minimize the standard deviation of the number of nonzero elements across blocks, i.e., to make the blocks as similar as possible in nonzero count. The original matrix is preprocessed with this block-splitting algorithm so that each block, stored in CSR or COO form, satisfies the load-balancing condition. Experimental results show that the SPMV preprocessing time is within an acceptable range. Compared with serial code without CSR optimization, the proposed parallel method achieves a speedup of 178x; compared with CSR-optimized serial code, it achieves a speedup of 6x. A representative matrix compression method is also selected for comparative analysis, and the results show that the PBC algorithm delivers a clear efficiency improvement over that baseline.
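The row-major block-splitting idea described above can be illustrated with a minimal sketch: given a CSR row-pointer array, choose contiguous row ranges whose nonzero counts are as close to equal as possible. This is a simplified greedy illustration under stated assumptions, not the paper's actual PBC implementation (which minimizes the standard deviation across blocks); the function name and greedy cut rule are this sketch's own.

```python
def split_rows_into_blocks(row_ptr, num_blocks):
    """Partition contiguous rows into `num_blocks` blocks whose nonzero
    counts are as similar as possible (greedy illustration only).

    row_ptr: CSR row-pointer array of length num_rows + 1, where
             row_ptr[i+1] - row_ptr[i] is the nonzero count of row i.
    Returns a list of (start_row, end_row) half-open row ranges.
    """
    nnz = row_ptr[-1]
    target = nnz / num_blocks          # ideal nonzeros per block
    blocks, start = [], 0
    for b in range(1, num_blocks):
        # Advance the cut until the cumulative nonzero count reaches
        # the b-th multiple of the per-block target.
        cut = start + 1
        while cut < len(row_ptr) - 1 and row_ptr[cut] < b * target:
            cut += 1
        # Pick whichever neighboring boundary lands closer to the target.
        if cut > start + 1 and abs(row_ptr[cut - 1] - b * target) <= abs(row_ptr[cut] - b * target):
            cut -= 1
        blocks.append((start, cut))
        start = cut
    blocks.append((start, len(row_ptr) - 1))
    return blocks
```

For a matrix whose rows hold 2, 2, 2, and 2 nonzeros (`row_ptr = [0, 2, 4, 6, 8]`), splitting into two blocks yields `[(0, 2), (2, 4)]`, with 4 nonzeros per block. Each balanced block can then be stored independently in CSR or COO form and assigned to its own group of GPU threads.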
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61672181.
Cite this article
Cui, H., Wang, N., Wang, Y. et al. An effective SPMV based on block strategy and hybrid compression on GPU. J Supercomput 78, 6318–6339 (2022). https://doi.org/10.1007/s11227-021-04123-6