ABSTRACT
Sparse matrix-vector multiplication (SpMV) is widely used in scientific computing, so optimizing its performance yields significant benefits. With the development of GPU hardware, SpMV on GPUs can deliver performance tens of times higher than on CPUs for suitable computational tasks, making GPU-side optimization especially rewarding. On platforms with thousands of computing cores, load balancing is crucial to realizing these gains. This study proposes a two-level average block partitioning strategy based on the CSR format that achieves balanced task partitioning at both the block and thread levels. The study also designs a parallel merge scheme for combining partial results among threads, warps, and blocks, further improving the parallelism of SpMV computation.
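The nonzero-balanced partitioning described above can be illustrated with a minimal host-side sketch. This is our own illustration, not the paper's implementation: the function name and the Python formulation are assumptions, and the paper's actual GPU kernels are not reproduced here. The idea is to split the CSR nonzeros into equal-sized chunks and binary-search the `row_ptr` array to find the row in which each chunk begins.

```python
import bisect

def partition_nnz(row_ptr, num_parts):
    """Split the nonzeros of a CSR matrix into num_parts chunks of
    near-equal size. For each chunk boundary, return the nonzero
    offset where the chunk starts and the row containing that
    nonzero (found by binary search on row_ptr). Hypothetical
    sketch of nnz-balanced partitioning, not the paper's kernel."""
    nnz = row_ptr[-1]
    bounds = []
    for p in range(num_parts + 1):
        # First nonzero index assigned to chunk p.
        target = p * nnz // num_parts
        # Rightmost row whose starting offset is <= target.
        row = bisect.bisect_right(row_ptr, target) - 1
        bounds.append((target, row))
    return bounds

# Example: 4-row matrix with row lengths 2, 0, 3, 3 (8 nonzeros).
# Two chunks of 4 nonzeros each; the second chunk starts inside row 2.
print(partition_nnz([0, 2, 2, 5, 8], 2))
```

Because chunks are sized by nonzero count rather than by row, long rows are split across workers and empty rows cost nothing, which is what makes this style of partitioning load-balanced on a GPU.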
Index Terms
- GTLB: A Load-Balanced SpMV Computation Method on GPU