Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs

Mukunoki, Daichi; Takahashi, Daisuke

doi:10.1007/978-3-642-39640-3_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7975)

Included in the following conference series:

International Conference on Computational Science and Its Applications


Abstract

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific and engineering computing. This paper presents optimization techniques for SpMV for the Compressed Row Storage (CRS) format on NVIDIA Kepler architecture GPUs using CUDA. Our implementation is based on an existing method proposed for the Fermi architecture, an earlier generation of the GPU, and takes advantage of some of the new features of the Kepler architecture. In double-precision operations on a Tesla K20 Kepler architecture GPU, our implementation is, on average, approximately 1.29 times faster than the Fermi-optimized implementation across 200 different types of matrices. As a result, our implementation outperforms the NVIDIA cuSPARSE library’s CRS format SpMV in CUDA 5.0 on 174 of the 200 matrices, and the average speedup compared to the cuSPARSE SpMV routine across all 200 matrices is approximately 1.45.
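
For readers unfamiliar with the CRS format, the following is a minimal serial sketch in Python of what an SpMV over CRS data computes. It is not the Kepler-optimized CUDA implementation described in the paper, whose details are not given in the abstract. The function name `spmv_crs` and the array names `values`, `col_idx`, and `row_ptr` are illustrative assumptions; they follow the usual three-array CRS layout (nonzero values, their column indices, and per-row offsets into those arrays).

```python
def spmv_crs(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CRS form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    x       : dense input vector
    (All names are illustrative, not taken from the paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example 3x3 matrix:
#   [[4, 0, 1],
#    [0, 2, 0],
#    [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = spmv_crs(values, col_idx, row_ptr, x)
print(y)  # [7.0, 4.0, 18.0]
```

The outer loop over rows is independent per row, which is what makes the operation amenable to parallel execution. The indirect read `x[col_idx[k]]` is an irregular memory access, so the cost of an SpMV depends on the sparsity pattern of the matrix as well as on its size.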

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, Japan
Daichi Mukunoki
Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8573, Japan
Daisuke Takahashi

Editor information

Editors and Affiliations

L-I.S.U.T. - D.A.P.I.t. Facoltà Ingegneria, Università degli Studi della Basilicata, Viale dell’Ateneo Lucano, 10, 85100, Potenza, Italy
Beniamino Murgante
Covenant University, Canaanland OTA, Nigeria
Sanjay Misra
Dipartimento di Scienze e Tecnologie per l’Agricoltura, le Foreste, la Natura e l’Energia, Università degli Studi della Tuscia, Via S. Camillo de Lellis, snc, 01100, Viterbo, Italy
Maurizio Carlini
Dipartimento di Scienze dell’Ingegneria Civile e dell’Architettura, Politecnico di Bari, Via Orabona, 4, 70125, Bari, Italy
Carmelo M. Torre
International University VNU-HCM, Quarter 6, Linh Trung, Thu Duc, Ho Chi Minh City, Vietnam
Hong-Quang Nguyen
School of Business Systems, Monash University, 3800, Clayton, VIC, Australia
David Taniar
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, 813-8503, Higashi-ku, Fukuoka, Japan
Bernady O. Apduhan
Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli, 1, 06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Mukunoki, D., Takahashi, D. (2013). Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_15

DOI: https://doi.org/10.1007/978-3-642-39640-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39639-7
Online ISBN: 978-3-642-39640-3
eBook Packages: Computer Science, Computer Science (R0)
