Abstract
Sparse matrix-vector multiplication (SpMV) is an important operation in scientific and engineering computing. This paper presents optimization techniques for SpMV for the Compressed Row Storage (CRS) format on NVIDIA Kepler architecture GPUs using CUDA. Our implementation is based on an existing method proposed for the Fermi architecture, an earlier generation of the GPU, and takes advantage of some of the new features of the Kepler architecture. On a Tesla K20 Kepler architecture GPU on double precision operations, our implementation is, on average, approximately 1.29 times faster than that the Fermi optimized implementation for 200 different types of matrices. As a result, our implementation outperforms the NVIDIA cuSPARSE library’s CRS format SpMV in CUDA 5.0 on 174 of the 200 matrices, and the average speedup compared to the cuSPARSE SpMV routine across all 200 matrices is approximately 1.45.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Baskaran, M.M., Bordawekar, R.: Optimizing Sparse Matrix-Vector Multiplication on GPUs. IBM Research Report RC24704 (2009)
Bell, N., Garland, M.: Efficient Sparse Matrix-Vector Multiplication on CUDA. NVIDIA Technical Report NVR-2008-004 (2008)
NVIDIA Corporation: Whitepaper NVIDIAs Next Generation CUDA Compute Architecture: Kepler GK110. itepaper.pdf (2012), http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Wh
Davis, J.D., Chung, E.S.: SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place. Microsoft Technical Report MSR–TR–2012–95 (2012)
Davis, T., Hu, Y.: The University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices/
El Zein, A.H., Rendell, A.P.: Generating Optimal CUDA Sparse Matrix Vector Product Implementations for Evolving GPU Hardware. Concurrency and Computation: Practice and Experience 24, 3–13 (2012)
Feng, X., Jin, H., Zheng, R., Hu, K., Zeng, J., Shao, Z.: Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs. In: Proc. IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS 2011), pp. 165–172 (2011)
Guo, P., Wang, L.: Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs. In: Proc. International Conference on Computational and Information Sciences (ICCIS 2010), pp. 1154–1157 (2010)
Kubota, Y., Takahashi, D.: Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU. In: Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B.O. (eds.) ICCSA 2011, Part II. LNCS, vol. 6783, pp. 547–561. Springer, Heidelberg (2011)
Matam, K., Kothapalli, K.: Accelerating Sparse Matrix Vector Multiplication in Iterative Methods Using GPU. In: Proc. International Conference on Parallel Processing (ICPP 2011), pp. 612–621 (2011)
NVIDIA Corporation: cuSPARSE Library (included in CUDA Toolkit), https://developer.nvidia.com/cusparse
Reguly, I., Giles, M.: Efficient sparse matrix-vector multiplication on cache-based GPUs. In: Proc. Innovative Parallel Computing: Foundations and Applications of GPU, Manycore, and Heterogeneous Systems (InPar 2012), pp. 1–12 (2012)
Xu, W., Zhang, H., Jiao, S., Wang, D., Song, F., Liu, Z.: Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU. In: Proc. 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2012), pp. 231–235 (2012)
Yoshizawa, H., Takahashi, D.: Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs. In: Proc. 15th IEEE International Conference on Computational Science and Engineering (CSE 2012), pp. 130–136 (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mukunoki, D., Takahashi, D. (2013). Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_15
Download citation
DOI: https://doi.org/10.1007/978-3-642-39640-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39639-7
Online ISBN: 978-3-642-39640-3
eBook Packages: Computer ScienceComputer Science (R0)