
Sparse Matrix-Vector Product

  • Chapter in: Numerical Computations with GPUs

Abstract

Sparse matrix-vector (SpMV) multiplication is one of the key kernels in scientific computing. We present the foundations of its implementation on CUDA- and OpenCL-enabled devices. After introducing the subject, we briefly present the three most popular storage formats: COO, CRS, and ELL. They serve as exemplary data structures on which we discuss hardware-related issues in efficient SpMV kernel design, such as matrix size, data ordering, memory boundedness, storage overhead, thread divergence, and coalescing of memory transfers. Next, we present three widely available libraries with stable, validated SpMV kernels: cuSPARSE, CUSP, and Paralution. We present and discuss complete code for several SpMV kernels, covering both the basic formats and some of their derivatives, including CMRS, and briefly discuss the principles behind other popular format extensions.



Author information

Correspondence to Zbigniew Koza.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Koza, Z., Matyka, M., Mirosław, Ł., Poła, J. (2014). Sparse Matrix-Vector Product. In: Kindratenko, V. (eds) Numerical Computations with GPUs. Springer, Cham. https://doi.org/10.1007/978-3-319-06548-9_6


  • DOI: https://doi.org/10.1007/978-3-319-06548-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06547-2

  • Online ISBN: 978-3-319-06548-9

  • eBook Packages: Computer Science (R0)
