Parallel Sparse Matrix-Vector Multiplication Using Accelerators

  • Conference paper
  • First Online:
Computational Science and Its Applications – ICCSA 2016 (ICCSA 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9787)

Abstract

Sparse matrix-vector multiplication (SpMV) is an essential computational kernel in many applications, particularly in scientific computing. The number of computing systems equipped with NVIDIA GPUs and Intel Xeon Phi coprocessors based on the Many Integrated Core (MIC) architecture has been increasing, so effective SpMV algorithms for these systems are becoming increasingly important. To the best of our knowledge, previous studies have reported CPU and GPU implementations of SpMV for clusters and MIC implementations for a single node, but SpMV implementations for MIC clusters have not yet been reported. In this paper, we implemented and evaluated parallel SpMV on a GPU cluster and on a MIC cluster. The results show that the MIC implementation achieved relatively high performance on some matrices with a single process, but it did not outperform the other implementations with 64 MPI processes. We therefore implemented and evaluated the SpMV kernel on a single process in order to improve the performance of parallel SpMV.
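
For illustration, the sketch below shows a minimal example of the kind of kernel the abstract refers to: a CSR-format SpMV (y = A*x) parallelized over matrix rows with OpenMP, written in C. The function and array names (spmv_csr, row_ptr, col_idx, val) and the scheduling clause are assumptions made for this sketch, not the authors' implementation.

    /* Illustrative sketch (not from the paper): CSR-format SpMV, y = A * x,
     * parallelized over matrix rows with OpenMP. Names are hypothetical. */
    #include <stddef.h>

    void spmv_csr(size_t n_rows,
                  const size_t *row_ptr,   /* length n_rows + 1: row start offsets into val/col_idx */
                  const int    *col_idx,   /* length nnz: column index of each stored nonzero */
                  const double *val,       /* length nnz: nonzero values */
                  const double *x,         /* dense input vector */
                  double       *y)         /* dense output vector */
    {
    #pragma omp parallel for schedule(dynamic, 1024)
        for (size_t i = 0; i < n_rows; ++i) {
            double sum = 0.0;
            /* Accumulate the dot product of row i with x. */
            for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }

In a distributed setting such as the MPI runs described in the abstract, each process would typically own a block of rows and gather the required entries of x before invoking such a kernel; a GPU variant would replace the OpenMP loop with a CUDA kernel. This is only a sketch of the general technique, not the implementation evaluated in the paper.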

Acknowledgments

This research was supported by the Core Research for Evolutional Science and Technology (CREST) program of the Japan Science and Technology Agency (JST).

Author information

Corresponding author

Correspondence to Daisuke Takahashi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Maeda, H., Takahashi, D. (2016). Parallel Sparse Matrix-Vector Multiplication Using Accelerators. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9787. Springer, Cham. https://doi.org/10.1007/978-3-319-42108-7_1

  • DOI: https://doi.org/10.1007/978-3-319-42108-7_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42107-0

  • Online ISBN: 978-3-319-42108-7

  • eBook Packages: Computer Science, Computer Science (R0)
