Parallel Sparse Matrix-Vector Multiplication Using Accelerators

  • Conference paper
  • First Online:
Computational Science and Its Applications – ICCSA 2016 (ICCSA 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9787)

Abstract

Sparse matrix-vector multiplication (SpMV) is an essential computational kernel in many applications, particularly in scientific computing. The number of computing systems equipped with NVIDIA GPUs and Intel Xeon Phi coprocessors based on the Many Integrated Core (MIC) architecture has been increasing, so effective SpMV algorithms for these systems are becoming increasingly important. To the best of our knowledge, previous studies have reported CPU and GPU implementations of SpMV for clusters and MIC implementations for a single node, but SpMV implementations for MIC clusters have not yet been reported. In this paper, we implemented and evaluated parallel SpMV on a GPU cluster and on a MIC cluster. The results show that the MIC implementation achieved relatively high performance on some matrices with a single process, but it did not outperform the other implementations with 64 MPI processes. We therefore implemented and evaluated the SpMV kernel on a single process in order to improve the performance of parallel SpMV.
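
For illustration, the sketch below shows a minimal example of the kind of kernel the abstract refers to: a CSR-format SpMV (y = A*x) parallelized over matrix rows with OpenMP, written in C. The function and array names (spmv_csr, row_ptr, col_idx, val) and the scheduling clause are assumptions made for this sketch, not the authors' implementation.

    /* Illustrative sketch (not from the paper): CSR-format SpMV, y = A * x,
     * parallelized over matrix rows with OpenMP. Names are hypothetical. */
    #include <stddef.h>

    void spmv_csr(size_t n_rows,
                  const size_t *row_ptr,   /* length n_rows + 1: row start offsets into val/col_idx */
                  const int    *col_idx,   /* length nnz: column index of each stored nonzero */
                  const double *val,       /* length nnz: nonzero values */
                  const double *x,         /* dense input vector */
                  double       *y)         /* dense output vector */
    {
    #pragma omp parallel for schedule(dynamic, 1024)
        for (size_t i = 0; i < n_rows; ++i) {
            double sum = 0.0;
            /* Accumulate the dot product of row i with x. */
            for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }

In a distributed setting such as the MPI runs described in the abstract, each process would typically own a block of rows and gather the required entries of x before invoking such a kernel; a GPU variant would replace the OpenMP loop with a CUDA kernel. This is only a sketch of the general technique, not the implementation evaluated in the paper.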

Acknowledgments

This research was supported by the Core Research for Evolutional Science and Technology (CREST) program of the Japan Science and Technology Agency (JST).

Author information

Corresponding author

Correspondence to Daisuke Takahashi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Maeda, H., Takahashi, D. (2016). Parallel Sparse Matrix-Vector Multiplication Using Accelerators. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9787. Springer, Cham. https://doi.org/10.1007/978-3-319-42108-7_1

  • DOI: https://doi.org/10.1007/978-3-319-42108-7_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42107-0

  • Online ISBN: 978-3-319-42108-7

  • eBook Packages: Computer Science, Computer Science (R0)
