Abstract
We accelerate the multiplication of a double-precision sparse matrix with a double-double (DD) precision vector (DD-SpMV), and of its transpose with a DD vector (DD-TSpMV), using AVX2 SIMD instructions. AVX2 requires a memory access pattern that allows four consecutive 64-bit elements to be loaded at once. In our previous research, DD-SpMV in the CRS format using AVX2 required non-contiguous memory loads, remainder processing, and a horizontal summation of the four elements in an AVX2 register. These factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV with AVX2 to eliminate these performance-degrading factors of CRS. Our results indicate that BCRS4x1, whose block size matches the AVX2 register length, is effective for both DD-SpMV and DD-TSpMV.
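To illustrate why the 4x1 block shape suits AVX2, the following sketch shows an SpMV kernel over a BCRS4x1 structure: each 4x1 block of four vertically adjacent matrix values fills one 256-bit register, so a block is processed with one contiguous load, one broadcast of the vector element, and one fused multiply-add, with no horizontal summation and no per-row remainder loop. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses plain double precision to show only the memory-access pattern (the actual kernels layer DD error-free transformations on top of the same loads), and the array names (bptr, bcol, bval) are hypothetical.

/* Minimal BCRS4x1 SpMV sketch in plain double precision.
 * Assumptions: bval holds 4x1 blocks contiguously, bcol gives each
 * block's column index, bptr gives CRS-style block-row pointers.
 * Compile with -mavx2 -mfma. */
#include <immintrin.h>

void spmv_bcrs4x1(int nblk_rows, const int *bptr, const int *bcol,
                  const double *bval, const double *x, double *y)
{
    for (int i = 0; i < nblk_rows; i++) {
        /* Accumulator for rows 4i..4i+3. */
        __m256d acc = _mm256_loadu_pd(&y[4 * i]);
        for (int k = bptr[i]; k < bptr[i + 1]; k++) {
            /* One contiguous load of the whole 4x1 block. */
            __m256d a  = _mm256_loadu_pd(&bval[4 * k]);
            /* Broadcast the single x element the block touches. */
            __m256d xv = _mm256_broadcast_sd(&x[bcol[k]]);
            acc = _mm256_fmadd_pd(a, xv, acc);  /* no horizontal sum */
        }
        _mm256_storeu_pd(&y[4 * i], acc);
    }
}

By the same reasoning, the 4x1 shape also helps the transposed kernel: a block multiplies four consecutive elements of x, so the x loads in DD-TSpMV stay contiguous as well.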
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 25330144. The authors thank the reviewers for their helpful comments.