Abstract
We accelerate the multiplication of a double-precision sparse matrix with a double-double (DD) precision vector (DD-SpMV), and of its transpose with a DD vector (DD-TSpMV), using AVX2 SIMD instructions. AVX2 requires a memory access pattern that allows four consecutive 64-bit elements to be loaded at once. In our previous research, DD-SpMV in the CRS format using AVX2 required non-contiguous memory loads, remainder processing, and a horizontal summation of the four elements in an AVX2 register. These factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV with AVX2 to eliminate these performance-degrading factors of CRS. Our results indicate that BCRS4x1, whose block size matches the AVX2 register length, is effective for both DD-SpMV and DD-TSpMV.
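To illustrate why the 4x1 block shape suits AVX2, the following sketch shows an SpMV kernel over a BCRS4x1 structure: each 4x1 block of four vertically adjacent matrix values fills one 256-bit register, so a block is processed with one contiguous load, one broadcast of the vector element, and one fused multiply-add, with no horizontal summation and no per-row remainder loop. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses plain double precision to show only the memory-access pattern (the actual kernels layer DD error-free transformations on top of the same loads), and the array names (bptr, bcol, bval) are hypothetical.

/* Minimal BCRS4x1 SpMV sketch in plain double precision.
 * Assumptions: bval holds 4x1 blocks contiguously, bcol gives each
 * block's column index, bptr gives CRS-style block-row pointers.
 * Compile with -mavx2 -mfma. */
#include <immintrin.h>

void spmv_bcrs4x1(int nblk_rows, const int *bptr, const int *bcol,
                  const double *bval, const double *x, double *y)
{
    for (int i = 0; i < nblk_rows; i++) {
        /* Accumulator for rows 4i..4i+3. */
        __m256d acc = _mm256_loadu_pd(&y[4 * i]);
        for (int k = bptr[i]; k < bptr[i + 1]; k++) {
            /* One contiguous load of the whole 4x1 block. */
            __m256d a  = _mm256_loadu_pd(&bval[4 * k]);
            /* Broadcast the single x element the block touches. */
            __m256d xv = _mm256_broadcast_sd(&x[bcol[k]]);
            acc = _mm256_fmadd_pd(a, xv, acc);  /* no horizontal sum */
        }
        _mm256_storeu_pd(&y[4 * i], acc);
    }
}

By the same reasoning, the 4x1 shape also helps the transposed kernel: a block multiplies four consecutive elements of x, so the x loads in DD-TSpMV stay contiguous as well.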
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 25330144. The authors thank the reviewers for their helpful comments.