
SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision

  • Conference paper
  • First Online:
High Performance Computing for Computational Science – VECPAR 2016 (VECPAR 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10150)

Abstract

We accelerate the multiplication of a double-precision sparse matrix by a double-double (DD) precision vector (DD-SpMV), and the corresponding transposed-matrix and DD vector multiplication (DD-TSpMV), using AVX2 SIMD instructions. Exploiting AVX2 requires a memory access pattern in which four consecutive 64-bit elements are read at once. In our previous work, DD-SpMV in the CRS format with AVX2 required non-contiguous memory loads, remainder processing, and a horizontal summation of the four elements in an AVX2 register; these factors degrade the performance of DD-SpMV. In this paper, we compare storage formats for DD-SpMV and DD-TSpMV under AVX2 in order to eliminate these performance-degrading factors of CRS. Our results indicate that BCRS4x1, whose block size matches the AVX2 register length, is effective for both DD-SpMV and DD-TSpMV.
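The contrast between CRS and BCRS4x1 under AVX2 can be sketched with a small kernel. The following C fragment is a minimal illustration, not the authors' implementation: it works in plain double precision (the paper's kernels additionally carry DD values, updated with error-free transformations), and it assumes a hypothetical BCRS 4x1 layout with arrays brow_ptr, bcol_ind, and val whose names are chosen here for the example. Because each 4x1 block occupies four consecutive matrix rows in a single column, the block is fetched with one contiguous 256-bit load and accumulated lane by lane, avoiding the non-contiguous loads, remainder handling, and horizontal summation that a CRS kernel with AVX2 needs.

/* Minimal sketch (not the paper's code): AVX2 SpMV over a BCRS 4x1 matrix
 * in plain double precision.  Each 4x1 block stores four consecutive rows
 * of one column, so one contiguous 256-bit load covers the whole block. */
#include <immintrin.h>
#include <stddef.h>

void bcrs4x1_spmv(size_t nbrows,          /* number of block rows = nrows/4 (rows padded) */
                  const size_t *brow_ptr, /* block-row offsets, size nbrows+1              */
                  const size_t *bcol_ind, /* column index of each 4x1 block                */
                  const double *val,      /* block values, 4 contiguous doubles per block  */
                  const double *x,
                  double *y)
{
    for (size_t bi = 0; bi < nbrows; ++bi) {
        __m256d acc = _mm256_setzero_pd();          /* four partial sums, one per row     */
        for (size_t k = brow_ptr[bi]; k < brow_ptr[bi + 1]; ++k) {
            /* Contiguous load of the 4x1 block: no gather of matrix values is needed.    */
            __m256d a  = _mm256_loadu_pd(&val[4 * k]);
            /* The block shares a single column, so one broadcast of x suffices.          */
            __m256d xv = _mm256_set1_pd(x[bcol_ind[k]]);
            acc = _mm256_add_pd(acc, _mm256_mul_pd(a, xv));
            /* A CRS kernel would instead need indexed loads of x, a remainder loop when
             * the row length is not a multiple of 4, and a horizontal sum of the
             * register at the end of each row; here every lane already belongs to its
             * own row of y.                                                              */
        }
        _mm256_storeu_pd(&y[4 * bi], acc);          /* four consecutive rows of y         */
    }
}

As with any blocked format, the row count must be padded to a multiple of the block height and zero fill inside blocks adds extra arithmetic, which is the usual trade-off against the simpler CRS layout.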

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 25330144. The authors thank the reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Toshiaki Hishinuma.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hishinuma, T., Hasegawa, H., Tanaka, T. (2017). SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision. In: Dutra, I., Camacho, R., Barbosa, J., Marques, O. (eds.) High Performance Computing for Computational Science – VECPAR 2016. Lecture Notes in Computer Science, vol. 10150. Springer, Cham. https://doi.org/10.1007/978-3-319-61982-8_4

  • DOI: https://doi.org/10.1007/978-3-319-61982-8_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61981-1

  • Online ISBN: 978-3-319-61982-8
