Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10777)

Abstract

One of the most common operations executed on modern high-performance computing systems is the multiplication of a sparse matrix by a dense vector within a shared-memory computational node. A strongly related but far less studied problem is joint direct and transposed sparse matrix-vector multiplication, which is widely needed by certain types of iterative solvers. We propose a multilayer approach for joint sparse multiplication that balances the workload of threads. Measurements show that our algorithm is scalable and achieves high computational performance for multiple benchmark matrices that arise from various scientific and engineering disciplines.
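To make the joint operation concrete, the following is a minimal serial sketch that computes both y = Ax and z = Aᵀw in a single pass over the nonzeros. It is not the multilayer algorithm proposed in the paper, which this abstract does not detail. The compressed sparse row (CSR) layout, the function name `joint_spmv_csr`, and all parameter names are assumptions made for illustration only.

```python
def joint_spmv_csr(n_rows, n_cols, row_ptr, col_idx, values, x, w):
    """Compute y = A x and z = A^T w in one pass over a CSR matrix.

    Hypothetical helper for illustration; CSR storage is an assumption,
    not necessarily the format used by the paper.
    """
    y = [0.0] * n_rows
    z = [0.0] * n_cols
    for i in range(n_rows):
        acc = 0.0
        w_i = w[i]
        # Each nonzero a = A[i][j] is loaded once and used for both products.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            acc += a * x[j]   # direct product: row i of A dotted with x
            z[j] += a * w_i   # transposed product: scatter into z
        y[i] = acc
    return y, z


# A = [[1, 0, 2],
#      [0, 3, 0]]  stored in CSR form
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]

y, z = joint_spmv_csr(2, 3, row_ptr, col_idx, values,
                      x=[1.0, 1.0, 1.0], w=[1.0, 2.0])
print(y)  # [3.0, 3.0]
print(z)  # [1.0, 6.0, 2.0]
```

In this sketch the update of `y[i]` is private to row `i`, whereas the updates of `z[j]` scatter across columns. If rows were simply divided among threads, several threads could write to the same `z[j]`, which is one reason a joint multithreaded kernel needs a more careful work distribution than plain sparse matrix-vector multiplication.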

Notes

  1. This assumption limits the order of the matrix to \(2^{32}-1\), but we do not consider this limit restrictive for contemporary single-node HPC computations. Nevertheless, we plan to remove this limitation in the near future.

Acknowledgement

This research has been supported by CTU internal grant SGS17/215/OHK3/3T/18.

Author information

Correspondence to Ivan Šimeček.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Šimeček, I., Langr, D., Kotenkov, I. (2018). Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_5

  • DOI: https://doi.org/10.1007/978-3-319-78024-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78023-8

  • Online ISBN: 978-3-319-78024-5

  • eBook Packages: Computer Science, Computer Science (R0)
