Abstract
One of the most common operations executed on modern high-performance computing systems is the multiplication of a sparse matrix by a dense vector within a shared-memory computational node. A strongly related but far less studied problem is joint direct and transposed sparse matrix-vector multiplication, which is widely needed by certain types of iterative solvers. We propose a multilayer approach for joint sparse multiplication that balances the workload of threads. Measurements show that our algorithm is scalable and achieves high computational performance for multiple benchmark matrices that arise from various scientific and engineering disciplines.
Notes
- 1. This assumption limits the order of the matrix to \(2^{32}-1\); however, we do not consider this limit restrictive for contemporary single-node HPC computations. We plan to remove this limitation in the near future.
Acknowledgement
This research has been supported by CTU internal grant SGS17/215/OHK3/3T/18.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Šimeček, I., Langr, D., Kotenkov, I. (2018). Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds.) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol. 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science, Computer Science (R0)