Abstract
One of the most common operations executed on modern high-performance computing systems is the multiplication of a sparse matrix by a dense vector within a shared-memory computational node. A strongly related but far less studied problem is joint direct and transposed sparse matrix-vector multiplication, which is widely needed by certain types of iterative solvers. We propose a multilayer approach for joint sparse multiplication that balances the workload of threads. Measurements show that our algorithm is scalable and achieves high computational performance for multiple benchmark matrices that arise from various scientific and engineering disciplines.
Notes
- 1. This assumption limits the order of the matrix to \(2^{32}-1\); however, we do not consider this limit restrictive for contemporary single-node HPC computations. We plan to remove this limitation in the near future.
Acknowledgement
This research has been supported by CTU internal grant SGS17/215/OHK3/3T/18.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Šimeček, I., Langr, D., Kotenkov, I. (2018). Multilayer Approach for Joint Direct and Transposed Sparse Matrix Vector Multiplication for Multithreaded CPUs. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds.) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol. 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science, Computer Science (R0)