Abstract
Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many High Performance Computing applications, such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality, and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing for SpGEMM on a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state of the art on Intel® Xeon® processors: it is up to 3.8X faster than the Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementations of SpGEMM on the NVIDIA GTX Titan and the AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads), compared to MKL, which achieves 7.5X scaling on 28 threads.
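The cache optimization highlighted in the abstract, a dense accumulator array in place of a hash table, can be illustrated on top of Gustavson's classic row-wise SpGEMM formulation (Gustavson 1978, cited in the references). The sketch below is only a minimal serial illustration of that idea, not the paper's actual implementation; the function name and CSR layout are assumptions for the example.

```python
# Minimal sketch of row-wise SpGEMM (Gustavson's algorithm) using a dense
# accumulator instead of a hash table. CSR matrices are represented as
# (indptr, indices, data) triples. Illustrative only, not the paper's code.

def spgemm_row(a_indptr, a_indices, a_data,
               b_indptr, b_indices, b_data, n_cols):
    """Compute C = A * B for CSR inputs; return CSR arrays for C."""
    c_indptr = [0]
    c_indices, c_data = [], []
    # Dense accumulator: one slot per column of B, reused across rows.
    accum = [0.0] * n_cols
    occupied = [False] * n_cols
    for i in range(len(a_indptr) - 1):
        touched = []  # columns written in this row, for an O(nnz) reset
        for k in range(a_indptr[i], a_indptr[i + 1]):
            col_a, val_a = a_indices[k], a_data[k]
            # Scatter row col_a of B, scaled by A(i, col_a), into accum.
            for j in range(b_indptr[col_a], b_indptr[col_a + 1]):
                col_b = b_indices[j]
                if not occupied[col_b]:
                    occupied[col_b] = True
                    touched.append(col_b)
                accum[col_b] += val_a * b_data[j]
        touched.sort()  # keep column indices sorted within the row
        for col in touched:
            c_indices.append(col)
            c_data.append(accum[col])
            accum[col] = 0.0      # reset only the slots we touched
            occupied[col] = False
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data
```

Compared to a hash-table accumulator, the dense array gives branch-free O(1) scatter at the cost of O(n_cols) memory per thread, which is why such accumulators interact strongly with the cache and partitioning choices studied in the paper.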
Notes
1. Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
2. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel micro-architecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.
References
Combinatorial BLAS v1.3. http://gauss.cs.ucsb.edu/~aydin/CombBLAS/html/
Thread affinity interface. https://software.intel.com/en-us/node/522691
Intel math kernel library (2015). https://software.intel.com/en-us/intel-mkl
Bell, N., Dalton, S., Olson, L.N.: Exposing fine-grained parallelism in algebraic multigrid methods. SIAM J. Sci. Comput. 34(4), C123–C152 (2012)
Buluc, A., Gilbert, J.: On the representation and multiplication of hypersparse matrices. In: Proceedings of IPDPS, pp. 1–11, April 2008
Buluç, A., Gilbert, J.R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments. CoRR abs/1109.3739 (2011)
Chan, T.M.: More algorithms for all-pairs shortest paths in weighted graphs. SIAM J. Comput. 39(5), 2075–2089 (2010)
Davis, T.A., Hu, Y.: The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011)
Gilbert, J., Moler, C., Schreiber, R.: Sparse matrices in MATLAB: design and implementation. SIAM J. Matrix Anal. Appl. 13(1), 333–356 (1992)
Gilbert, J.R., Reinhardt, S., Shah, V.B.: High-performance graph algorithms from parallel sparse matrices. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 260–269. Springer, Heidelberg (2007)
Gustavson, F.G.: Two fast algorithms for sparse matrices: multiplication and permuted transposition. ACM Trans. Math. Softw. 4(3), 250–269 (1978)
Kaplan, H., Sharir, M., Verbin, E.: Colored intersection searching via sparse rectangular matrix multiplication. In: Symposium on Computational Geometry, pp. 52–60. ACM (2006)
Liu, W., Vinter, B.: An efficient GPU general sparse matrix-matrix multiplication for irregular data. In: Proceedings of IPDPS, pp. 370–381. IEEE (2014)
Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the graph 500. Cray User’s Group (2010)
Siegel, J., et al.: Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems. In: IEEE Cluster Computing, pp. 1–8 (2010)
Sulatycke, P., Ghose, K.: Caching-efficient multithreaded fast multiplication of sparse matrices. In: Proceedings of IPPS/SPDP 1998, pp. 117–123, March 1998
Vassilevska, V., Williams, R., Yuster, R.: Finding heaviest h-subgraphs in real weighted graphs, with applications. CoRR abs/cs/0609009 (2006)
Zhu, Q., Graf, T., Sumbul, H., Pileggi, L., Franchetti, F.: Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware. In: IEEE HPEC, pp. 1–6 (2013)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Patwary, M.M.A. et al. (2015). Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science(), vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_4
DOI: https://doi.org/10.1007/978-3-319-20119-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)