Abstract
Analysis of big data on large-scale distributed systems often requires efficient parallel graph algorithms to explore the relationships between individual data items. Graph algorithms commonly operate on the adjacency list representation of a graph, which can equivalently be viewed as a sparse matrix. This correspondence between graphs and sparse matrices makes it possible to express many important graph algorithms in terms of basic sparse matrix operations, for which the optimization literature is more mature. For example, graph analytics libraries such as Pegasus and Combinatorial BLAS rely on sparse matrix kernels for a wide variety of graph operations. In this work, we focus on two such important kernels: sparse matrix–sparse matrix multiplication (SpGEMM) and sparse matrix–dense matrix multiplication (SpMM). We propose partitioning models for the efficient parallelization of these kernels on large-scale distributed systems. Our models aim at reducing communication volume while balancing computational load, two performance metrics that are vital on distributed systems. We show that by exploiting the sparsity patterns of the matrices through our models, the parallel performance of SpGEMM and SpMM can be significantly improved.
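The graph/sparse-matrix correspondence mentioned above can be made concrete with a small sketch. The snippet below is illustrative only and is not code from the chapter: it uses scipy.sparse (a library the chapter does not discuss) and runs sequentially, whereas the chapter's contribution concerns partitioning these kernels across a distributed system. It simply shows the same toy graph as an adjacency list and as a CSR sparse matrix, and then uses SpGEMM (A·A) to count two-hop paths and SpMM (Aᵀ·X) to advance several BFS-style frontiers at once.

```python
# Illustrative sketch (assumption: scipy.sparse as a stand-in for the
# distributed SpGEMM/SpMM kernels discussed in the chapter).
import numpy as np
from scipy.sparse import csr_matrix

# A small directed graph given as an adjacency list.
adjacency = {0: [1, 2], 1: [2], 2: [3], 3: [0]}

# The same graph as a CSR sparse adjacency matrix A, with A[u, v] = 1
# if and only if there is an edge u -> v.
rows = [u for u, nbrs in adjacency.items() for _ in nbrs]
cols = [v for nbrs in adjacency.values() for v in nbrs]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

# SpGEMM: A @ A counts directed paths of length two between vertex pairs.
two_hop = A @ A
print(two_hop.toarray())

# SpMM: multiplying A^T by a dense block of frontier vectors advances
# several breadth-first searches simultaneously (sources 0 and 3 here).
X = np.zeros((4, 2))
X[0, 0] = 1.0
X[3, 1] = 1.0
next_frontier = A.T @ X
print(next_frontier)
```

In a distributed setting, the nonzeros of A and the columns of X would be partitioned among processes; the partitioning models proposed in the chapter target exactly this step, reducing the communication volume incurred by such multiplications while keeping the per-process work balanced.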
References
Intel Math Kernel Library (2015). https://software.intel.com/en-us/intel-mkl
Agarwal, V., Petrini, F., Pasetto, D., Bader, D.A.: Scalable graph exploration on multicore processors. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’10, pp. 1–11. IEEE Computer Society, Washington, DC, USA (2010). doi:10.1109/SC.2010.46
Akbudak, K., Aykanat, C.: Simultaneous input and output matrix partitioning for outer-product–parallel sparse matrix-matrix multiplication. SIAM J. Sci. Comput. 36(5), C568–C590 (2014). doi:10.1137/13092589X
Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan S., Ramamritham K., Kumar A., Ravindra M.P., Bertino E., Kumar R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM Press (2011)
Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, Manhattan (2004)
Boman, E., Devine, K., Heaphy, R., Hendrickson, B., Heroux, M., Preis, R.: LDRD report: Parallel repartitioning for optimal solver performance. Tech. Rep. SAND2004–0365, Sandia National Laboratories, Albuquerque, NM (2004)
Buluç, A., Gilbert, J.R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments. SIAM J. Sci. Comput. (SISC) 34(4), 170–191 (2012). doi:10.1137/110848244; http://gauss.cs.ucsb.edu/~aydin/spgemm_sisc12.pdf
Buluç, A., Madduri, K.: Parallel breadth-first search on distributed memory systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pp. 65:1–65:12. ACM, New York, NY, USA (2011). doi:10.1145/2063384.2063471; http://doi.acm.org/10.1145/2063384.2063471
Catalyurek, U.V., Aykanat, C.: Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans. Parallel Distrib. Syst. 10(7), 673–693 (1999)
CP2K: CP2K home page (accessed 2015). http://www.cp2k.org/
D’Alberto, P., Nicolau, A.: R-kleene: A high-performance divide-and-conquer algorithm for the all-pair shortest path for densely connected networks. Algorithmica 47(2), 203–213 (2007). doi:10.1007/s00453-006-1224-z
Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. (TOMS) 38(1), 1 (2011)
Dostál, Z., Horák, D., Kučera, R.: Total FETI-an easier implementable variant of the FETI method for numerical solution of elliptic PDE. Commun. Numer. Meth. Eng. 22(12), 1155–1162 (2006)
Feng, Y., Owen, D., Peri, D.: A block conjugate gradient method applied to linear systems with multiple right-hand sides. Comput. Meth. Appl. Mech. Eng. 127(14), 203–215 (1995). http://dx.doi.org/10.1016/0045-7825(95)00832-2; http://www.sciencedirect.com/science/article/pii/0045782595008322
Heroux, M.A., Bartlett, R.A., Howle, V.E., Hoekstra, R.J., Hu, J.J., Kolda, T.G., Lehoucq, R.B., Long, K.R., Pawlowski, R.P., Phipps, E.T., et al.: An overview of the Trilinos project. ACM Trans. Math. Softw. (TOMS) 31(3), 397–423 (2005)
Horowitz, E., Sahni, S.: Fundamentals of Computer Algorithms. Computer Science Press (1978)
Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: A peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, ICDM ’09, pp. 229–238. IEEE Computer Society, Washington, DC, USA (2009). doi:10.1109/ICDM.2009.14
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)
Marion-Poty, V., Lefer, W.: A wavelet decomposition scheme and compression method for streamline-based vector field visualizations. Comput. Graphics 26(6), 899–906 (2002). doi:10.1016/S0097-8493(02)00178-4; http://www.sciencedirect.com/science/article/pii/S0097849302001784
Mattson, T., Bader, D., Berry, J., Buluc, A., Dongarra, J., Faloutsos, C., Feo, J., Gilbert, J., Gonzalez, J., Hendrickson, B., Kepner, J., Leiserson, C., Lumsdaine, A., Padua, D., Poole, S., Reinhardt, S., Stonebraker, M., Wallach, S., Yoo, A.: Standards for Graph Algorithm Primitives. ArXiv e-prints (2014)
NVIDIA Corporation: cuSPARSE library (2010)
O’Leary, D.P.: The block conjugate gradient algorithm and related methods. Linear Algebra Appl. 29(0), 293–322 (1980). http://dx.doi.org/10.1016/0024-3795(80)90247-5; http://www.sciencedirect.com/science/article/pii/0024379580902475. Special Volume Dedicated to Alson S. Householder
O’Leary, D.P.: Parallel implementation of the block conjugate gradient algorithm. Parallel Comput. 5(12), 127–139 (1987). http://dx.doi.org/10.1016/0167-8191(87)90013-5; http://www.sciencedirect.com/science/article/pii/0167819187900135. Proceedings of the International Conference on Vector and Parallel Computing-Issues in Applied Research and Development
Sarıyuce, A.E., Saule, E., Kaya, K., Çatalyurek, U.V.: Regularizing graph centrality computations. J. Parallel Distrib. Comput. 76(0), 106–119 (2015). http://dx.doi.org/10.1016/j.jpdc.2014.07.006; http://www.sciencedirect.com/science/article/pii/S0743731514001282. Special Issue on Architecture and Algorithms for Irregular Applications
Sawyer, W., Messmer, P.: Parallel grid manipulations for general circulation models. In: Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol. 2328, pp. 605–608. Springer, Berlin (2006)
Selvitopi, O., Aykanat, C.: Reducing latency cost in 2D sparse matrix partitioning models. Parallel Comput. 57, 1–24 (2016). http://dx.doi.org/10.1016/j.parco.2016.04.004; http://www.sciencedirect.com/science/article/pii/S0167819116300138
Selvitopi, R.O., Ozdal, M.M., Aykanat, C.: A novel method for scaling iterative solvers: avoiding latency overhead of parallel sparse-matrix vector multiplies. IEEE Trans. Parallel Distrib. Syst. 26(3), 632–645 (2015). doi:10.1109/TPDS.2014.2311804
Shi, Z., Zhang, B.: Fast network centrality analysis using GPUs. BMC Bioinf. 12(1), 149 (2011). doi:10.1186/1471-2105-12-149
Uçar, B., Aykanat, C.: Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies. SIAM J. Sci. Comput. 25(6), 1837–1859 (2004). doi:10.1137/S1064827502410463
Van De Geijn, R.A., Watts, J.: SUMMA: scalable universal matrix multiplication algorithm. Concurrency-Pract. Experience 9(4), 255–274 (1997)
Acknowledgments
This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant EEEAG-115E212. This article is also based upon work from COST Action IC1406 (cHiPSet).
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Selvitopi, O., Akbudak, K., Aykanat, C. (2016). Parallelization of Sparse Matrix Kernels for Big Data Applications. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_17
DOI: https://doi.org/10.1007/978-3-319-44881-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science (R0)