Scaling sparse matrix-matrix multiplication in the accumulo database

Demirci, Gunduz Vehbi; Aykanat, Cevdet

doi:10.1007/s10619-019-07257-y

Published: 28 January 2019

Volume 38, pages 31–62 (2020)
Distributed and Parallel Databases

Abstract

We propose and implement a sparse matrix-matrix multiplication (SpGEMM) algorithm that runs on top of Accumulo’s iterator framework, which enables high-performance distributed parallelism. By using row-by-row parallel SpGEMM, the proposed algorithm provides write-locality while ingesting the output matrix back into the database. The proposed solution also reduces the need to scan the input matrices multiple times by making use of Accumulo’s batch-scanning capability, which accesses multiple ranges of key-value pairs in parallel. Although batch scanning introduces some latency overheads, these overheads are alleviated by the proposed solution and by the use of node-level parallelism structures. We also propose a matrix partitioning scheme that reduces the total communication volume and balances the workload among servers. The results of extensive experiments on both real-world and synthetic sparse matrices show that the proposed algorithm scales significantly better than the outer-product parallel SpGEMM algorithm available in the Graphulo library. Applying the proposed matrix partitioning further improves the performance of the proposed algorithm considerably.
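
To make the row-by-row formulation concrete, the following is a minimal single-process sketch in Python. It is not the authors' Accumulo implementation and does not use any Accumulo or Graphulo API. The function name `spgemm_row_by_row` and the dict-of-dicts matrix layout are assumptions chosen only for illustration. The point it shows is that each output row C(i, :) depends only on row A(i, :) and on the rows of B selected by the nonzero columns of A(i, :), which is what allows an output row to be produced and written in one place.

```python
from collections import defaultdict


def spgemm_row_by_row(A, B):
    """Compute C = A * B one output row at a time.

    A and B map a row index to {column index: value}. This layout is a
    hypothetical stand-in for illustration, not Accumulo's key-value schema.
    """
    C = {}
    for i, a_row in A.items():
        acc = defaultdict(float)  # sparse accumulator for output row C(i, :)
        for k, a_ik in a_row.items():
            # Only the rows of B indexed by nonzero columns of A(i, :) are read.
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)  # the whole row C(i, :) is complete at this point
    return C


# Small example: A is 2x3, B is 3x3.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0, 2: 7.0}}
C = spgemm_row_by_row(A, B)
```

In a distributed setting, the inner lookups `B.get(k, {})` correspond to fetching a set of B rows by key. That set of lookups is the kind of multi-range access the abstract attributes to batch scanning.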

Acknowledgements

The numerical calculations reported in this paper were fully performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).

Author information

Authors and Affiliations

Department of Computer Engineering, Bilkent University, Ankara, Turkey
Gunduz Vehbi Demirci & Cevdet Aykanat

Corresponding author

Correspondence to Cevdet Aykanat.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under project EEEAG-115E512.

About this article

Cite this article

Demirci, G.V., Aykanat, C. Scaling sparse matrix-matrix multiplication in the accumulo database. Distrib Parallel Databases 38, 31–62 (2020). https://doi.org/10.1007/s10619-019-07257-y

Issue Date: March 2020
