Scaling sparse matrix-matrix multiplication in the Accumulo database

Distributed and Parallel Databases

Abstract

We propose and implement a sparse matrix-matrix multiplication (SpGEMM) algorithm that runs on top of Accumulo's iterator framework, which enables high-performance distributed parallelism. By using row-by-row parallel SpGEMM, the proposed algorithm provides write locality when ingesting the output matrix back into the database. The proposed solution also reduces the need to scan the input matrices multiple times by using Accumulo's batch-scanning capability, which accesses multiple ranges of key-value pairs in parallel. Although batch scanning introduces some latency overhead, this overhead is mitigated both by the design of the proposed solution and by node-level parallelism structures. We also propose a matrix partitioning scheme that reduces the total communication volume and balances the workload among servers. The results of extensive experiments on both real-world and synthetic sparse matrices show that the proposed algorithm scales significantly better than the outer-product parallel SpGEMM algorithm available in the Graphulo library. Applying the proposed matrix partitioning further improves the performance of the proposed algorithm considerably.
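To illustrate why row-by-row parallelism gives write locality while the outer-product formulation does not, here is a minimal in-memory Python sketch. It assumes a dict-of-dicts sparse format (row index mapped to a {column: value} dict) as a hypothetical stand-in for Accumulo's row-keyed tables; none of the function names come from Accumulo, Graphulo, or the paper. The communication-volume count is likewise a simplified model assumed for illustration (remote rows of B fetched under a given row-to-server assignment), not the partitioning objective the authors use.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a row-keyed sparse table:
# {row index: {column index: value}}. This is NOT Accumulo's API.
SparseMatrix = dict[int, dict[int, float]]


def row_by_row_spgemm(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Row-by-row (Gustavson-style) SpGEMM: C[i,:] = sum_k A[i,k] * B[k,:]."""
    c: SparseMatrix = {}
    for i, a_row in a.items():
        acc: dict[int, float] = defaultdict(float)  # accumulator for row i only
        for k, a_ik in a_row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            # Row i of C is complete here, so it can be written in one place.
            c[i] = dict(acc)
    return c


def outer_product_spgemm(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Outer-product SpGEMM: C = sum_k outer(A[:,k], B[k,:])."""
    # Regroup A by column so that column A[:,k] can be paired with row B[k,:].
    a_cols: dict[int, dict[int, float]] = defaultdict(dict)
    for i, a_row in a.items():
        for k, a_ik in a_row.items():
            a_cols[k][i] = a_ik
    c = defaultdict(lambda: defaultdict(float))
    for k, col in a_cols.items():
        for i, a_ik in col.items():
            for j, b_kj in b.get(k, {}).items():
                # Partial products for the same (i, j) arrive from different k,
                # so they must be summed wherever C[i, j] is stored.
                c[i][j] += a_ik * b_kj
    return {i: dict(row) for i, row in c.items()}


def communication_volume(a: SparseMatrix, b: SparseMatrix,
                         a_owner: dict[int, int],
                         b_owner: dict[int, int]) -> int:
    """Count (server, B-row) pairs where a server needs a row of B it does not own.

    Simplified, assumed model: server p owns rows a_owner[i] == p of A and must
    read every nonempty row B[k,:] with A[i,k] != 0 for one of its rows i.
    """
    needed: dict[int, set[int]] = defaultdict(set)
    for i, a_row in a.items():
        needed[a_owner[i]].update(a_row.keys())
    return sum(1 for p, rows in needed.items() for k in rows
               if k in b and b_owner[k] != p)


# Small example inputs (illustrative data only).
A: SparseMatrix = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B: SparseMatrix = {0: {0: 4.0}, 1: {0: 5.0, 1: 6.0}}

if __name__ == "__main__":
    print(row_by_row_spgemm(A, B))
    print(communication_volume(A, B, {0: 0, 1: 1}, {0: 0, 1: 1}))
```

Both functions compute the same product. In the row-by-row version each output row is finished inside one iteration of the outer loop, so whichever server holds row i of A can write row i of C without coordinating with other servers. In the outer-product version, contributions to a single entry come from different values of k, which is why partial results must be combined across servers. In this toy model, placing the rows of B that a server's rows of A reference on that same server drives the remote-fetch count toward zero.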


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. https://github.com/Accla/graphulo.

  2. https://github.com/pyamg/pyamg.


Acknowledgements

The numerical calculations reported in this paper were fully performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).

Author information

Corresponding author

Correspondence to Cevdet Aykanat.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is partially supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under project EEEAG-115E512.

About this article

Cite this article

Demirci, G.V., Aykanat, C. Scaling sparse matrix-matrix multiplication in the accumulo database. Distrib Parallel Databases 38, 31–62 (2020). https://doi.org/10.1007/s10619-019-07257-y

  • DOI: https://doi.org/10.1007/s10619-019-07257-y
