Dynamic Sparse-Matrix Allocation on GPUs

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9697)

Abstract

Sparse matrices are a core component in many numerical simulations, and their efficiency is essential to achieving high performance. Dynamic sparse-matrix allocation (insertion) can benefit a number of problems such as sparse-matrix factorization, sparse-matrix-matrix addition, static analysis (e.g., points-to analysis), computing transitive closure, and other graph algorithms. Existing sparse-matrix formats are poorly designed to handle dynamic updates. The compressed sparse-row (CSR) format is fully compact and must be rebuilt after each new entry. Ellpack (ELL) stores a constant number of entries per row, which allows for efficient insertion and sparse matrix-vector multiplication (SpMV) but is memory inefficient and strictly limits row size. The coordinate (COO) format stores a list of entries and is efficient for both memory use and insertion time; however, it is much less efficient at SpMV. Hybrid ellpack (HYB) compromises by using a combination of ELL and COO but degrades in performance as the COO portion fills up. Rows that use the COO portion require it to be completely traversed during every SpMV operation.
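
For reference, the three baseline layouts discussed above can be summarized with the following C++ sketch; the struct and field names are illustrative only, not taken from any particular library, but the arrays are the ones each format is defined by.

    #include <vector>

    // CSR: fully compact. Inserting a new entry means shifting every later
    // column index and value and updating row_offsets, so the structure
    // effectively has to be rebuilt after each insertion.
    struct CSR {
        std::vector<int>    row_offsets;  // length num_rows + 1
        std::vector<int>    col_indices;  // length num_nonzeros
        std::vector<double> values;       // length num_nonzeros
    };

    // ELL: a fixed number of slots per row, padded where unused. Insertion
    // is cheap until a row overflows, but padding wastes memory and caps
    // the maximum row length.
    struct ELL {
        int                 slots_per_row;
        std::vector<int>    col_indices;  // length num_rows * slots_per_row
        std::vector<double> values;       // padded entries marked invalid
    };

    // COO: an unordered list of (row, column, value) triples. Insertion is
    // a cheap append, but SpMV must traverse the entire list.
    struct COO {
        std::vector<int>    rows;
        std::vector<int>    cols;
        std::vector<double> values;
    };

HYB keeps the first few entries of each row in an ELL part and spills the remainder into a COO part, which is why its SpMV cost grows as the COO portion fills.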

In this paper we introduce a new sparse matrix format, dynamic compressed sparse row (DCSR), that permits efficient dynamic updates. These updates are significantly faster than those made to a HYB matrix while maintaining SpMV times comparable to CSR. We demonstrate the efficacy of our dynamic allocation scheme, evaluating updates and SpMV operations on adjacency matrices of sparse-graph benchmarks on the GPU.
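
The layout of DCSR itself is not spelled out in this abstract. Purely as a loose illustration of what per-row dynamic allocation can look like (the names and structure below are hypothetical and may differ from the authors' actual design), one common approach is to give each row one or more segments with spare capacity so that most insertions are simple appends:

    #include <vector>

    // Hypothetical segmented-row layout (illustration only, not necessarily
    // the DCSR format described in the paper).
    struct SegmentedRows {
        struct Segment {
            int start;         // offset of this segment in the backing arrays
            int used;          // entries currently occupied
            int capacity;      // total slots in this segment
            int next_segment;  // index of the row's next segment, or -1
        };
        std::vector<Segment> segments;     // one or more per row
        std::vector<int>     row_head;     // first segment index for each row
        std::vector<int>     col_indices;  // backing storage for column indices
        std::vector<double>  values;       // backing storage for values
        // Insertion appends into the row's last segment while space remains;
        // only when a segment overflows is a new one allocated and linked in,
        // so the matrix never has to be rebuilt wholesale.
    };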


Author information

Corresponding author

Correspondence to James King.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

King, J., Gilray, T., Kirby, R.M., Might, M. (2016). Dynamic Sparse-Matrix Allocation on GPUs. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science (LNTCS), vol 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_4

  • DOI: https://doi.org/10.1007/978-3-319-41321-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41320-4

  • Online ISBN: 978-3-319-41321-1
