Programming GPGPU Graph Applications with Linear Algebra Building Blocks

International Journal of Parallel Programming

Abstract

Graph applications are common in scientific and enterprise computing. Recent research has used graphics processing units (GPUs) to accelerate graph workloads, but these applications tend to present characteristics that are challenging for SIMD execution. To achieve high performance, prior work studied individual graph problems and designed device-specific algorithms and optimizations. However, programmers must expend significant manual effort packing data and computation to make such solutions GPU-friendly. This is usually too complex for typical programmers, and the resulting implementations may be neither portable nor well-performing across platforms. To address these concerns, we propose and implement BelRed, a library of software building blocks with application examples, which allows programmers to build graph applications with ease. BelRed is currently built on top of the OpenCL™ framework and optimized for GPUs. It consists of the fundamental linear-algebra building blocks necessary for graph processing, so developers can program graph algorithms with a set of key primitives. This paper introduces the API and presents several case studies on how to use the library for a variety of representative graph problems. We evaluate application performance on an AMD GPU and investigate optimization techniques to improve it. We show that this framework provides satisfactory GPU acceleration for various graph applications and helps reduce programming effort significantly.
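As an illustration of what "programming a graph algorithm with linear-algebra primitives" can look like, the following is a minimal CPU-side sketch of breadth-first search written as repeated sparse matrix-vector products over a Boolean semiring. It is not BelRed's API and makes no claim about how the library is implemented; the names `build_csr_transpose`, `spmv_bool`, and `bfs_levels` are hypothetical and chosen only for this example, and the CSR layout is an assumption.

```python
from typing import List, Tuple


def build_csr_transpose(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Store the transposed adjacency matrix in CSR form.

    Row v lists every u such that the directed edge u -> v exists.
    (Hypothetical helper; the storage format is an assumption.)
    """
    counts = [0] * n
    for _, v in edges:
        counts[v] += 1
    row_ptr = [0] * (n + 1)
    for v in range(n):
        row_ptr[v + 1] = row_ptr[v] + counts[v]
    col_idx = [0] * len(edges)
    cursor = row_ptr[:n]  # next free slot in each row (slicing copies)
    for u, v in edges:
        col_idx[cursor[v]] = u
        cursor[v] += 1
    return row_ptr, col_idx


def spmv_bool(row_ptr: List[int], col_idx: List[int], x: List[bool]) -> List[bool]:
    """Sparse matrix-vector product over the (OR, AND) semiring.

    y[row] is True if any stored column of that row has x[col] == True.
    """
    n = len(row_ptr) - 1
    y = [False] * n
    for row in range(n):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            if x[col_idx[k]]:
                y[row] = True
                break
    return y


def bfs_levels(n: int, edges: List[Tuple[int, int]], source: int) -> List[int]:
    """Return the BFS depth of every vertex, or -1 if it is unreachable."""
    row_ptr, col_idx = build_csr_transpose(n, edges)
    levels = [-1] * n
    levels[source] = 0
    frontier = [v == source for v in range(n)]
    depth = 0
    while any(frontier):
        depth += 1
        reached = spmv_bool(row_ptr, col_idx, frontier)
        # Element-wise mask: keep only vertices not visited before.
        frontier = [reached[v] and levels[v] == -1 for v in range(n)]
        for v in range(n):
            if frontier[v]:
                levels[v] = depth
    return levels


if __name__ == "__main__":
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs_levels(6, example_edges, 0))
```

Each iteration consists of one matrix-vector product followed by an element-wise mask, which is the kind of regular, data-parallel step that a GPU library can provide as a reusable kernel while the application code stays short.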

Acknowledgments

We thank the anonymous reviewers for their helpful feedback. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Author information

Corresponding author

Correspondence to Shuai Che.

Additional information

BelRed is a well-known road running through Bellevue and Redmond, WA, USA. This manuscript is an extension of the 6-page paper, “BelRed: Constructing GPGPU Graph Applications with Software Building Blocks”, presented at the 2014 IEEE High Performance Extreme Computing Conference.

About this article

Cite this article

Che, S., Beckmann, B.M. & Reinhardt, S.K. Programming GPGPU Graph Applications with Linear Algebra Building Blocks. Int J Parallel Prog 45, 657–679 (2017). https://doi.org/10.1007/s10766-016-0448-z

  • DOI: https://doi.org/10.1007/s10766-016-0448-z
