DOI: 10.1145/2145816.2145832
research-article

Scalable GPU graph traversal

Published: 25 February 2012

ABSTRACT

Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data-dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with non-trivial diameter.

We present a BFS parallelization focused on fine-grained task management, constructed from efficient prefix sum, that achieves an asymptotically optimal O(|V|+|E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.
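To make the role of prefix sum concrete, here is a minimal sequential sketch of a level-synchronous BFS in which scans assign each frontier vertex a private write range. It is not the paper's GPU implementation: the CSR layout and the names `exclusive_scan`, `bfs_levels`, `row_offsets`, and `column_indices` are assumptions made for illustration, and the loops that a GPU would run in parallel are written as plain Python loops.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] = sum(values[:i]). Also returns the total."""
    inclusive = list(accumulate(values))
    total = inclusive[-1] if inclusive else 0
    return [0] + inclusive[:-1], total


def bfs_levels(row_offsets, column_indices, source):
    """Return the BFS depth of every vertex (-1 if unreachable) of a CSR graph."""
    num_vertices = len(row_offsets) - 1
    depth = [-1] * num_vertices
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        # 1. Each frontier vertex reports how many neighbors it will expand.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
        # 2. A scan turns the counts into non-overlapping output offsets,
        #    so writers never need to contend for a shared queue tail.
        offsets, total = exclusive_scan(degrees)
        # 3. Gather all neighbors into the edge frontier at those offsets.
        edge_frontier = [0] * total
        for i, v in enumerate(frontier):
            start = row_offsets[v]
            for j in range(degrees[i]):
                edge_frontier[offsets[i] + j] = column_indices[start + j]
        # 4. Label unvisited neighbors and flag them for the next frontier.
        level += 1
        flags = []
        for u in edge_frontier:
            if depth[u] == -1:
                depth[u] = level
                flags.append(1)
            else:
                flags.append(0)
        # 5. A second scan compacts the flagged vertices into a dense array.
        slots, count = exclusive_scan(flags)
        next_frontier = [0] * count
        for k, u in enumerate(edge_frontier):
            if flags[k]:
                next_frontier[slots[k]] = u
        frontier = next_frontier
    return depth


# Hypothetical example graph: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
row_offsets = [0, 2, 3, 4, 5, 5, 5]
column_indices = [1, 2, 3, 3, 4]
depths = bfs_levels(row_offsets, column_indices, 0)
```

Because a vertex is labeled the first time it is seen, it enters a frontier at most once and each of its outgoing edges is inspected once, which is where the O(|V|+|E|) work bound comes from in this sketch.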


          Published in

          • PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
            February 2012
            352 pages
            ISBN: 9781450311601
            DOI: 10.1145/2145816
          • ACM SIGPLAN Notices, Volume 47, Issue 8 (PPOPP '12)
            August 2012
            334 pages
            ISSN: 0362-1340
            EISSN: 1558-1160
            DOI: 10.1145/2370036

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 230 of 1,014 submissions, 23%
