Skip to main content

Load-Balanced Breadth-First Search on GPUs

  • Conference paper
Book cover Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8485))

Included in the following conference series:

Abstract

Breadth-first search (BFS) is widely used in web link and social network analysis as well as other fields. The Graphics Processing Unit (GPU) has been demonstrated to have great potential in accelerating graph algorithms through parallel processing. However, BFS is difficult to parallelize efficiently due to the irregular workload distribution, leading to load imbalance between threads. Previous work has proposed several strategies to alleviate the load imbalance but none of them solves this issue in general.

This paper presents a new GPU BFS algorithm that focuses on full load balance. Each BFS iteration is decoupled into two phases: work redistribution and neighbor gathering. Work redistribution phase reorganizes the irregular workloads in order for the neighbor gathering phase to visit the vertices in a load-balanced way. The evaluation results show that the proposed approach achieves speedups of up to 39x and 1.42x over CPU sequential implementation and state-of-the-art GPU implementation respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 10th dimacs implementation challenge, http://www.cc.gatech.edu/dimacs10/index.shtml

  2. The graph 500 list, http://www.graph500.org/

  3. Nvidia cuda, http://www.nvidia.com/cuda/

  4. University of florida sparse matrix collection, http://www.cise.ufl.edu/research/sparse/matrices/

  5. Bader, D.A., Madduri, K.: Gtgraph: A synthetic graph generator suite, Atlanta, GA (February 2006)

    Google Scholar 

  6. Deo, N., Sarkar, D.: Parallel algorithms for merging and sorting. Information Sciences 56(1), 151–161 (1991)

    Article  MATH  MathSciNet  Google Scholar 

  7. Harish, P., Narayanan, P.J.: Accelerating large graph algorithms on the GPU using CUDA. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2007. LNCS, vol. 4873, pp. 197–208. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  8. Hong, S., Kim, S.K., Oguntebi, T., Olukotun, K.: Accelerating cuda graph algorithms at maximum warp. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, pp. 267–276. ACM (2011)

    Google Scholar 

  9. Leiserson, C.E., Rivest, R.L., Stein, C., Cormen, T.H.: Introduction to algorithms. The MIT Press (2009)

    Google Scholar 

  10. Luo, L., Wong, M., Hwu, W.M.: An effective gpu implementation of breadth-first search. In: Proceedings of the 47th Design Automation Conference, pp. 52–55. ACM (2010)

    Google Scholar 

  11. Merrill, D., Garland, M., Grimshaw, A.: Scalable gpu graph traversal. In: ACM SIGPLAN Notices, vol. 17, pp. 117–128. ACM (2012)

    Google Scholar 

  12. Nasre, R., Burtscher, M., Pingali, K.: Data-driven versus topology-driven irregular computations on gpus. In: 2013 IEEE 27th International Symposium onParallel & Distributed Processing (IPDPS), pp. 463–474. IEEE (2013)

    Google Scholar 

  13. Nguyen, H.: Gpu gems 3. Addison-Wesley Professional (2007)

    Google Scholar 

  14. Odeh, S., Green, O., Mwassi, Z., Shmueli, O., Birk, Y.: Merge path-parallel merging made simple. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 1611–1618. IEEE (2012)

    Google Scholar 

  15. Shiloach, Y., Vishkin, U.: Finding the maximum, merging, and sorting in a parallel computation model. Journal of Algorithms 2(1), 88–102 (1981)

    Article  MATH  MathSciNet  Google Scholar 

  16. Zhong, J., He, B.: Medusa: Simplified graph processing on gpus. IEEE Transactions on Parallel and Distributed Systems 99, 1 (2013) (PrePrints)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhu, Z., Li, J., Li, G. (2014). Load-Balanced Breadth-First Search on GPUs. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_46

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-08010-9_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08009-3

  • Online ISBN: 978-3-319-08010-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics