Load-Balanced Breadth-First Search on GPUs

Zhu, Zhe; Li, Jianjun; Li, Guohui

doi:10.1007/978-3-319-08010-9_46


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Breadth-first search (BFS) is widely used in web link and social network analysis as well as other fields. The Graphics Processing Unit (GPU) has been demonstrated to have great potential in accelerating graph algorithms through parallel processing. However, BFS is difficult to parallelize efficiently because its workload distribution is irregular, which leads to load imbalance between threads. Previous work has proposed several strategies to alleviate the load imbalance, but none of them solves this issue in general.
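
To make the imbalance concrete, the following minimal sketch counts how many edges each thread would scan if every thread were simply handed one frontier vertex. The CSR arrays `row_ptr` and `col_idx`, the helper `per_thread_work`, and the 5-vertex hub graph are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Assumed CSR layout: neighbors of vertex v are col_idx[row_ptr[v]:row_ptr[v + 1]].
def per_thread_work(row_ptr, frontier):
    """Edges scanned by each thread when thread i is assigned frontier[i]."""
    return [row_ptr[v + 1] - row_ptr[v] for v in frontier]


# Hypothetical graph: vertex 0 is a hub connected to vertices 1..4.
row_ptr = [0, 4, 5, 6, 7, 8]
col_idx = [1, 2, 3, 4, 0, 0, 0, 0]

work = per_thread_work(row_ptr, [0, 1, 2, 3])
print(work)  # [4, 1, 1, 1]
# Ratio of the busiest thread to the average; 1.0 would be perfectly balanced.
print(max(work) / (sum(work) / len(work)))
```

The thread that owns the hub does four times the work of the others, and on a GPU the remaining threads in its group idle until it finishes.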

This paper presents a new GPU BFS algorithm that focuses on full load balance. Each BFS iteration is decoupled into two phases: work redistribution and neighbor gathering. The work redistribution phase reorganizes the irregular workloads so that the neighbor gathering phase can visit the vertices in a load-balanced way. The evaluation results show that the proposed approach achieves speedups of up to 39x over a sequential CPU implementation and up to 1.42x over a state-of-the-art GPU implementation.
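
The two-phase structure can be sketched on the CPU as follows. The abstract does not say how the redistribution is carried out, so this sketch swaps in a common technique as an assumption: an exclusive prefix sum over the frontier's degrees, followed by a binary search that maps each edge rank back to its owning vertex. The function names, the CSR arrays, and the `num_workers` parameter are hypothetical, and the sequential loop only models what GPU threads would do in parallel; it should not be read as the authors' implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def redistribute(row_ptr, frontier):
    """Phase 1 (assumed form): exclusive prefix sum of frontier degrees.

    offsets[i] is the rank of the first edge owned by frontier[i];
    offsets[-1] is the total number of edges to scan in this iteration.
    """
    degrees = [row_ptr[v + 1] - row_ptr[v] for v in frontier]
    return [0] + list(accumulate(degrees))


def gather(row_ptr, col_idx, frontier, offsets, num_workers):
    """Phase 2: every worker scans an equal-sized chunk of edge ranks."""
    total = offsets[-1]
    chunk = -(-total // num_workers)  # ceiling division
    per_worker = []
    for w in range(num_workers):  # each iteration models one GPU thread
        found = []
        for rank in range(w * chunk, min((w + 1) * chunk, total)):
            i = bisect_right(offsets, rank) - 1  # frontier slot owning this edge
            v = frontier[i]
            found.append(col_idx[row_ptr[v] + rank - offsets[i]])
        per_worker.append(found)
    return per_worker


def bfs(row_ptr, col_idx, source, num_workers=4):
    """Level-synchronous BFS; returns the depth of each vertex (-1 if unreached)."""
    depth = [-1] * (len(row_ptr) - 1)
    depth[source] = 0
    frontier, level = [source], 0
    while frontier:
        level += 1
        offsets = redistribute(row_ptr, frontier)
        next_frontier = []
        for found in gather(row_ptr, col_idx, frontier, offsets, num_workers):
            for u in found:
                if depth[u] == -1:
                    depth[u] = level
                    next_frontier.append(u)
        frontier = next_frontier
    return depth


# Hypothetical CSR graph: vertex 0 is a hub connected to vertices 1..4.
row_ptr = [0, 4, 5, 6, 7, 8]
col_idx = [1, 2, 3, 4, 0, 0, 0, 0]

frontier = [0, 1, 2, 3]
chunks = gather(row_ptr, col_idx, frontier, redistribute(row_ptr, frontier), 4)
print([len(c) for c in chunks])  # [2, 2, 2, 1]
print(bfs(row_ptr, col_idx, 1))  # [1, 0, 2, 2, 2]
```

Because work is split by edge rank rather than by vertex, the hub's four edges are shared between two workers, and no worker scans more than `chunk` edges regardless of how skewed the degrees are.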

Author information

Authors and Affiliations

School of Computer Science & Technology, Huazhong University of Science & Technology, China
Zhe Zhu, Jianjun Li & Guohui Li

Editor information

Editors and Affiliations

School of Computing, University of Utah, 50 S. Central Campus Drive, 84112, Salt Lake City, UT, USA
Feifei Li
Department of Computer Science, Tsinghua University, 100084, Beijing, China
Guoliang Li
POSTECH, Republic of Korea
Seung-won Hwang
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Bin Yao
Advanced Digital Sciences Center (ADSC), 138632, Singapore, Singapore
Zhenjie Zhang

About this paper

Cite this paper

Zhu, Z., Li, J., Li, G. (2014). Load-Balanced Breadth-First Search on GPUs. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_46

DOI: https://doi.org/10.1007/978-3-319-08010-9_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)
