WAVE: designing a heuristics-based three-way breadth-first search on GPUs

The Journal of Supercomputing

Abstract

Breadth-first search (BFS) is a building block of many iterative graph algorithms and a common target for improving their performance. Alongside conventional BFS (push), a method that traverses the graph in the reverse direction (pull) has emerged and gained popularity because of its higher processing performance. Several recent frameworks use a hybrid approach known as direction-optimizing BFS, which uses both directions. However, these frameworks focus mainly on optimizing the procedure within each direction rather than on designing sophisticated methods for choosing between push and pull at each iteration. Because this decision has received little in-depth discussion, state-of-the-art direction-optimizing BFS algorithms often select an ineffective direction at a given iteration and therefore fail to reach their full performance, despite the improvements made to each direction. We found that current frameworks incur high computational overhead for their decisions and that those decisions are overfitted to the few graph datasets used to tune the direction-selection process. Based on these observations, we designed WAVE, a direction-optimizing method for BFS. WAVE reduces the decision overhead to near zero and, using characteristics extracted from the input graph dataset, selects directions more appropriately than state-of-the-art heuristics. In our experiments on 20 graph benchmarks, WAVE achieved speedups of up to 4.95\(\times\), 5.79\(\times\), 46.49\(\times\), and 149.67\(\times\) over Enterprise, Gunrock, Tigr, and CuSha, respectively.
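To make the push/pull distinction concrete, the following is a minimal CPU-side sketch of a direction-optimizing BFS. It is not WAVE's heuristic or implementation: the function name `hybrid_bfs`, the threshold parameter `alpha`, and the rule "pull when the frontier's edge count times `alpha` exceeds the edge count of unvisited vertices" are illustrative assumptions, and the graph is assumed to be undirected so that one adjacency list serves both directions.

```python
from typing import List, Tuple


def hybrid_bfs(adj: List[List[int]], source: int,
               alpha: float = 14.0) -> Tuple[List[int], List[str]]:
    """Level-synchronous BFS that picks push or pull at each iteration.

    adj    : adjacency lists of an undirected graph (assumption of this sketch)
    alpha  : hypothetical tuning threshold, not a value taken from WAVE
    Returns the depth of every vertex (-1 if unreachable) and the
    direction chosen at each iteration.
    """
    n = len(adj)
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    # Number of adjacency entries that belong to still-unvisited vertices.
    unvisited_edges = sum(len(a) for a in adj) - len(adj[source])
    directions: List[str] = []
    level = 0

    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        next_frontier: List[int] = []

        if frontier_edges * alpha > unvisited_edges:
            # Pull: every unvisited vertex looks for a parent in the frontier
            # and stops scanning at the first one it finds.
            directions.append("pull")
            for v in range(n):
                if depth[v] != -1:
                    continue
                for u in adj[v]:
                    if depth[u] == level:
                        depth[v] = level + 1
                        next_frontier.append(v)
                        break
        else:
            # Push: every frontier vertex visits its unvisited neighbours.
            directions.append("push")
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level + 1
                        next_frontier.append(v)

        for v in next_frontier:
            unvisited_edges -= len(adj[v])
        frontier = next_frontier
        level += 1

    return depth, directions


if __name__ == "__main__":
    # Small undirected graph; vertex 5 is isolated.
    graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
    print(hybrid_bfs(graph, 0))
```

Both branches compute the same depths; only the amount of work differs. The decision rule itself costs a pass over the frontier here, which illustrates the kind of per-iteration overhead the abstract says a good heuristic should keep near zero.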

Data availability

The data presented in this study are publicly available at https://github.com/kljp/WAVE/.

Notes

  1. WAVE is an abbreviation of ‘a direction-optimizing BFS Working As Versatile and Efficient.’

  2. https://github.com/kljp/WAVE/

Acknowledgements

This work was jointly supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea (2021R1F1A1062779), the Korea Institute of Science and Technology Information (KISTI) (TS-2022-RE-0019), and the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP), Korea.

Author information

Corresponding author

Correspondence to Sangyoon Oh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yoon, D., Jeong, M. & Oh, S. WAVE: designing a heuristics-based three-way breadth-first search on GPUs. J Supercomput 79, 6889–6917 (2023). https://doi.org/10.1007/s11227-022-04934-1

  • DOI: https://doi.org/10.1007/s11227-022-04934-1