Abstract
Breadth-first search (BFS) is a building block for improving the performance of many iterative graph algorithms. In addition to conventional BFS (push), a method that traverses the graph in the reverse direction (pull) has emerged and gained popularity because of its higher processing performance. Several frameworks have recently adopted a hybrid approach known as direction-optimizing BFS, which uses both directions. However, these frameworks focus mostly on optimizing the procedure within each direction rather than on designing sophisticated methods for choosing between push and pull at each iteration. Because this decision has received little in-depth discussion, state-of-the-art direction-optimizing BFS algorithms select ineffective directions and therefore fall short of their full performance, despite improvements in the design of each direction. We found that current frameworks incur high computational overhead for their decisions and make decisions that are overfitted to the few graph datasets used to tune their direction selection process. Based on these observations, we designed a direction-optimizing method for BFS called WAVE. WAVE reduces the computational overhead to near zero and, using characteristics extracted from the input graph dataset, selects directions more appropriately than the state-of-the-art heuristics. In our experiments on 20 graph benchmarks, WAVE achieved speedups of up to 4.95\(\times\), 5.79\(\times\), 46.49\(\times\), and 149.67\(\times\) over Enterprise, Gunrock, Tigr, and CuSha, respectively.
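To make the push/pull distinction concrete, the following is a minimal sequential sketch of direction-optimizing BFS in Python. It is not WAVE's heuristic and not the code of any framework named above. The function name `direction_optimizing_bfs`, the parameter `alpha`, and the switching rule (pull when the frontier's edge count times `alpha` exceeds the edge count of unvisited vertices) are assumptions chosen for illustration, loosely in the spirit of the original direction-optimizing BFS idea. The graph is assumed undirected, so a vertex's adjacency list serves as both its outgoing and incoming neighbors.

```python
from typing import List, Tuple


def direction_optimizing_bfs(
    adj: List[List[int]], source: int, alpha: float = 4.0
) -> Tuple[List[int], List[str]]:
    """Return (level per vertex, direction chosen at each iteration).

    `alpha` is a hypothetical tuning knob, not a value from the paper.
    Unreachable vertices keep level -1.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    # Edges incident to vertices that have not been visited yet.
    unvisited_edges = sum(len(a) for a in adj) - len(adj[source])
    directions: List[str] = []
    depth = 0

    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        # Illustrative rule: pull once the frontier is "heavy" relative
        # to what is left to explore.
        use_pull = frontier_edges * alpha > unvisited_edges
        directions.append("pull" if use_pull else "push")
        next_frontier: List[int] = []

        if use_pull:
            # Pull: every unvisited vertex looks for a parent in the
            # current frontier and stops at the first one it finds.
            for v in range(n):
                if level[v] != -1:
                    continue
                for u in adj[v]:
                    if level[u] == depth:
                        level[v] = depth + 1
                        next_frontier.append(v)
                        break
        else:
            # Push: every frontier vertex visits its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth + 1
                        next_frontier.append(v)

        unvisited_edges -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
        depth += 1

    return level, directions
```

Both directions compute the same levels; only the amount of work per iteration differs. Note that this sketch recomputes `frontier_edges` by summing degrees at every iteration, which is one example of the kind of per-iteration decision cost the abstract identifies as overhead.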
Data availability
The data presented in this study are publicly available at https://github.com/kljp/WAVE/.
Notes
WAVE is an abbreviation of ‘a direction-optimizing BFS Working As Versatile and Efficient.’
Acknowledgements
This work was jointly supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea (2021R1F1A1062779), the Korea Institute of Science and Technology Information (KISTI) (TS-2022-RE-0019), and the ITRC (Information Technology Research Center) support program (IITP-2021-2018-0-01431) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP), Korea.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Yoon, D., Jeong, M. & Oh, S. WAVE: designing a heuristics-based three-way breadth-first search on GPUs. J Supercomput 79, 6889–6917 (2023). https://doi.org/10.1007/s11227-022-04934-1
DOI: https://doi.org/10.1007/s11227-022-04934-1