FSGraph: fast and scalable implementation of graph traversal on GPUs

Zhang, Yuan; Cao, Huawei; Liang, Yan; Zhang, Jie; Huang, Junying; Ye, Xiaochun; An, Xuejun

doi:10.1007/s42514-023-00155-x

Regular Paper
Published: 31 May 2023

Volume 5, pages 277–291, (2023)

CCF Transactions on High Performance Computing

Yuan Zhang^1,2,
Huawei Cao^1,3 (ORCID: orcid.org/0000-0003-1176-2521),
Yan Liang^1,2,
Jie Zhang^1,2,
Junying Huang^1,
Xiaochun Ye^1 &
Xuejun An^1,2

Abstract

Graphs are one of the best ways to express and process association relationships. They are widely used in applications such as social networks, fraud detection, and the Internet of Things. As a typical graph traversal algorithm, Breadth-First Search (BFS) performs poorly on a GPU because of strong data dependencies, intensive irregular memory accesses, and low computational intensity. On multiple GPUs, the situation is even worse, owing to unbalanced data partitioning and high communication-to-computation ratios. In this paper, we present FSGraph, a fast and scalable BFS implementation on GPUs. FSGraph uses three optimization techniques: a GPU-friendly Compressed Sparse Row (CSR) structure, bidirectional one-dimensional (1D) partitioning, and UM-aware communication. We evaluate FSGraph with extensive experiments on four T4 and four V100 GPUs. On four T4 GPUs, BFS averages 132.67 Giga-Traversed Edges per Second (GTEPS), up to a 1.44× improvement over a single T4. On four V100 GPUs, BFS reaches 392.35 GTEPS, outperforming an existing 1024-node CPU-based cluster on the November 2022 Graph500 list.
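For readers unfamiliar with the terms in the abstract, the following is a minimal CPU-side sketch of a standard CSR layout, a level-synchronous top-down BFS over it, and the GTEPS metric. It is not FSGraph's GPU-friendly CSR variant, its partitioning scheme, or its communication scheme, none of which are described on this page. The function names (build_csr, bfs_levels, gteps) and the sample graph are illustrative assumptions.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Build a plain CSR (row_offsets, col_indices) for an undirected graph."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # row_offsets[v] .. row_offsets[v + 1] delimits the neighbors of v.
    row_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_offsets[v + 1] = row_offsets[v] + degree[v]
    col_indices = [0] * row_offsets[-1]
    cursor = row_offsets[:-1].copy()  # next free slot for each vertex
    for u, v in edges:
        col_indices[cursor[u]] = v
        cursor[u] += 1
        col_indices[cursor[v]] = u
        cursor[v] += 1
    return row_offsets, col_indices


def bfs_levels(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous top-down BFS; unreachable vertices keep level -1."""
    level = [-1] * (len(row_offsets) - 1)
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Irregular, data-dependent reads: the access pattern the abstract
            # identifies as a bottleneck on GPUs.
            for i in range(row_offsets[u], row_offsets[u + 1]):
                v = col_indices[i]
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


def gteps(traversed_edges: float, seconds: float) -> float:
    """Giga-Traversed Edges per Second."""
    return traversed_edges / seconds / 1e9


# Hypothetical 6-vertex graph; vertex 5 is isolated.
sample_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
offsets, neighbors = build_csr(6, sample_edges)
levels = bfs_levels(offsets, neighbors, source=0)
```

Each pass of the while loop handles one BFS level, so a parallel implementation has to finish a level before starting the next; that is the data dependency the abstract refers to.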

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the National Key Research and Development Program (Grant No. 2022YFB4501404), the Beijing Natural Science Foundation (4232036), and the CAS Project for Youth Innovation Promotion Association.

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Yuan Zhang, Huawei Cao, Yan Liang, Jie Zhang, Junying Huang, Xiaochun Ye & Xuejun An
University of Chinese Academy of Sciences, Beijing, 100049, China
Yuan Zhang, Yan Liang, Jie Zhang & Xuejun An
University of Chinese Academy of Sciences, Nanjing, 211135, China
Huawei Cao

Corresponding author

Correspondence to Huawei Cao.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Cao, H., Liang, Y. et al. FSGraph: fast and scalable implementation of graph traversal on GPUs. CCF Trans. HPC 5, 277–291 (2023). https://doi.org/10.1007/s42514-023-00155-x

Received: 11 March 2023
Accepted: 15 May 2023
Published: 31 May 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s42514-023-00155-x
