Scalable and efficient graph traversal on high-throughput cluster

Fan, Dongrui; Cao, Huawei; Wang, Guobo; Nie, Na; Ye, Xiaochun; Sun, Ninghui

doi:10.1007/s42514-020-00056-3

Regular Paper
Published: 10 November 2020

Volume 3, pages 101–113 (2021)

CCF Transactions on High Performance Computing

Dongrui Fan^1,2,
Huawei Cao^1 (ORCID: orcid.org/0000-0003-1176-2521),
Guobo Wang^1,2,
Na Nie^1,2,
Xiaochun Ye^1,3 &
Ninghui Sun^1,2

Abstract

Graphs are among the most important data structures in modern big data applications and are widely used across many fields. Among graph algorithms, Breadth-First Search (BFS) is the classic solution to the graph traversal problem and the key kernel of the Graph500 benchmark. On modern CPU architectures, implementations of graph traversal on single-node systems have improved significantly. On distributed clusters, however, graph traversal suffers from poor performance and energy inefficiency because of low resource utilization and high communication overhead. High-throughput clusters (HTCs) adopt a high-throughput many-core architecture characterized by high concurrency, strong real-time performance, and low power consumption. In this work, we propose several techniques, including an asynchronous virtual ring method, a thread caching scheme, and vertex ID reordering, to address these problems and improve BFS performance on HTCs. We systematically evaluate the optimized BFS algorithm and achieve 249.74 giga-traversed edges per second (GTEPS) on a 72-node (2880-core) HTC. Compared with results on the Graph500 list, the optimized algorithm achieves the highest per-node efficiency at the same cluster scale, and its performance shows weakly linear scalability as the number of cluster nodes increases. In terms of efficiency, the average performance on HTCs is 3.47 GTEPS/node, the best among CPU-based distributed systems on the November 2019 Graph500 list.
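
As a rough illustration of the quantities quoted in the abstract, the sketch below runs a plain queue-based BFS on a toy graph, counts the adjacency entries it scans, and reproduces the per-node arithmetic (249.74 GTEPS over 72 nodes). It is a minimal single-machine sketch, not the authors' optimized distributed implementation. The names `bfs_levels` and `gteps_per_node`, the toy graph, and the edge-counting convention are assumptions made for illustration; the Graph500 benchmark defines its own rules for counting traversed edges.

```python
from collections import deque


def bfs_levels(adj, source):
    """Queue-based BFS over an adjacency-list dict.

    Returns (depth, edges_scanned), where depth maps each reached vertex
    to its hop distance from source, and edges_scanned counts every
    adjacency entry inspected (an illustrative convention only).
    """
    depth = {source: 0}
    edges_scanned = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            edges_scanned += 1
            if v not in depth:  # first visit fixes the BFS depth
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth, edges_scanned


def gteps_per_node(total_gteps, nodes):
    """Average per-node throughput, as used for the efficiency figure."""
    return total_gteps / nodes


# Hypothetical 4-vertex undirected graph (a square), stored symmetrically.
toy = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
depth, scanned = bfs_levels(toy, 0)
print(depth)    # hop distance of each vertex from vertex 0
print(scanned)  # adjacency entries inspected during the search

# Figures taken from the abstract: 249.74 GTEPS on 72 nodes.
print(round(gteps_per_node(249.74, 72), 2))
```

Dividing the reported aggregate throughput by the node count gives roughly 3.47 GTEPS/node, which matches the efficiency figure stated in the abstract.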

Acknowledgements

This work was supported by the National Key Research and Development Program (Grant No. 2018YFB1003501), the Strategic Priority Research Program of Chinese Academy of Sciences (XDC05000000), the National Natural Science Foundation of China (11904370, 61872335, 61732018, 61672499), the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH4509), and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2019A07).

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Dongrui Fan, Huawei Cao, Guobo Wang, Na Nie, Xiaochun Ye & Ninghui Sun
School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
Dongrui Fan, Guobo Wang, Na Nie & Ninghui Sun
State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, China
Xiaochun Ye

Corresponding author

Correspondence to Huawei Cao.

About this article

Cite this article

Fan, D., Cao, H., Wang, G. et al. Scalable and efficient graph traversal on high-throughput cluster. CCF Trans. HPC 3, 101–113 (2021). https://doi.org/10.1007/s42514-020-00056-3

Received: 15 June 2020
Accepted: 22 October 2020
Published: 10 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s42514-020-00056-3
