Scalable and efficient graph traversal on high-throughput cluster

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

Graphs are among the most important data structures in modern big data applications and are widely used in various fields. Among the many graph algorithms, Breadth-First Search (BFS) is a classic algorithm for the graph traversal problem and is also the key kernel of the Graph500 benchmark. On modern CPU architectures, implementations of graph traversal on single-node systems have improved significantly. However, due to low resource utilization and high communication overhead, graph traversal on distributed clusters suffers from poor performance and energy inefficiency. High-throughput clusters (HTCs) adopt a high-throughput many-core architecture, which is characterized by high concurrency, strong real-time capability, and low power consumption. In this work, we propose several techniques, including an asynchronous virtual ring method, a thread caching scheme, and vertex ID reordering, to solve the above problems and improve BFS performance on HTCs. We systematically evaluate the optimized BFS algorithm and achieve 249.74 giga-traversed edges per second (GTEPS) on a 72-node (2880-core) HTC. Compared with results on the Graph500 list, the optimized algorithm achieves the highest node efficiency at the same cluster scale, and its performance shows weakly linear scalability as the number of cluster nodes increases. With regard to efficiency, the average performance on HTCs is 3.47 GTEPS/node, which is the best among CPU-based distributed systems on the November 2019 Graph500 list.
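For orientation, the following is a minimal sketch of the kernel and the efficiency metric discussed above. It is a textbook serial top-down BFS, not the optimized distributed algorithm of this work, and it does not model the asynchronous virtual ring, thread caching, or vertex ID reordering. The names `bfs_parents` and `per_node_gteps` and the five-vertex example graph are hypothetical and chosen for illustration only; the convention that the root is its own parent is likewise an assumption of this sketch.

```python
from collections import deque


def bfs_parents(adj, root):
    """Serial top-down BFS over an adjacency list.

    Returns a parent array: parent[v] is the vertex from which v was first
    reached, the root is its own parent, and -1 marks unreached vertices.
    """
    parent = [-1] * len(adj)
    parent[root] = root
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if parent[v] == -1:  # first visit: record the tree edge u -> v
                parent[v] = u
                frontier.append(v)
    return parent


def per_node_gteps(total_gteps, nodes):
    """Average throughput per node, in GTEPS/node."""
    return total_gteps / nodes


# Hypothetical toy graph: a 4-cycle (0-1-3-2-0) plus an isolated vertex 4.
adj = [[1, 2], [0, 3], [0, 3], [1, 2], []]
print(bfs_parents(adj, 0))

# Figures reported in the abstract: 249.74 GTEPS on 72 nodes (2880 cores).
print(round(per_node_gteps(249.74, 72), 2))  # GTEPS per node
print(2880 // 72)                            # cores per node
```

Dividing the reported aggregate throughput by the node count reproduces the 3.47 GTEPS/node figure quoted above, and the reported core count corresponds to 40 cores per node.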

Acknowledgements

This work was supported by the National Key Research and Development Program (Grant No. 2018YFB1003501), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC05000000), the National Natural Science Foundation of China (11904370, 61872335, 61732018, 61672499), the Innovation Project of the State Key Laboratory of Computer Architecture (CARCH4509), and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2019A07).

Author information

Corresponding author

Correspondence to Huawei Cao.

About this article

Cite this article

Fan, D., Cao, H., Wang, G. et al. Scalable and efficient graph traversal on high-throughput cluster. CCF Trans. HPC 3, 101–113 (2021). https://doi.org/10.1007/s42514-020-00056-3

  • DOI: https://doi.org/10.1007/s42514-020-00056-3
