ABSTRACT
Detecting strongly connected components (SCCs) is an important step in various graph computations. The fastest GPU and CPU implementations from the literature work well on graphs where most of the vertices belong to a single SCC and the vertex degrees follow a power-law distribution. However, these algorithms can be slow on the mesh graphs used in certain radiative transfer simulations, which have a nearly constant vertex degree and can have significant variability in the number and size of SCCs. We introduce ECL-SCC, an SCC detection algorithm that addresses these shortcomings. Our approach is GPU friendly and employs innovative techniques such as maximum ID propagation and edge removal. On an A100 GPU, ECL-SCC performs on par with the fastest prior GPU code on power-law graphs and outperforms it by 7.8× on mesh graphs. Moreover, ECL-SCC running on the GPU outperforms fast parallel CPU code by three orders of magnitude on meshes.
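The abstract names maximum ID propagation and edge removal but does not spell them out. As a rough illustration, the sequential Python sketch below shows one plausible reading of how the two ideas can combine to find SCCs. It is not the authors' GPU code. The function name `scc_max_id`, the arrays `v_in` and `v_out`, and the edge-list input are assumptions made for this example only.

```python
def scc_max_id(num_vertices, edges):
    """Label each vertex with the largest vertex ID in its SCC.

    Hypothetical sequential sketch; names and structure are illustrative
    assumptions, not taken from the ECL-SCC source.
    """
    label = [None] * num_vertices
    live = list(edges)  # edges that may still lie inside an undiscovered SCC

    while any(l is None for l in label):
        # v_in[v]: largest ID among vertices that can reach v (including v).
        # v_out[v]: largest ID among vertices reachable from v (including v).
        v_in = list(range(num_vertices))
        v_out = list(range(num_vertices))

        # Maximum ID propagation: repeat until no value changes.
        changed = True
        while changed:
            changed = False
            for u, v in live:
                if v_in[u] > v_in[v]:
                    v_in[v] = v_in[u]
                    changed = True
                if v_out[v] > v_out[u]:
                    v_out[u] = v_out[v]
                    changed = True

        # If v_in[v] == v_out[v] == m, then m reaches v and v reaches m,
        # so v belongs to the SCC whose largest ID is m.
        for v in range(num_vertices):
            if label[v] is None and v_in[v] == v_out[v]:
                label[v] = v_in[v]

        # Edge removal: vertices in the same SCC always share both values,
        # so an edge whose endpoints differ in either value crosses SCCs.
        # Edges touching already labeled vertices are dropped as well.
        live = [
            (u, v)
            for u, v in live
            if label[u] is None
            and label[v] is None
            and v_in[u] == v_in[v]
            and v_out[u] == v_out[v]
        ]

    return label


# Example: cycle 0 -> 1 -> 2 -> 0, cycle 3 <-> 4, bridge 2 -> 3, isolated 5.
example_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
print(scc_max_id(6, example_edges))
```

In each round, the unlabeled vertex with the largest ID always satisfies `v_in == v_out`, so at least one SCC is completed per round and the loop terminates. A GPU version would presumably run the propagation over all edges in parallel, which this sketch does not attempt to show.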
Index Terms
- A GPU Algorithm for Detecting Strongly Connected Components