Fast Parallel Connected Components Algorithms on GPUs

Cong, Guojing; Muzio, Paul

doi:10.1007/978-3-319-14325-5_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8805)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

We study parallel connected components algorithms on GPUs in comparison with CPUs. Although a straightforward implementation of PRAM algorithms performs relatively better on GPUs than on CPUs, GPU memory subsystem performance is poor due to non-coalesced random accesses.
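As an illustration of what a straightforward PRAM-style implementation can look like, the sketch below alternates hooking and pointer-jumping rounds in the spirit of the Shiloach-Vishkin family. The abstract does not say which PRAM algorithm was implemented, so this choice, the function name `pram_style_cc`, and the sequential Python loops standing in for GPU threads are all assumptions made for illustration.

```python
def pram_style_cc(n, edges):
    """Return, for each vertex, the smallest vertex id in its component.

    Illustrative only: sequential loops emulate the per-edge and
    per-vertex threads a GPU kernel would launch.
    """
    parent = list(range(n))  # parent[v] == v means v is a root
    changed = True
    while changed:
        changed = False
        # Hooking: for each edge, attach the larger root under the smaller.
        # parent[u] and parent[v] are data-dependent reads, so threads
        # handling adjacent edges would touch unrelated addresses.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru == rv:
                continue
            hi, lo = max(ru, rv), min(ru, rv)
            if parent[hi] == hi:  # only roots may be hooked
                parent[hi] = lo
                changed = True
        # Pointer jumping: shortcut every vertex to the root of its tree.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent


labels = pram_style_cc(6, [(0, 1), (1, 2), (3, 4)])
```

The reads `parent[u]`, `parent[v]`, and `parent[parent[v]]` follow the graph structure rather than the thread index, which is the kind of random, non-coalesced access pattern the abstract refers to.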

We argue that generic sort-based access coalescing is too costly on GPUs. We propose a new coalescing technique and a new meta algorithm to improve locality and performance. Our optimization achieves up to 2.7 times speedup over the straightforward implementation. Interestingly, our optimization also works well on CPUs.
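For context, generic sort-based coalescing can be sketched as follows: requests are reordered by target address, served in ascending address order, and the results are written back to their original slots. This is a plain-Python illustration of the generic idea the authors argue is too costly, not their new technique, which the abstract does not describe; `gather_direct` and `gather_sorted` are hypothetical names.

```python
def gather_direct(table, idx):
    """Read table[i] for each request in arrival order (random access)."""
    return [table[i] for i in idx]


def gather_sorted(table, idx):
    """Same result as gather_direct, but the table is read in address order."""
    # Sort request positions by the address each request targets.
    order = sorted(range(len(idx)), key=lambda k: idx[k])
    out = [None] * len(idx)
    for k in order:
        # Addresses idx[k] are now visited in ascending order; the value
        # is stored back in the slot of the original request k.
        out[k] = table[idx[k]]
    return out


table = [10, 20, 30, 40]
requests = [3, 0, 2, 0]
same = gather_sorted(table, requests) == gather_direct(table, requests)
```

The sort has to be repeated whenever the index array changes, so a comparison sort adds O(m log m) work for m requests on top of the gather itself.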

Comparing the best-performing algorithms on GPUs and CPUs, we find our new algorithm is the fastest on GPUs and the second fastest on CPUs, while the parallel Rem’s algorithm is the fastest on CPUs but does not perform well on GPUs due to path divergence.
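A minimal sequential sketch of Rem's union-find, the building block of the parallel Rem's algorithm mentioned above, is given below. It assumes the common formulation in which a parent index is never smaller than its child; the function names are hypothetical, and the synchronization a parallel version needs is omitted.

```python
def rem_union(parent, u, v):
    """Merge the sets containing u and v; return True if they were distinct."""
    while parent[u] != parent[v]:
        if parent[u] > parent[v]:
            u, v = v, u  # keep u on the side with the smaller parent
        if parent[u] == u:
            # u is a root: link it under v's parent, which has a larger index.
            parent[u] = parent[v]
            return True
        # Splicing: redirect u to v's parent, then continue from u's old parent.
        old_parent = parent[u]
        parent[u] = parent[v]
        u = old_parent
    return False


def find_root(parent, v):
    """Follow parent pointers until reaching a root."""
    while parent[v] != v:
        v = parent[v]
    return v


def rem_components(n, edges):
    """Label each vertex with the root of its component."""
    parent = list(range(n))
    for u, v in edges:
        rem_union(parent, u, v)
    return [find_root(parent, v) for v in range(n)]


labels = rem_components(5, [(0, 1), (1, 2), (3, 4), (0, 2)])
```

The number of loop iterations and the branch taken in `rem_union` both depend on the data, which is the kind of control flow that can diverge across GPU threads.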

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
Guojing Cong
CUNY High Performance Computing Center, Staten Island, New York, 10324, USA
Paul Muzio

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, University of Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Luís Lopes
Vilnius University, 08663, Vilnius, Lithuania
Julius Žilinskas
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan
Inria, Campus Universitaire de Beaulieu, 35042, Rennes, France
Roberto G. Cascella
MTA SZTAKI, Budapest, Hungary
Gabor Kecskemeti
LaBRI, Inria, France
Emmanuel Jeannot
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
University of Pisa, Italy
Laura Ricci
Faculty of Computer Science, University of Vienna, Wien, Austria
Siegfried Benkner
Universitat Politècnica de València, Spain
Salvador Petit
ISISLab - Dipartimento di Informatica, Università di Salerno, Italy
Vittorio Scarano
High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, 70550, Stuttgart, Germany
José Gracia
Vienna University of Technology, 1040, Vienna, Austria
Sascha Hunold
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
RWTH Aachen University, Aachen, Germany
Stefan Lankes
Department of Informatics and Mathematics, University of Passau, Germany
Christian Lengauer
Universidad Carlos III de Madrid, 28911, Leganés, Spain
Jesus Carretero
TU München, 85747, Garching bei München, Germany
Jens Breitbart
TU Vienna, 1040, Vienna, Austria
Michael Alexander

About this paper

Cite this paper

Cong, G., Muzio, P. (2014). Fast Parallel Connected Components Algorithms on GPUs. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8805. Springer, Cham. https://doi.org/10.1007/978-3-319-14325-5_14

DOI: https://doi.org/10.1007/978-3-319-14325-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14324-8
Online ISBN: 978-3-319-14325-5
eBook Packages: Computer Science, Computer Science (R0)
