
Distributed computing connected components with linear communication cost

Distributed and Parallel Databases

Abstract

This paper studies three fundamental problems in graph analytics: computing the connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing the CCs, BCCs and ECCs of a big graph has received increasing interest. In line with existing research efforts, we focus on the Pregel programming model, although our techniques can be extended to other programming models, including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur \(O(m\times \#\text{supersteps})\) total costs for both data communication and computation, where m is the number of edges in the graph and #supersteps is the number of supersteps. Since network communication is usually much slower than computation, communication costs dominate the total running time of the existing techniques. In this paper, we propose a new paradigm based on graph decomposition that computes CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller in practice than those of the existing techniques, although they are theoretically almost the same. Moreover, we study the distributed computation of ECCs; we are the first to study this problem, and we propose an approach with O(m) total communication cost. Comprehensive empirical studies demonstrate that our approaches outperform the existing techniques by an order of magnitude in total running time.
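To make the cost model concrete, the following is a minimal single-machine simulation, written in Python, of the standard hash-min baseline for computing CCs in a vertex-centric model; it is our illustrative sketch, not the paper's code. Every edge may carry a message in every superstep, which is why the baseline's total communication is \(O(m\times \#\text{supersteps})\) and why reducing it to O(m) matters.

    # Illustrative sketch, not from the paper: hash-min CC label propagation.
    import math

    def hash_min_cc(adj):
        """adj maps each vertex to an iterable of its neighbours
        (undirected graph). Returns (labels, number of supersteps)."""
        label = {v: v for v in adj}   # every vertex starts as its own component
        active = set(adj)             # vertices whose label changed last superstep
        supersteps = 0
        while active:
            supersteps += 1
            inbox = {}
            # Communication phase: each active vertex broadcasts its current
            # label to all neighbours; up to O(m) messages per superstep.
            for v in active:
                for u in adj[v]:
                    inbox[u] = min(inbox.get(u, math.inf), label[v])
            # Computation phase: a vertex adopts any strictly smaller label.
            active = set()
            for u, received in inbox.items():
                if received < label[u]:
                    label[u] = received
                    active.add(u)
        return label, supersteps

    # Two components: a path 0-1-2-3 and an edge 4-5.
    g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(hash_min_cc(g))  # labels converge to the minimum id per component

On a path of n vertices the smallest label advances one hop per superstep, so hash-min needs about n supersteps; this is why superstep-proportional communication is expensive on high-diameter graphs such as road networks.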


Notes

  1. http://newsroom.fb.com/company-info/.

  2. http://law.di.unimi.it/datasets.php.

  3. The total data communication cost of \(\mathsf {single\text {-}pivot}\) becomes O(m) in practice if the input graph contains one giant CC and other small CCs (see the BFS sketch after these notes).

  4. Note that we confirmed with the authors of [38] through private communication that the data communication cost of \(\mathsf {S\text {-}V}\) per superstep is O(m), which is mistakenly stated as O(n) in [38] (see the pointer-jumping sketch after these notes).

  5. http://www.cse.cuhk.edu.hk/pregelplus/.

  6. https://snap.stanford.edu/data/roadNet-CA.html.

  7. http://law.di.unimi.it/datasets.php.

  8. https://webscope.sandbox.yahoo.com/catalog.php?datatype=g.

  9. http://www.cse.psu.edu/~kxm85/software/GTgraph/.
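The cost claims in notes 3 and 4 can be illustrated with two small Python sketches; both are our reconstructions under simplifying assumptions, not the authors' implementations. First, the BFS phase behind a single-pivot strategy: a BFS from one pivot crosses each edge of the discovered component at most twice, so when one giant CC contains almost all edges, the total communication is O(m) in practice.

    # Illustrative reconstruction of a single-pivot BFS phase (not the authors' code).
    from collections import deque

    def pivot_bfs(adj, pivot):
        """Label every vertex in the pivot's component; count edge messages."""
        label = {pivot: pivot}
        frontier = deque([pivot])
        messages = 0
        while frontier:
            v = frontier.popleft()
            for u in adj[v]:
                messages += 1          # each edge is crossed at most twice
                if u not in label:
                    label[u] = pivot
                    frontier.append(u)
        return label, messages

Second, the pointer-jumping (shortcut) step used by Shiloach-Vishkin-style algorithms such as \(\mathsf {S\text {-}V}\): every vertex replaces its parent with its grandparent, roughly halving tree heights, which is why such algorithms finish in O(log n) supersteps while the hooking and jumping work of each superstep touches O(m) edges and pointers, consistent with the O(m) per-superstep communication confirmed in note 4.

    # Illustrative pointer-jumping step (not the authors' code).
    def shortcut(parent):
        """One pointer-jumping round: parent[v] <- parent[parent[v]]."""
        return {v: parent[parent[v]] for v in parent}

    # A chain 4 -> 3 -> 2 -> 1 -> 0 flattens to a star in O(log n) rounds.
    p = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}
    while p != shortcut(p):
        p = shortcut(p)
    print(p)  # {0: 0, 1: 0, 2: 0, 3: 0, 4: 0}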

References

  1. Awerbuch, B., Shiloach, Y.: New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Comput. 10, 1258–1263 (1987)


  2. Ceccarello, M., Pietracaprina, A., Pucci, G., Upfal, E.: Space and time efficient parallel graph decomposition, clustering and diameter approximation. In: Proceedings of SPAA’15 (2015)

  3. Chang, L., Yu, J. X., Qin, L., Lin, X., Liu, C., Liang, W.: Efficiently computing k-edge connected components via graph decomposition. In: Proceedings of SIGMOD’13 (2013)

  4. Ching, A., Kunz, C.: Giraph: large-scale graph processing infrastructure on Hadoop. Hadoop Summit (2011)

  5. Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11, 29–41 (2009)


  6. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill Higher Education, New York (2001)


  7. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of OSDI’04 (2004)

  8. Feng, X., Chang, L., Lin, X., Qin, L., Zhang, W.: Computing connected components with linear communication cost in Pregel-like systems. In: Proceedings of ICDE’16 (2016)

  9. Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)


  10. Gibbons, A.: Algorithmic Graph Theory. Cambridge University Press, Cambridge (1985)


  11. Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proceedings of OSDI’14 (2014)

  12. Henry, N., Bezerianos, A., Fekete, J.: Improving the readability of clustered social networks using node duplication. IEEE Trans. Vis. Comput. Graph. 14, 1317–1324 (2008)


  13. Hopcroft, J.E., Tarjan, R.E.: Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM 16, 372–378 (1973)


  14. Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system. In: Proceedings of ICDM’09 (2009)

  15. Kang, U., McGlohon, M., Akoglu, L., Faloutsos, C.: Patterns on the connected components of terabyte-scale graphs. In: Proceedings of ICDM’10 (2010)

  16. Kavitha, T., Liebchen, C., Mehlhorn, K., Michail, D., Rizzi, R., Ueckerdt, T., Zweig, K.A.: Cycle bases in graphs: characterization, algorithms, complexity, and applications. Comput. Sci. Rev. 3, 199–243 (2009)


  17. Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning in the cloud. PVLDB 5, 716–727 (2012)


  18. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD’10 (2010)

  19. Miller, G.L., Peng, R., Xu, S.C.: Parallel graph decompositions using random shifts. In: Proceedings of SPAA’13 (2013)

  20. Miller, G.L., Ramachandran, V.: Efficient parallel ear decomposition with applications. MSRI 135, 162 (1986)


  21. Munagala, K., Ranade, A.G.: I/O-complexity of graph algorithms. In: Proceedings of SODA’99 (1999)

  22. Nanda, S., Kotz, D.: Localized bridging centrality for distributed network analysis. In: Proceedings of ICCCN’08 (2008)

  23. Przulj, N., Wigle, D., Jurisica, I.: Functional topology in a network of protein interactions. Bioinformatics 20, 340–348 (2004)


  24. Qin, L., Yu, J.X., Chang, L., Cheng, H., Zhang, C., Lin, X.: Scalable big graph processing in MapReduce. In: Proceedings of SIGMOD’14 (2014)

  25. Ramachandran, V.: Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. Citeseer (1992)

  26. Rastogi, V., Machanavajjhala, A., Chitnis, L., Sarma, A.D.: Finding connected components in map-reduce in logarithmic rounds. In: Proceedings of ICDE’13 (2013)

  27. Salihoglu, S., Widom, J.: Optimizing graph algorithms on Pregel-like systems. PVLDB 7, 577–588 (2014)


  28. Sanders, P.: Random permutations on distributed, external and hierarchical memory. Inf. Process. Lett. 67, 305–309 (1998)


  29. Schmidt, J.M.: A simple test on 2-vertex- and 2-edge-connectivity. Inf. Process. Lett. 113, 241–244 (2013)


  30. Shiloach, Y., Vishkin, U.: An O(log n) parallel connectivity algorithm. J. Algorithms 3, 57–67 (1982)


  31. Slota, G.M., Madduri, K.: Simple parallel biconnectivity algorithms for multicore platforms. In: Proceedings of HiPC’14 (2014)

  32. Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. In: Proceedings of SODA’14 (2014)

  33. Tarjan, R.E., Vishkin, U.: An efficient parallel biconnectivity algorithm. SIAM J. Comput. 14, 862–874 (1985)


  34. Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7, 193–204 (2013)


  35. Turau, V.: Computing bridges, articulations, and 2-connected components in wireless sensor networks. In: Proceedings of ALGOSENSORS’06 (2006)

  36. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33, 103–111 (1990)


  37. Vega, D., Cerda-Alabern, L., Navarro, L., Meseguer, R.: Topology patterns of a community network: Guifi.net. In: Proceedings of WiMob’12 (2012)

  38. Yan, D., Cheng, J., Xing, K., Lu, Y., Ng, W., Bu, Y.: Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7, 1821–1832 (2014)


  39. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of NSDI’12 (2012)


Author information


Corresponding author

Correspondence to Xing Feng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Feng, X., Chang, L., Lin, X. et al. Distributed computing connected components with linear communication cost. Distrib Parallel Databases 36, 555–592 (2018). https://doi.org/10.1007/s10619-018-7232-6

