Efficient structural graph clustering: an index-based approach

The VLDB Journal

Abstract

Graph clustering is a fundamental problem with many applications. The structural graph clustering (\(\mathsf {SCAN}\)) method identifies not only clusters but also hubs and outliers. However, the clustering results depend heavily on two parameters, \(\epsilon \) and \(\mu \), and the optimal parameter setting varies with graph properties and user requirements. In addition, all existing \(\mathsf {SCAN}\) solutions need to scan at least the whole graph, even if only a small number of vertices belong to clusters. In this paper, we propose an index-based method for \(\mathsf {SCAN}\). Based on our index, we cluster the graph for any \(\epsilon \) and \(\mu \) in \(O(\sum _{C\in \mathbb {C}}|E_C|)\) time, where \(\mathbb {C}\) is the result set of all clusters and \(|E_C|\) is the number of edges in a specific cluster \(C\). In other words, the time spent on computing structural clustering depends only on the result size, not on the size of the original graph. The space complexity of our index is \(O(m)\), where \(m\) is the number of edges in the graph. To handle dynamic graph updates, we propose algorithms and several optimization techniques for maintaining our index. We also design an index for I/O-efficient query processing. We conduct extensive experiments to evaluate the performance of all our proposed algorithms on 10 real-world networks, the largest of which contains more than 1 billion edges. The experimental results demonstrate that our approaches significantly outperform existing solutions.
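
To make the roles of \(\epsilon \) and \(\mu \) concrete, the following Python sketch illustrates how similarities computed once can be reused to answer clustering queries for different parameter settings. It is not the index proposed in the article. It assumes the standard \(\mathsf {SCAN}\) definitions, which the abstract does not restate: the structural similarity of adjacent vertices \(u\) and \(v\) is \(|N[u]\cap N[v]|/\sqrt{|N[u]|\,|N[v]|}\) over closed neighborhoods, and a vertex is a core if at least \(\mu \) members of its closed neighborhood have similarity at least \(\epsilon \). The names `build_neighbor_order`, `is_core`, and `cluster` are hypothetical. Unlike the method described above, this sketch still visits every vertex when looking for cores, so its query time is not bounded by the result size.

```python
import math


def structural_similarity(adj, u, v):
    """Assumed standard SCAN similarity over closed neighborhoods N[x] = adj[x] | {x}."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def build_neighbor_order(adj):
    """Precompute, per vertex, its closed neighborhood sorted by non-increasing similarity.

    Hypothetical helper: the similarities are computed once and reused for every
    (eps, mu) query, which is the general idea behind an index-based approach.
    """
    order = {}
    for u in adj:
        sims = [(1.0, u)] + [(structural_similarity(adj, u, v), v) for v in adj[u]]
        sims.sort(key=lambda pair: -pair[0])
        order[u] = sims
    return order


def is_core(order, u, eps, mu):
    """u is a core if the mu-th most similar closed neighbor has similarity >= eps."""
    sims = order[u]
    return len(sims) >= mu and sims[mu - 1][0] >= eps


def cluster(adj, order, eps, mu):
    """Return the clusters (as sets of vertices) for the given eps and mu."""
    visited_cores = set()
    clusters = []
    for start in adj:  # note: scans all vertices, unlike the article's method
        if start in visited_cores or not is_core(order, start, eps, mu):
            continue
        members = {start}
        visited_cores.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for sim, v in order[u]:
                if sim < eps:
                    break  # sorted order: no later neighbor can qualify
                members.add(v)
                if v not in visited_cores and is_core(order, v, eps, mu):
                    visited_cores.add(v)
                    stack.append(v)
        clusters.append(members)
    return clusters


# Two triangles joined through vertex 6.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 6},
    3: {4, 5, 6}, 4: {3, 5}, 5: {3, 4},
    6: {2, 3},
}
order = build_neighbor_order(adj)
print(cluster(adj, order, eps=0.7, mu=3))
print(cluster(adj, order, eps=0.5, mu=3))
```

With \(\epsilon = 0.7\) and \(\mu = 3\), the two triangles form separate clusters and vertex 6 belongs to neither while neighboring both, which is the kind of vertex \(\mathsf {SCAN}\) reports as a hub. Lowering \(\epsilon \) to 0.5 makes vertex 6 a core and merges everything into one cluster, showing how strongly the result depends on the parameters.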

Correspondence to Ying Zhang.

Cite this article

Wen, D., Qin, L., Zhang, Y. et al. Efficient structural graph clustering: an index-based approach. The VLDB Journal 28, 377–399 (2019). https://doi.org/10.1007/s00778-019-00541-4
