Abstract
Graph clustering is a fundamental problem widely applied in many applications. The structural graph clustering (\(\mathsf {SCAN}\)) method obtains not only clusters but also hubs and outliers. However, the clustering results heavily depend on two parameters, \(\epsilon \) and \(\mu \), while the optimal parameter setting depends on different graph properties and various user requirements. In addition, all existing \(\mathsf {SCAN}\) solutions need to scan at least the whole graph, even if only a small number of vertices belong to clusters. In this paper, we propose an index-based method for \(\mathsf {SCAN}\). Based on our index, we cluster the graph for any \(\epsilon \) and \(\mu \) in \(O(\sum _{C\in \mathbb {C}}|E_C|)\) time, where \(\mathbb {C} \) is the result set of all clusters and \(|E_C|\) is the number of edges in a specific cluster \(C\). In other words, the time spent on computing structural clustering depends only on the result size, not on the size of the original graph. Our index’s space complexity is O(m), where m is the number of edges in the graph. To handle dynamic graph updates, we propose algorithms and several optimization techniques for maintaining our index. We also design an index for I/O efficient query processing. We conduct extensive experiments to evaluate the performance of all our proposed algorithms on 10 real-world networks, with the largest one containing more than 1 billion edges. The experimental results demonstrate that our approaches significantly outperform existing solutions.






























Similar content being viewed by others
References
Bortner, D., Han, J.: Progressive clustering of networks using structure-connected order of traversal. In: Proceedings of ICDE’10, pp. 653–656 (2010)
Chang, L., Li, W., Lin, X., Qin, L., Zhang, W.: pSCAN: fast and exact structural graph clustering. In: ICDE, pp. 253–264 (2016)
Cheng, J., Ke, Y., Chu, S., T. Özsu, M.: Efficient core decomposition in massive networks. In: ICDE, pp. 51–62 (2011)
Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks by h*-graph. In: SIGMOD, pp. 447–458 (2010)
Chiba, N., Nishizeki, T.: Arboricity and subgraph listing algorithms. SICOMP 14(1), 210–223 (1985)
Ding, C.H., He, X., Zha, H., Gu, M., Simon, H.D.: A min–max cut algorithm for graph partitioning and data clustering. In: ICDM, pp. 107–114 (2001)
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3), 75–174 (2010)
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34(3), 596–615 (1987)
Guimera, R., Amaral, L.A.N.: Functional cartography of complex metabolic networks. Nature 433(7028), 895 (2005)
Huang, J., Sun, H., Han, J., Deng, H., Sun, Y., Liu, Y.: Shrink: a structural clustering algorithm for detecting hierarchical communities in networks. In: CIKM, pp. 219–228 (2010)
Jiang, P., Singh, M.: Spici: a fast clustering algorithm for large biological networks. Bioinformatics 26(8), 1105–1111 (2010)
Kang, U., Faloutsos, C.: Beyond ‘Caveman Communities’: hubs and spokes for graph compression and mining. In: ICDM, pp. 300–309 (2011)
Lee, V. E., Ruan, N., Jin, R., Aggarwal, C.: A survey of algorithms for dense subgraph discovery. In: Managing and Mining Graph Data, pp. 303–336 (2010)
Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.-G.: Linkscan*: overlapping community detection using the link-space transformation. In: ICDE, pp. 292–303 (2014)
Mai, S.T., Dieu, M.S., Assent, I., Jacobsen, J., Kristensen, J., Birk, M.: Scalable and interactive graph clustering algorithm on multicore cpus. In: ICDE, pp. 349–360 (2017)
Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
Schaeffer, S.E.: Graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007)
Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22(8), 888–905 (2000)
Shiokawa, H., Fujiwara, Y., Onizuka, M.: Fast algorithm for modularity-based graph clustering. In: AAAI, pp. 1170–1176 (2013)
Shiokawa, H., Fujiwara, Y., Onizuka, M.: Scan++: efficient algorithm for finding clusters, hubs and outliers on large-scale graphs. PVLDB 8(11), 1178–1189 (2015)
Shiokawa, H., Takahashi, T., Kitagawa, H.: Scalescan: scalable density-based graph clustering. In: Database and Expert Systems Applications, pp. 18–34 (2018)
Son, M. T., Amer-Yahia, S., Assent, I., Birk, M., Storgaard Dieu, M. Jacobsen, J., Kristensen, J.: Scalable interactive dynamic graph clustering on multicore CPUs. In: TKDE (2018)
Sun, H., Huang, J., Han, J. Deng, H., Zhao, P., Feng, B.: gSkeletonClu: Density-based network clustering via structure-connected tree division or agglomeration. In: ICDM, pp. 481–490 (2010)
Takahashi, T., Shiokawa, H., Kitagawa, H.: SCAN-XP: parallel structural graph clustering algorithm on Intel Xeon Phi coprocessors. In: Proceedings of the 2nd International Workshop on Network Data Analytics, NDA, pp. 6:1–6:7 (2017)
Tsourakakis, C., Bonchi, F., Gionis, A., Gullo, F., Tsiarli, M.: Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In: KDD, pp. 104–112. ACM (2013)
Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
Wang, L., Xiao, Y., Shao, B., Wang, H.: How to partition a billion-node graph. In: ICDE (2014)
Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.: On triangulation-based dense neighborhood graph discovery. PVLDB 4(2), 58–68 (2010)
Wen, D., Qin, L., Zhang, Y., Chang, L., Lin, X.: Efficient structural graph clustering: an index-based approach. PVLDB 11(3), 243–255 (2017)
Wen, D., Qin, L., Zhang, Y., Lin, X., Yu, J.X.: I/o efficient core graph decomposition at web scale. In: ICDE, pp. 133–144 (2016)
Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algorithm for networks. In: KDD, pp. 824–833 (2007)
Zhang, Z., Yu, J.X., Qin, L., Shang, Z.: Divide & conquer: I/o efficient depth-first search. In: SIGMOD, pp. 445–458 (2015)
Zhao, W., Chen, G., Xu, X.: AnySCAN: an efficient anytime framework with active learning for large-scale network clustering. In: ICDM, pp. 665–674 (2017)
Zhao, W., Martha, V., Xu, X.: PSCAN: a parallel structural clustering algorithm for big networks in MapReduce. In: AINA, pp. 862–869 (2013)
Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. PVLDB 2(1), 718–729 (2009)
Zilberstein, S.: Using anytime algorithms in intelligent systems. AI Mag. 17, 73–83 (1996)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wen, D., Qin, L., Zhang, Y. et al. Efficient structural graph clustering: an index-based approach. The VLDB Journal 28, 377–399 (2019). https://doi.org/10.1007/s00778-019-00541-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-019-00541-4