Efficient structural graph clustering: an index-based approach

Wen, Dong; Qin, Lu; Zhang, Ying; Chang, Lijun; Lin, Xuemin

doi:10.1007/s00778-019-00541-4

Regular Paper
Published: 08 May 2019

The VLDB Journal, Volume 28, pages 377–399 (2019)

Dong Wen (ORCID: orcid.org/0000-0002-0903-1503)¹, Lu Qin¹, Ying Zhang², Lijun Chang³ & Xuemin Lin⁴

Abstract

Graph clustering is a fundamental problem with many applications. The structural graph clustering (\(\mathsf {SCAN}\)) method identifies not only clusters but also hubs and outliers. However, the clustering results depend heavily on two parameters, \(\epsilon \) and \(\mu \), and the optimal parameter setting varies with graph properties and user requirements. In addition, all existing \(\mathsf {SCAN}\) solutions must scan at least the whole graph, even if only a small number of vertices belong to clusters. In this paper, we propose an index-based method for \(\mathsf {SCAN}\). Based on our index, we cluster the graph for any \(\epsilon \) and \(\mu \) in \(O(\sum _{C\in \mathbb {C}}|E_C|)\) time, where \(\mathbb {C} \) is the set of all resulting clusters and \(|E_C|\) is the number of edges in cluster \(C\). In other words, the time spent on computing structural clustering depends only on the result size, not on the size of the original graph. The space complexity of our index is \(O(m)\), where \(m\) is the number of edges in the graph. To handle dynamic graph updates, we propose algorithms and several optimization techniques for maintaining our index. We also design an index for I/O-efficient query processing. We conduct extensive experiments to evaluate all our proposed algorithms on 10 real-world networks, the largest of which contains more than 1 billion edges. The experimental results demonstrate that our approaches significantly outperform existing solutions.
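To make the output-sensitive query bound concrete, here is a minimal in-memory sketch of one way such an index can work. It assumes the standard SCAN definitions: structural similarity \(\sigma(u,v) = |N[u] \cap N[v]| / \sqrt{|N[u]|\,|N[v]|}\) over closed neighbourhoods \(N[x]\) (which include \(x\) itself), and a vertex is a core when at least \(\mu\) members of its closed neighbourhood, itself included, have similarity at least \(\epsilon\). The names `build_index`, `query`, `hubs_and_outliers`, `nbr_order` and `core_order` are hypothetical. This is not the authors' implementation: it computes similarities naively and omits index maintenance under updates and the I/O-efficient variant.

```python
from collections import deque
from math import sqrt


def structural_similarity(adj):
    """sigma(u, v) = |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|), where N[x] = adj[x] plus x itself."""
    closed = {u: set(nbrs) | {u} for u, nbrs in adj.items()}
    sim = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # compute each undirected edge once
                s = len(closed[u] & closed[v]) / sqrt(len(closed[u]) * len(closed[v]))
                sim[(u, v)] = sim[(v, u)] = s
    return sim


def build_index(adj):
    """Hypothetical index: per-vertex neighbour order plus a per-mu core order (O(m) entries)."""
    sim = structural_similarity(adj)
    # Neighbour order: neighbours of u sorted by similarity to u, most similar first.
    nbr_order = {u: sorted(adj[u], key=lambda v: -sim[(u, v)]) for u in adj}
    # Core order: u is a core for (eps, mu) iff its (mu - 1)-th most similar neighbour has
    # similarity >= eps (u itself supplies the remaining member of its eps-neighbourhood).
    # Vertex u appears once for each mu in 2..deg(u) + 1, so there are sum(deg) = 2m entries.
    max_deg = max((len(nbrs) for nbrs in adj.values()), default=0)
    core_order = {}
    for mu in range(2, max_deg + 2):
        entries = [(sim[(u, nbr_order[u][mu - 2])], u) for u in adj if len(adj[u]) >= mu - 1]
        entries.sort(key=lambda e: -e[0])  # highest core threshold first
        core_order[mu] = entries
    return sim, nbr_order, core_order


def query(index, eps, mu):
    """Return the SCAN clusters for (eps, mu) as a list of vertex sets (assumes mu >= 2)."""
    if mu < 2:
        raise ValueError("this sketch assumes mu >= 2")
    sim, nbr_order, core_order = index
    cores = set()
    for threshold, u in core_order.get(mu, []):
        if threshold < eps:
            break  # every later vertex has an even lower threshold
        cores.add(u)
    clusters, assigned = [], set()
    for seed in sorted(cores):  # sorted only to make cluster order deterministic
        if seed in assigned:
            continue
        members = {seed}
        assigned.add(seed)
        queue = deque([seed])
        while queue:
            u = queue.popleft()  # only cores are expanded
            for v in nbr_order[u]:
                if sim[(u, v)] < eps:
                    break  # the remaining neighbours are less similar
                members.add(v)
                if v in cores and v not in assigned:
                    assigned.add(v)
                    queue.append(v)
        clusters.append(members)
    return clusters


def hubs_and_outliers(adj, clusters):
    """Unclustered vertices adjacent to >= 2 clusters are hubs; the rest are outliers."""
    member_of = {}
    for cid, members in enumerate(clusters):
        for v in members:
            member_of.setdefault(v, set()).add(cid)
    hubs, outliers = set(), set()
    for u in adj:
        if u in member_of:
            continue
        touched = set()
        for v in adj[u]:
            touched |= member_of.get(v, set())
        (hubs if len(touched) >= 2 else outliers).add(u)
    return hubs, outliers
```

Apart from sorting the seeds, `query` reads only a prefix of `core_order[mu]` and, for each core, a prefix of its neighbour list that stops at the first edge below \(\epsilon\), so its work tracks the clusters it returns rather than the whole graph. `hubs_and_outliers`, by contrast, scans every vertex and falls outside that bound. For example, on a graph made of two 4-cliques joined by a single edge, `query(build_index(adj), 0.7, 3)` returns the two cliques as separate clusters, because the bridging edge has similarity \(2/5 = 0.4\).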

Author information

Authors and Affiliations

Centre for Artificial Intelligence, University of Technology Sydney, Sydney, Australia
Dong Wen & Lu Qin
Zhejiang Gongshang University, Hangzhou, China
Ying Zhang
The University of Sydney, Sydney, Australia
Lijun Chang
The University of New South Wales, Sydney, Australia
Xuemin Lin

Corresponding author

Correspondence to Ying Zhang.

About this article

Cite this article

Wen, D., Qin, L., Zhang, Y. et al. Efficient structural graph clustering: an index-based approach. The VLDB Journal 28, 377–399 (2019). https://doi.org/10.1007/s00778-019-00541-4

Received: 06 November 2018
Revised: 09 April 2019
Accepted: 20 April 2019
Published: 08 May 2019
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s00778-019-00541-4
