Efficient and Scalable Distributed Graph Structural Clustering at Billion Scale

Hao, Kongzhang; Yuan, Long; Yang, Zhengyi; Zhang, Wenjie; Lin, Xuemin

doi:10.1007/978-3-031-30675-4_16

Conference paper
First Online: 15 April 2023

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13945)

Abstract

Structural graph clustering (SCAN) is a fundamental problem in graph analysis and has received considerable attention in recent years. Existing distributed solutions either lack efficiency or suffer from high memory consumption when applied to billion-scale graphs. Motivated by this, we aim to devise a distributed SCAN algorithm that is both efficient and scalable. We first propose a fine-grained clustering framework tailored to SCAN. Based on this framework, we devise a distributed SCAN algorithm that not only keeps communication overhead low during execution but also effectively reduces memory consumption at all times. We also devise an effective workload-balancing mechanism that is triggered automatically by idle machines to handle skewed workloads. Experimental results demonstrate the efficiency and scalability of the proposed algorithm.
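The abstract assumes familiarity with SCAN itself. As a point of reference, here is a minimal sequential sketch of the classic SCAN clustering definition (Xu et al., SIGKDD 2007). It is not the distributed algorithm proposed in this paper. Structural similarity is computed over closed neighborhoods, a vertex is a core if its ε-neighborhood (itself included) has at least μ members, ε-similar cores are merged with union-find, and non-core vertices join the cluster of each ε-similar core neighbor. The dict-of-sets graph representation and the function names `structural_similarity` and `scan` are illustrative assumptions. Implementations also differ on whether a border vertex may belong to several clusters; this sketch allows it.

```python
from math import sqrt


def structural_similarity(adj, u, v):
    """SCAN structural similarity over closed neighborhoods N[x] = adj[x] | {x}."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def scan(adj, eps, mu):
    """Return SCAN clusters as a list of vertex sets (sequential sketch).

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    Vertices that appear in no returned cluster are hubs or outliers.
    """
    # Closed eps-neighborhood: u itself plus neighbors with similarity >= eps.
    eps_nbrs = {
        u: {u} | {v for v in adj[u] if structural_similarity(adj, u, v) >= eps}
        for u in adj
    }
    # A core vertex has at least mu members in its eps-neighborhood.
    cores = {u for u in adj if len(eps_nbrs[u]) >= mu}

    # Union-find over cores: eps-similar core pairs end up in the same cluster.
    parent = {u: u for u in cores}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in cores:
        for v in eps_nbrs[u]:
            if v in cores:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    clusters = {}
    for u in cores:
        clusters.setdefault(find(u), set()).add(u)
    # Attach each non-core vertex to the cluster of every eps-similar core neighbor.
    for u in cores:
        for v in eps_nbrs[u]:
            if v not in cores:
                clusters[find(u)].add(v)
    return list(clusters.values())


if __name__ == "__main__":
    # Two 4-cliques joined by the edge 3-4, plus a pendant vertex 8 on vertex 0.
    graph = {
        0: {1, 2, 3, 8}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
        4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}, 8: {0},
    }
    # Two clusters, {0, 1, 2, 3} and {4, 5, 6, 7}; vertex 8 is left unclustered.
    print(sorted(scan(graph, eps=0.7, mu=3), key=min))
```

In the example, the bridge edge 3-4 has similarity 2/5 = 0.4 and the pendant edge 0-8 about 0.63, so with ε = 0.7 neither connects anything. The sketch computes the similarity of every edge from both endpoints. This cost is what pruning-based SCAN variants such as pSCAN aim to reduce, and it becomes the bottleneck on billion-scale graphs.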

Acknowledgements

Long Yuan is supported by the National Key R&D Program of China 2022YFF0712100, NSFC 61902184, and the Science and Technology on Information Systems Engineering Laboratory WDZC20205250411.

Author information

Authors and Affiliations

The University of New South Wales, Sydney, Australia
Kongzhang Hao, Zhengyi Yang & Wenjie Zhang
Nanjing University of Science and Technology, Nanjing, China
Long Yuan
Shanghai Jiao Tong University, Shanghai, China
Xuemin Lin

Corresponding author

Correspondence to Kongzhang Hao.

Editor information

Editors and Affiliations

Tianjin University, Tianjin, China
Xin Wang
University of Torino, Turin, Italy
Maria Luisa Sapino
POSTECH, Pohang, Korea (Republic of)
Wook-Shin Han
University of California Santa Barbara, Santa Barbara, CA, USA
Amr El Abbadi
University of Auckland, Auckland, New Zealand
Gill Dobbie
Tianjin University, Tianjin, China
Zhiyong Feng
Beijing University of Posts and Telecommunications, Beijing, China
Yingxiao Shao
The University of Queensland, Brisbane, QLD, Australia
Hongzhi Yin

About this paper

Cite this paper

Hao, K., Yuan, L., Yang, Z., Zhang, W., Lin, X. (2023). Efficient and Scalable Distributed Graph Structural Clustering at Billion Scale. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13945. Springer, Cham. https://doi.org/10.1007/978-3-031-30675-4_16

DOI: https://doi.org/10.1007/978-3-031-30675-4_16
Published: 15 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30674-7
Online ISBN: 978-3-031-30675-4
eBook Packages: Computer Science, Computer Science (R0)
