Abstract
Structural graph clustering (SCAN) is a fundamental problem in graph analysis and has received considerable attention recently. Existing distributed solutions are either inefficient or consume too much memory on billion-scale graphs. Motivated by these limitations, we aim to devise a distributed algorithm for SCAN that is both efficient and scalable. We first propose a fine-grained clustering framework tailored to SCAN. Based on this framework, we devise a distributed SCAN algorithm that keeps communication overhead low during execution and reduces memory consumption at all times. We also devise a workload-balancing mechanism, triggered automatically by idle machines, to handle skewed workloads. Experimental results demonstrate the efficiency and scalability of the proposed algorithm.
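For readers unfamiliar with the problem, the following is a minimal single-machine sketch of the standard SCAN definitions: the structural similarity of two adjacent vertices, the ε-neighborhood, and core vertices under a threshold μ. It is not the distributed algorithm proposed in this paper. The function names, the toy graph, and the parameter values are illustrative assumptions.

```python
from collections import deque
from math import sqrt


def closed_neighborhood(adj, u):
    """Neighbors of u plus u itself (N[u] in the usual SCAN notation)."""
    return adj[u] | {u}


def structural_similarity(adj, u, v):
    """sigma(u, v) = |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|)."""
    nu, nv = closed_neighborhood(adj, u), closed_neighborhood(adj, v)
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def eps_neighborhood(adj, u, eps):
    """Vertices in N[u] whose similarity to u is at least eps."""
    return {v for v in closed_neighborhood(adj, u)
            if structural_similarity(adj, u, v) >= eps}


def scan(adj, eps, mu):
    """Return (label, cores). Vertices absent from label are hubs or outliers."""
    cores = {u for u in adj if len(eps_neighborhood(adj, u, eps)) >= mu}
    label = {}
    next_id = 0
    for seed in sorted(cores):
        if seed in label:
            continue
        # Grow one cluster by breadth-first expansion from core vertices only.
        label[seed] = next_id
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            for v in eps_neighborhood(adj, u, eps):
                if v not in label:
                    # Simplification: a border vertex joins the first cluster
                    # that reaches it, although it may qualify for several.
                    label[v] = next_id
                    if v in cores:
                        queue.append(v)
        next_id += 1
    return label, cores


# Toy graph (hypothetical): two 4-cliques joined through vertex 8.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 8},
    4: {5, 6, 7, 8}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
    8: {3, 4},
}
label, cores = scan(adj, eps=0.7, mu=3)
```

On this toy graph each clique forms its own cluster, and vertex 8, which bridges the two cliques, receives no label because its similarity to both of its neighbors falls below ε.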
Acknowledgements
Long Yuan is supported by the National Key R&D Program of China (2022YFF0712100), NSFC (61902184), and the Science and Technology on Information Systems Engineering Laboratory (WDZC20205250411).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hao, K., Yuan, L., Yang, Z., Zhang, W., Lin, X. (2023). Efficient and Scalable Distributed Graph Structural Clustering at Billion Scale. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13945. Springer, Cham. https://doi.org/10.1007/978-3-031-30675-4_16
DOI: https://doi.org/10.1007/978-3-031-30675-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30674-7
Online ISBN: 978-3-031-30675-4
eBook Packages: Computer Science, Computer Science (R0)