An Efficient Algorithm for Distance-based Structural Graph Clustering

Authors:
Kaixin Liu, Tsinghua University, Beijing, China (ORCID 0000-0003-4939-0313)
Sibo Wang, The Chinese University of Hong Kong, Hong Kong, China (ORCID 0000-0003-1892-6971)
Yong Zhang, Tsinghua University, Beijing, China (ORCID 0000-0001-8803-2055)
Chunxiao Xing, Tsinghua University, Beijing, China (ORCID 0000-0001-9390-3097)

Proceedings of the ACM on Management of Data, Volume 1, Issue 1, Article No. 45, pp. 1–25. https://doi.org/10.1145/3588725

Published: 30 May 2023

Abstract

Structural graph clustering (SCAN) is a classic graph clustering algorithm. A key step in SCAN is to compute the structural similarity between vertices according to the overlap ratio of their one-hop neighborhoods. Given two vertices u and v, existing studies only consider the case where u and v are neighbors. Consequently, the structural similarity between non-neighboring vertices in SCAN is always zero, and on weighted graphs, using only one-hop neighbors discards the edge weights. Both limitations may fail to reflect the true closeness of two vertices and can lead to low-quality clustering results.
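
To make the one-hop overlap ratio concrete, here is a minimal sketch. It assumes a Jaccard-style ratio over closed neighborhoods (each vertex together with its neighbors); the abstract does not fix the exact formula, so the measure, the function names, and the toy graph are illustrative assumptions rather than the authors' definition.

```python
def closed_neighborhood(adj, v):
    """Return v together with its one-hop neighbors."""
    return set(adj[v]) | {v}


def one_hop_similarity(adj, u, v):
    """Jaccard overlap of the closed one-hop neighborhoods of u and v.

    Assumption: Jaccard is used here for illustration only.
    """
    nu = closed_neighborhood(adj, u)
    nv = closed_neighborhood(adj, v)
    return len(nu & nv) / len(nu | nv)


# Hypothetical toy graph: an undirected path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# SCAN-style methods evaluate the measure only on adjacent pairs such as (a, b).
sim_ab = one_hop_similarity(adj, "a", "b")

# The one-hop neighborhoods of a and d are disjoint, so their overlap is zero
# even though the two vertices are connected by a short path.
sim_ad = one_hop_similarity(adj, "a", "d")
```

Note that the adjacency lists carry no weights at all, which is the second limitation the abstract points out for weighted graphs.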

To tackle this issue, we define and study the distance-based structural graph clustering problem. Given a distance threshold d and two vertices u and v, the structural similarity between u and v is defined as the ratio of their respective neighbors within a distance of no more than d. We show that the newly defined distance-based SCAN achieves better clustering results compared to the vanilla version of SCAN. However, the new definition brings challenges in the computation of final clustering results. To tackle this efficiency issue, we propose DistanceSCAN, an efficient approximate algorithm for solving the distance-based SCAN problem. The main idea of DistanceSCAN is to use all-distances bottom-k sketches (ADS) to speed up the computation of similarities. Given the ADS, we can derive the similarity between two vertices with a bounded cost of O(k).
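
The distance-based definition can be illustrated with an exact (non-sketched) computation. The sketch below assumes non-negative edge weights, takes the d-neighborhood of a vertex to be every vertex within shortest-path distance d (including the vertex itself), and again uses a Jaccard-style ratio. These choices, along with the names `neighbors_within` and `distance_similarity`, are assumptions for illustration and not the paper's algorithm.

```python
import heapq


def neighbors_within(adj, source, d):
    """Vertices whose shortest-path distance from source is at most d.

    Runs Dijkstra's algorithm and never expands paths longer than d.
    adj maps each vertex to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            dv = du + w
            if dv <= d and dv < dist.get(v, float("inf")):
                dist[v] = dv
                heapq.heappush(heap, (dv, v))
    return set(dist)


def distance_similarity(adj, u, v, d):
    """Jaccard overlap of the d-neighborhoods of u and v (illustrative)."""
    nu = neighbors_within(adj, u, d)
    nv = neighbors_within(adj, v, d)
    return len(nu & nv) / len(nu | nv)


# Hypothetical weighted path: a -1.0- b -1.0- c -1.5- d
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.5)],
    "d": [("c", 1.5)],
}

sim_ac = distance_similarity(adj, "a", "c", 2.0)  # non-adjacent, yet similar
sim_ad = distance_similarity(adj, "a", "d", 2.0)  # farther apart, less similar
```

Computing every d-neighborhood exactly this way requires one bounded shortest-path search per vertex, which is the kind of cost that motivates replacing the exact sets with sketches.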

However, to ensure that the estimated similarity has an approximation guarantee, k still needs to be in the thousands. This incurs high computational costs when computing the similarities between neighboring vertices. To reduce this cost, we further construct histograms to prune the structural similarity computations of vertex pairs. Extensive experiments on real datasets validate the effectiveness and efficiency of DistanceSCAN.
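
The trade-off between k, accuracy, and cost can be seen with a plain bottom-k sketch. The following is a simplified illustration, not the all-distances sketch used by DistanceSCAN: it sketches two fixed sets rather than distance-indexed neighborhoods, and the hash-based `rank` function and all other names are assumptions made for this example. Each sketch keeps the k items of smallest rank, and the Jaccard similarity is estimated by merging two rank-sorted sketches in O(k) steps.

```python
import hashlib


def rank(item):
    """Deterministic pseudo-random rank in [0, 1) derived from a hash."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def bottom_k(items, k):
    """Sorted list of (rank, item) for the k items of smallest rank."""
    return sorted((rank(x), x) for x in set(items))[:k]


def estimate_jaccard(sa, sb, k):
    """Estimate |A ∩ B| / |A ∪ B| from the bottom-k sketches of A and B.

    Merges the two sorted sketches to walk through the k smallest-rank
    elements of A ∪ B and counts how many appear in both sketches.
    """
    i = j = seen = hits = 0
    while seen < k and (i < len(sa) or j < len(sb)):
        if j == len(sb) or (i < len(sa) and sa[i] < sb[j]):
            i += 1  # element present only in the sketch of A
        elif i == len(sa) or sb[j] < sa[i]:
            j += 1  # element present only in the sketch of B
        else:
            hits += 1  # same (rank, item) in both sketches
            i += 1
            j += 1
        seen += 1
    return hits / seen if seen else 0.0


# Hypothetical sets with true Jaccard similarity 2000 / 4000 = 0.5.
A = range(0, 3000)
B = range(1000, 4000)

# With k at least |A ∪ B| the sketches hold every element, so the result is exact.
exact = estimate_jaccard(bottom_k(A, 5000), bottom_k(B, 5000), 5000)

# A smaller k gives an estimate whose error shrinks roughly as 1 / sqrt(k).
approx = estimate_jaccard(bottom_k(A, 1000), bottom_k(B, 1000), 1000)
```

Because the estimation error shrinks only as roughly 1 / sqrt(k), tight guarantees push k into the thousands, and an O(k) merge for every candidate vertex pair becomes expensive. That is the cost the histogram-based pruning is meant to avoid.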

Index Terms

1. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Published in
Proceedings of the ACM on Management of Data Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Editor:
Divyakant Agrawal
UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
all-distances sketches
structural clustering
weighted graph
Qualifiers
- research-article