Abstract
Motif-based graph local clustering (MGLC) is a popular method for graph mining tasks due to its various applications. However, the traditional two-phase approach of precomputing motif weights before performing local clustering loses locality and is impractical for large graphs. While some attempts have been made to address the efficiency bottleneck, there is still no applicable algorithm for large scale graphs with billions of edges. In this paper, we propose a purely local and index-free method called Index-free Triangle-based Graph Local Clustering (TGLC*) to solve the MGLC problem w.r.t. a triangle. TGLC* directly estimates the Personalized PageRank (PPR) vector using random walks with the desired triangle-weighted distribution and proposes the clustering result using a standard sweep procedure. We demonstrate TGLC*’s scalability through theoretical analysis and its practical benefits through a novel visualization layout. TGLC* is the first algorithm to solve the MGLC problem without precomputing the motif weight. Extensive experiments on seven real-world large-scale datasets show that TGLC* is applicable and scalable for large graphs.
Similar content being viewed by others
References
Spielman D A, Teng S H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing. 2004, 81–90
Andersen R, Chung F, Lang K. Local graph partitioning using pagerank vectors. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science. 2006, 475–486
Liao C S, Lu K, Baym M, Singh R, Berger B. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics, 2009, 25(12): i253–i258
Li S, Gentile C, Karatzoglou A. Graph clustering bandits for recommendation. 2016, arXiv preprint arXiv: 1605.00596
Feng Y, Yu S, Zhang K, Li X, Ning Z. COMICS: a community property-based triangle motif clustering scheme. PeerJ Computer Science, 2019, 5: e180
Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888–905
Newman M E J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 2006, 103(23): 8577–8582
Vinh N X, Epps J, Bailey J. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. The Journal of Machine Learning Research, 2010, 11: 2837–2854
Kobourov S G, Pupyrev S, Simonetto P. Visualizing graphs as maps with contiguous regions. In: Proceedings of the 16th Eurographics Conference on Visualization. 2014, 31–35
Yang J, Leskovec J. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015, 42(1): 181–213
Cheeger J. A lower bound for the smallest eigenvalue of the Laplacian. In: Gunning R C, ed. Problems in Analysis. Princeton: Princeton University Press, 1971, 195–200
Cox I J, Rao S B, Zhong Y. “Ratio regions”: a technique for image segmentation. In: Proceedings of the 13th International Conference on Pattern Recognition. 1996, 557–564
Benson A R, Gleich D F, Leskovec J. Higher-order organization of complex networks. Science, 2016, 353(6295): 163–166
Tsourakakis C E, Pachocki J, Mitzenmacher M. Scalable motif-aware graph clustering. In: Proceedings of the 26th International Conference on World Wide Web. 2017, 1451–1460
Watts D J, Strogatz S H. Collective dynamics of ‘small-world’ networks. Nature, 1998, 393(6684): 440–442
Krämer N C, Sauer V, Ellison N. The strength of weak ties revisited: further evidence of the role of strong ties in the provision of online social support. Social Media + Society, 2021, 7(2): 20563051211024958
Yao S. Application of data mining technology in financial fraud identification. In: Proceedings of the 4th International Conference on Information Systems and Computer Aided Education. 2021, 2919–2922
Zhou S, Yang X, Chang Q. Spatial clustering analysis of green economy based on knowledge graph. Journal of Intelligent & Fuzzy Systems, 2021, doi: https://doi.org/10.3233/jifs-219074
Foysal K H, Chang H J, Bruess F, Chong J W. SmartFit: smartphone application for garment fit detection. Electronics, 2021, 10(1): 97
Zhu D, Shen G, Chen J, Zhou W, Kong X. A higher-order motif-based spatiotemporal graph imputation approach for transportation networks. Wireless Communications and Mobile Computing, 2022, 2022: 1702170
Yin H, Benson A R, Leskovec J, Gleich D F. Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017, 555–564
Leskovec J, Sosič R. SNAP: a general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology, 2017, 8(1): 1
Ma W, Cai L, He T, Chen L, Cao Z, Li R. Local expansion and optimization for higher-order graph clustering. IEEE Internet of Things Journal, 2019, 6(5): 8702–8713
Huang S, Li Y, Bao Z, Li Z. Towards efficient motif-based graph partitioning: an adaptive sampling approach. In: Proceedings of the 37th International Conference on Data Engineering. 2021, 528–539
Miller R B. Response time in man-computer conversational transactions. In: Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I. 1968, 267–277
Liu Z, Heer J. The effects of interactive latency on exploratory visual analysis. IEEE Transactions on visualization and Computer Graphics, 2014, 20(12): 2122–2131
Andersen R, Chung F. Detecting sharp drops in PageRank and a simplified local partitioning algorithm. In: Proceedings of the 4th International Conference on Theory and Applications of Models of Computation. 2007, 1–12
Chung F. The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences of the United States of America, 2007, 104(50): 19735–19740
Kloster K, Gleich D F. Heat kernel based community detection. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, 1386–1395
Li P, Chien I, Milenkovic O. Optimizing generalized pagerank methods for seed-expansion community detection. Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1050
Wang H, He M, Wei Z, Wang S, Yuan Y, Du X, Wen J R. Approximate graph propagation. In: Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2021, 1686–1696
Zhou D, Zhang S, Yildirim M Y, Alcorn S, Tong H, Davulcu H, He J. High-order structure exploration on massive graphs: a local graph clustering perspective. ACM Transactions on Knowledge Discovery from Data, 2021, 15(2): 1–26
Casella G, Robert C P, Wells M T. Generalized accept-reject sampling schemes. In: DasGupta A, ed. A Festschrift for Herman Rubin. Institute of Mathematical Statistics, 2004, 342–348
Paramonov K, Shemetov D, Sharpnack J. Estimating graphlet statistics via lifting. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 587–595
Guo W, Li Y, Sha M, He B, Xiao X, Tan K L. GPU-accelerated subgraph enumeration on partitioned graphs. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 1067–1082
Acknowledgements
We thank Wang Shijun, Huang Jin and Li Lianbei from Renmin University of China for putting their time and effort into discussing the ideas and algorithms. We thank Dr. Wu Jun for providing supports on the core techniques and proof details. We also thank Pr. Chen Zheng from Beijing Foreign Studies University for her encourages and supports. This work was partially supported by the Fundamental Research Funds for the Central Universities (No. 2020JS005).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.
Additional information
Zhe Yuan received the PhD degree in computer science from Renmin University of China, China. He is currently an assistant professor with Beijing Foreign Studies University, China. His research interests include graph algorithms, knowledge graph, and interpretable machine learning.
Zhewei Wei received the PhD degree in computer science and engineering from The Hong Kong University of Science and Technology, China. He is currently a professor with Renmin University of China, China. His research interests include algorithms for massive data, streaming algorithms, and graph algorithms.
Fangrui Lv received the BS and MS degrees from Renmin University of China, China. He is currently an employee at the China Development Bank, China. His research interests include graph algorithm, and data mining.
Ji-Rong Wen is a professor, the dean of School of Information and the executive dean of Gaoling School of Artificial Intelligence at Renmin University of China, China. His main research interests include information retrieval, data mining and machine learning. He once was a senior researcher and group manager of the Web Search and Mining Group at Microsoft Research Asia (MSRA).
Electronic supplementary material
Rights and permissions
About this article
Cite this article
Yuan, Z., Wei, Z., Lv, F. et al. Index-free triangle-based graph local clustering. Front. Comput. Sci. 18, 183404 (2024). https://doi.org/10.1007/s11704-023-2768-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11704-023-2768-7