Skip to main content
Log in

Index-free triangle-based graph local clustering

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Motif-based graph local clustering (MGLC) is a popular method for graph mining tasks due to its various applications. However, the traditional two-phase approach of precomputing motif weights before performing local clustering loses locality and is impractical for large graphs. While some attempts have been made to address the efficiency bottleneck, there is still no applicable algorithm for large scale graphs with billions of edges. In this paper, we propose a purely local and index-free method called Index-free Triangle-based Graph Local Clustering (TGLC*) to solve the MGLC problem w.r.t. a triangle. TGLC* directly estimates the Personalized PageRank (PPR) vector using random walks with the desired triangle-weighted distribution and proposes the clustering result using a standard sweep procedure. We demonstrate TGLC*’s scalability through theoretical analysis and its practical benefits through a novel visualization layout. TGLC* is the first algorithm to solve the MGLC problem without precomputing the motif weight. Extensive experiments on seven real-world large-scale datasets show that TGLC* is applicable and scalable for large graphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Spielman D A, Teng S H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing. 2004, 81–90

  2. Andersen R, Chung F, Lang K. Local graph partitioning using pagerank vectors. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science. 2006, 475–486

  3. Liao C S, Lu K, Baym M, Singh R, Berger B. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics, 2009, 25(12): i253–i258

    Article  Google Scholar 

  4. Li S, Gentile C, Karatzoglou A. Graph clustering bandits for recommendation. 2016, arXiv preprint arXiv: 1605.00596

  5. Feng Y, Yu S, Zhang K, Li X, Ning Z. COMICS: a community property-based triangle motif clustering scheme. PeerJ Computer Science, 2019, 5: e180

    Article  Google Scholar 

  6. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888–905

    Article  Google Scholar 

  7. Newman M E J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 2006, 103(23): 8577–8582

    Article  Google Scholar 

  8. Vinh N X, Epps J, Bailey J. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. The Journal of Machine Learning Research, 2010, 11: 2837–2854

    MathSciNet  MATH  Google Scholar 

  9. Kobourov S G, Pupyrev S, Simonetto P. Visualizing graphs as maps with contiguous regions. In: Proceedings of the 16th Eurographics Conference on Visualization. 2014, 31–35

  10. Yang J, Leskovec J. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015, 42(1): 181–213

    Article  Google Scholar 

  11. Cheeger J. A lower bound for the smallest eigenvalue of the Laplacian. In: Gunning R C, ed. Problems in Analysis. Princeton: Princeton University Press, 1971, 195–200

    Google Scholar 

  12. Cox I J, Rao S B, Zhong Y. “Ratio regions”: a technique for image segmentation. In: Proceedings of the 13th International Conference on Pattern Recognition. 1996, 557–564

  13. Benson A R, Gleich D F, Leskovec J. Higher-order organization of complex networks. Science, 2016, 353(6295): 163–166

    Article  Google Scholar 

  14. Tsourakakis C E, Pachocki J, Mitzenmacher M. Scalable motif-aware graph clustering. In: Proceedings of the 26th International Conference on World Wide Web. 2017, 1451–1460

  15. Watts D J, Strogatz S H. Collective dynamics of ‘small-world’ networks. Nature, 1998, 393(6684): 440–442

    Article  MATH  Google Scholar 

  16. Krämer N C, Sauer V, Ellison N. The strength of weak ties revisited: further evidence of the role of strong ties in the provision of online social support. Social Media + Society, 2021, 7(2): 20563051211024958

    Article  Google Scholar 

  17. Yao S. Application of data mining technology in financial fraud identification. In: Proceedings of the 4th International Conference on Information Systems and Computer Aided Education. 2021, 2919–2922

  18. Zhou S, Yang X, Chang Q. Spatial clustering analysis of green economy based on knowledge graph. Journal of Intelligent & Fuzzy Systems, 2021, doi: https://doi.org/10.3233/jifs-219074

  19. Foysal K H, Chang H J, Bruess F, Chong J W. SmartFit: smartphone application for garment fit detection. Electronics, 2021, 10(1): 97

    Article  Google Scholar 

  20. Zhu D, Shen G, Chen J, Zhou W, Kong X. A higher-order motif-based spatiotemporal graph imputation approach for transportation networks. Wireless Communications and Mobile Computing, 2022, 2022: 1702170

    Google Scholar 

  21. Yin H, Benson A R, Leskovec J, Gleich D F. Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017, 555–564

  22. Leskovec J, Sosič R. SNAP: a general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology, 2017, 8(1): 1

    Article  Google Scholar 

  23. Ma W, Cai L, He T, Chen L, Cao Z, Li R. Local expansion and optimization for higher-order graph clustering. IEEE Internet of Things Journal, 2019, 6(5): 8702–8713

    Article  Google Scholar 

  24. Huang S, Li Y, Bao Z, Li Z. Towards efficient motif-based graph partitioning: an adaptive sampling approach. In: Proceedings of the 37th International Conference on Data Engineering. 2021, 528–539

  25. Miller R B. Response time in man-computer conversational transactions. In: Proceedings of the December 9–11, 1968, Fall Joint Computer Conference, Part I. 1968, 267–277

  26. Liu Z, Heer J. The effects of interactive latency on exploratory visual analysis. IEEE Transactions on visualization and Computer Graphics, 2014, 20(12): 2122–2131

    Article  Google Scholar 

  27. Andersen R, Chung F. Detecting sharp drops in PageRank and a simplified local partitioning algorithm. In: Proceedings of the 4th International Conference on Theory and Applications of Models of Computation. 2007, 1–12

  28. Chung F. The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences of the United States of America, 2007, 104(50): 19735–19740

    Article  Google Scholar 

  29. Kloster K, Gleich D F. Heat kernel based community detection. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014, 1386–1395

  30. Li P, Chien I, Milenkovic O. Optimizing generalized pagerank methods for seed-expansion community detection. Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1050

  31. Wang H, He M, Wei Z, Wang S, Yuan Y, Du X, Wen J R. Approximate graph propagation. In: Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2021, 1686–1696

  32. Zhou D, Zhang S, Yildirim M Y, Alcorn S, Tong H, Davulcu H, He J. High-order structure exploration on massive graphs: a local graph clustering perspective. ACM Transactions on Knowledge Discovery from Data, 2021, 15(2): 1–26

    Article  Google Scholar 

  33. Casella G, Robert C P, Wells M T. Generalized accept-reject sampling schemes. In: DasGupta A, ed. A Festschrift for Herman Rubin. Institute of Mathematical Statistics, 2004, 342–348

  34. Paramonov K, Shemetov D, Sharpnack J. Estimating graphlet statistics via lifting. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 587–595

  35. Guo W, Li Y, Sha M, He B, Xiao X, Tan K L. GPU-accelerated subgraph enumeration on partitioned graphs. In: Proceedings of 2020 ACM SIGMOD International Conference on Management of Data. 2020, 1067–1082

Download references

Acknowledgements

We thank Wang Shijun, Huang Jin and Li Lianbei from Renmin University of China for putting their time and effort into discussing the ideas and algorithms. We thank Dr. Wu Jun for providing supports on the core techniques and proof details. We also thank Pr. Chen Zheng from Beijing Foreign Studies University for her encourages and supports. This work was partially supported by the Fundamental Research Funds for the Central Universities (No. 2020JS005).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhewei Wei.

Ethics declarations

Competing interests The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Zhe Yuan received the PhD degree in computer science from Renmin University of China, China. He is currently an assistant professor with Beijing Foreign Studies University, China. His research interests include graph algorithms, knowledge graph, and interpretable machine learning.

Zhewei Wei received the PhD degree in computer science and engineering from The Hong Kong University of Science and Technology, China. He is currently a professor with Renmin University of China, China. His research interests include algorithms for massive data, streaming algorithms, and graph algorithms.

Fangrui Lv received the BS and MS degrees from Renmin University of China, China. He is currently an employee at the China Development Bank, China. His research interests include graph algorithm, and data mining.

Ji-Rong Wen is a professor, the dean of School of Information and the executive dean of Gaoling School of Artificial Intelligence at Renmin University of China, China. His main research interests include information retrieval, data mining and machine learning. He once was a senior researcher and group manager of the Web Search and Mining Group at Microsoft Research Asia (MSRA).

Electronic supplementary material

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yuan, Z., Wei, Z., Lv, F. et al. Index-free triangle-based graph local clustering. Front. Comput. Sci. 18, 183404 (2024). https://doi.org/10.1007/s11704-023-2768-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-023-2768-7

Keywords

Navigation