research-article

Efficient and Effective Attributed Hypergraph Clustering via K-Nearest Neighbor Augmentation

Published: 20 June 2023

Abstract

Hypergraphs are an omnipresent data structure used to represent high-order interactions among entities. Given a hypergraph H wherein nodes are associated with attributes, attributed hypergraph clustering (AHC) aims to partition the nodes in H into k disjoint clusters, such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes on large attributed hypergraphs for accurate clustering. Existing AHC solutions suffer from issues of prohibitive computational costs, sub-par clustering quality, or both. In this paper, we present AHCKA, an efficient approach to AHC, which achieves state-of-the-art result quality via several algorithmic designs. Under the hood, AHCKA includes three key components: (i) a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, (ii) a joint hypergraph random walk model to devise an effective optimization objective towards AHC, and (iii) a highly efficient solver with speedup techniques for the problem optimization. Extensive experiments, comparing AHCKA against 15 baselines over 8 real attributed hypergraphs, reveal that AHCKA is superior to existing competitors in terms of clustering quality, while often being up to orders of magnitude faster.
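The three components named in the abstract can be illustrated with a small sketch. This is not AHCKA's actual implementation. It is a toy under stated assumptions: a dense incidence matrix, cosine-similarity KNN over attributes, a fixed mixing weight `beta`, a truncated random walk with restart probability `alpha`, and a plain spectral step followed by k-means. All function names, parameter values, and the toy data are hypothetical.

```python
import numpy as np

def knn_transition(X, K):
    """Row-stochastic matrix of a KNN graph built from cosine similarity of attributes."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)               # a node is not its own neighbor
    idx = np.argsort(-S, axis=1)[:, :K]        # K most similar nodes per row
    A = np.zeros_like(S)
    A[np.arange(X.shape[0])[:, None], idx] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def hypergraph_transition(H):
    """Walk step: pick an incident hyperedge uniformly, then a node inside it uniformly.
    Assumes every node lies in at least one hyperedge and no hyperedge is empty."""
    Dv = H.sum(axis=1, keepdims=True)          # node degrees
    De = H.sum(axis=0, keepdims=True)          # hyperedge sizes
    return (H / Dv) @ (H / De).T

def joint_walk_scores(H, X, K=2, beta=0.5, alpha=0.2, steps=10):
    """Truncated random walk with restart over a blend of both transition matrices."""
    P = (1 - beta) * hypergraph_transition(H) + beta * knn_transition(X, K)
    n = P.shape[0]
    scores, M = np.zeros((n, n)), np.eye(n)
    for t in range(steps + 1):
        scores += alpha * (1 - alpha) ** t * M
        M = M @ P
    return scores

def cluster(scores, k, iters=20):
    """Spectral embedding of the symmetrized scores, then a small deterministic k-means."""
    W = (scores + scores.T) / 2
    _, vecs = np.linalg.eigh(W)                # eigenvalues in ascending order
    Z = vecs[:, -k:]                           # top-k eigenvectors
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    centers = [Z[0]]                           # farthest-point initialization
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = Z[labels == j].mean(axis=0)
    return labels

# Toy data (hypothetical): 6 nodes, 5 hyperedges, one hyperedge {2, 3} bridging two groups.
H = np.array([[1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 0, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
              [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])

labels = cluster(joint_walk_scores(H, X), k=2)
print(labels)
```

The dense matrices keep the sketch short. On large attributed hypergraphs, sparse matrices and an approximate nearest-neighbor index would be the more plausible choices, which is where an efficient solver of the kind the abstract mentions would matter.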


Supplemental Material

video1962831160.mp4 (mp4, 25 MB)

Published in

Proceedings of the ACM on Management of Data, Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN: 2836-6573
DOI: 10.1145/3605748

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States
