Abstract
Hypergraphs are a ubiquitous data structure for representing high-order interactions among entities. Given a hypergraph H whose nodes are associated with attributes, attributed hypergraph clustering (AHC) aims to partition the nodes of H into k disjoint clusters, such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. Capturing multi-hop connections via nodes or attributes on large attributed hypergraphs, as accurate clustering requires, is highly challenging. Existing AHC solutions suffer from prohibitive computational costs, sub-par clustering quality, or both. In this paper, we present AHCKA, an efficient approach to AHC that achieves state-of-the-art result quality through several algorithmic designs. Under the hood, AHCKA includes three key components: (i) a carefully crafted K-nearest neighbor augmentation strategy that exploits attribute information on hypergraphs, (ii) a joint hypergraph random walk model used to formulate an effective optimization objective for AHC, and (iii) a highly efficient solver with speedup techniques for the resulting optimization problem. Extensive experiments comparing AHCKA against 15 baselines over 8 real attributed hypergraphs show that AHCKA is superior to existing competitors in clustering quality, while often being up to orders of magnitude faster.
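As a rough illustration of how components (i) and (ii) could fit together, the sketch below builds a standard hypergraph random walk transition matrix from an incidence matrix, builds a K-nearest-neighbor transition matrix from cosine similarity of node attributes, and mixes the two. This is not the AHCKA implementation: the function names, the unweighted walk, the cosine similarity measure, and the mixing parameter `beta` are all assumptions made for illustration.

```python
import numpy as np

def hypergraph_transition(H):
    """Random walk on a hypergraph: pick an incident hyperedge uniformly,
    then a node in it uniformly. H is an n x m 0/1 incidence matrix with
    no isolated nodes and no empty hyperedges (assumed)."""
    H = np.asarray(H, dtype=float)
    dv = H.sum(axis=1)  # number of hyperedges per node
    de = H.sum(axis=0)  # number of nodes per hyperedge
    return (H / dv[:, None]) @ (H / de[None, :]).T

def knn_transition(X, k):
    """Uniform walk to the k most cosine-similar nodes (self excluded).
    Requires k <= n - 1 and nonzero attribute rows (assumed)."""
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)          # never pick a node as its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]   # indices of the top-k neighbors per row
    A = np.zeros(S.shape)
    A[np.arange(S.shape[0])[:, None], idx] = 1.0
    return A / k

def joint_transition(H, X, k, beta):
    """Hypothetical mixture: with probability beta follow an attribute
    KNN link, otherwise take a hypergraph random walk step."""
    return (1.0 - beta) * hypergraph_transition(H) + beta * knn_transition(X, k)

# Toy data: 4 nodes, hyperedges e0 = {0, 1, 2} and e1 = {2, 3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]])
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
P = joint_transition(H, X, k=1, beta=0.3)
```

Each row of `P` is a probability distribution over next nodes, so `P` can be fed to any random-walk-based clustering objective; how AHCKA actually combines the two signals is specified in the paper, not here.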
Index Terms
- Efficient and Effective Attributed Hypergraph Clustering via K-Nearest Neighbor Augmentation