research-article

Efficient and Effective Attributed Hypergraph Clustering via K-Nearest Neighbor Augmentation

Authors:
Yiran Li, Hong Kong Polytechnic University, Hong Kong, China (ORCID: 0000-0003-4476-016X)
Renchi Yang, Hong Kong Baptist University, Hong Kong, China (ORCID: 0000-0002-7284-3096)
Jieming Shi, Hong Kong Polytechnic University, Hong Kong, China (ORCID: 0000-0002-0465-1551)

Proceedings of the ACM on Management of Data, Volume 1, Issue 2, Article No. 116, pp. 1–23. https://doi.org/10.1145/3589261

Published: 20 June 2023

Abstract

Hypergraphs are an omnipresent data structure used to represent high-order interactions among entities. Given a hypergraph H wherein nodes are associated with attributes, attributed hypergraph clustering (AHC) aims to partition the nodes in H into k disjoint clusters, such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes on large attributed hypergraphs for accurate clustering. Existing AHC solutions suffer from issues of prohibitive computational costs, sub-par clustering quality, or both. In this paper, we present AHCKA, an efficient approach to AHC, which achieves state-of-the-art result quality via several algorithmic designs. Under the hood, AHCKA includes three key components: (i) a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, (ii) a joint hypergraph random walk model to devise an effective optimization objective towards AHC, and (iii) a highly efficient solver with speedup techniques for the problem optimization. Extensive experiments, comparing AHCKA against 15 baselines over 8 real attributed hypergraphs, reveal that AHCKA is superior to existing competitors in terms of clustering quality, while often being up to orders of magnitude faster.
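
To make the first two components more concrete, here is a minimal sketch of one way attribute information could be folded into a hypergraph random walk. It is an illustration under stated assumptions, not the AHCKA algorithm: the abstract does not specify the construction, so the cosine-similarity K-nearest-neighbor graph, the uniform hyperedge walk, the mixing weight `beta`, and all function names below are hypothetical choices. It also assumes every node has a nonzero attribute vector and belongs to at least one hyperedge.

```python
import numpy as np

def knn_graph(X, K):
    """Row-normalised adjacency of a cosine-similarity K-nearest-neighbor graph.
    (Assumed similarity measure; attribute rows must be nonzero.)"""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)            # a node is not its own neighbor
    n = X.shape[0]
    nbrs = np.argsort(-S, axis=1)[:, :K]    # K most similar nodes per row
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], nbrs] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def hypergraph_transition(H):
    """Simple hypergraph random walk on an n x m incidence matrix H:
    pick an incident hyperedge uniformly, then a node inside it uniformly."""
    Dv = H.sum(axis=1, keepdims=True)       # node degrees (must be > 0)
    De = H.sum(axis=0, keepdims=True)       # hyperedge sizes (must be > 0)
    return (H / Dv) @ (H / De).T

def joint_transition(H, X, K, beta):
    """Hypothetical joint walk: with probability beta follow an attribute
    KNN link, otherwise follow the hypergraph structure."""
    return (1 - beta) * hypergraph_transition(H) + beta * knn_graph(X, K)

# Toy example: 5 nodes, 2 hyperedges, 2-dimensional attributes.
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.5, 0.5], [0.1, 1.0], [0.2, 0.9]])
P = joint_transition(H, X, K=2, beta=0.5)
print(np.allclose(P.sum(axis=1), 1.0))      # True: each row is a distribution
```

Because both ingredients are row-stochastic, any convex combination of them is too, so `P` can be fed to a standard random-walk or spectral clustering routine. How AHCKA itself weights and solves this is described in the full paper, not in the abstract.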

Published in
Proceedings of the ACM on Management of Data Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN:2836-6573
DOI:10.1145/3605748
Editor:
Divyakant Agrawal
UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
eigenvector
hypergraph
random walk