Deep Adaptive Graph Clustering via von Mises-Fisher Distributions

Published: 08 January 2024

Abstract

Graph clustering is an active research topic with wide application, for example in community detection in social networks. Many methods combine auto-encoders and graph neural networks to cluster nodes using both node attributes and graph structure. These methods usually assume that the inherent parameters (i.e., size and variance) of different clusters in the latent embedding space are homogeneous, so the assignment probability is monotonic in the Euclidean distance between node embeddings and centroids. Unfortunately, this assumption often does not hold, because the size and concentration of different clusters can differ substantially, which limits clustering accuracy. In addition, the node embeddings in deep graph clustering methods are usually L2-normalized, so they lie on the surface of a unit hypersphere. To address these problems, we propose Deep Adaptive Graph Clustering via von Mises-Fisher distributions (DAGC). DAGC assumes that the node embeddings H are drawn from a von Mises-Fisher distribution and that each cluster k is associated with cluster inherent parameters ρk, which include the cluster center μ and the cluster cohesion degree κ. We then adopt an EM-like approach (i.e., 𝒫(H|ρ) and 𝒫(ρ|H), respectively) to learn the embeddings and the cluster inherent parameters alternately. Specifically, given the node embeddings, we update the cluster centers in an attraction-repulsion manner to make them more separable. Given the cluster inherent parameters, we use a likelihood-based loss to concentrate the node embeddings around the cluster centers. Thus, DAGC simultaneously improves intra-cluster compactness and inter-cluster heterogeneity. Extensive experiments on four benchmark datasets demonstrate that DAGC consistently outperforms state-of-the-art methods, especially on imbalanced datasets.
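To make the role of the cohesion degree κ concrete, the following is a minimal sketch of the standard von Mises-Fisher log-density and a soft cluster assignment built from it. It is not the DAGC implementation: the function names (`log_bessel_i`, `vmf_log_density`, `soft_assign`) are hypothetical, equal mixing weights across clusters are an assumption, and the attraction-repulsion center update and the training loss are not shown because the abstract does not specify them.

```python
import math


def log_bessel_i(order, x, terms=200):
    """Log of the modified Bessel function of the first kind, I_order(x), for x > 0.

    Sums the power series in log space. 200 terms is an illustrative choice
    that is ample for the moderate x used here, not a general-purpose setting.
    """
    logs = [
        (2 * m + order) * math.log(x / 2.0)
        - math.lgamma(m + 1)
        - math.lgamma(m + order + 1)
        for m in range(terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def vmf_log_density(x, mu, kappa):
    """Log-density of a vMF distribution at unit vector x (mean direction mu, kappa > 0)."""
    d = len(x)
    order = d / 2.0 - 1.0
    log_norm = (
        order * math.log(kappa)
        - (d / 2.0) * math.log(2.0 * math.pi)
        - log_bessel_i(order, kappa)
    )
    dot = sum(a * b for a, b in zip(x, mu))
    return log_norm + kappa * dot


def soft_assign(x, centers, kappas):
    """Posterior cluster probabilities for x, assuming equal mixing weights (an assumption)."""
    logs = [vmf_log_density(x, mu, k) for mu, k in zip(centers, kappas)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# Two orthogonal cluster centers on the unit sphere in 3 dimensions.
centers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# A point 40 degrees from center 0 and 50 degrees from center 1.
theta = math.radians(40.0)
x = (math.cos(theta), math.sin(theta), 0.0)

# Cluster 0 is tight (large kappa), cluster 1 is diffuse (small kappa).
adaptive = soft_assign(x, centers, kappas=[50.0, 2.0])
# Homogeneous clusters: the same kappa for both.
homogeneous = soft_assign(x, centers, kappas=[10.0, 10.0])
```

In this sketch, `homogeneous` favors cluster 0, the nearer center, whereas `adaptive` favors cluster 1: a point 40 degrees from a tightly concentrated center is unlikely under that cluster, so the diffuse cluster takes it. This illustrates why assignment need not be monotonic in distance once clusters have different concentrations.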

Cited By

  • (2025) Contrastive deep graph clustering with hard boundary sample awareness. Information Processing & Management 62:3 (104050). DOI: 10.1016/j.ipm.2024.104050. Online publication date: May-2025
  • (2024) Bridging spherical mixture distributions and word semantic knowledge for neural topic modeling. Expert Systems with Applications (124850). DOI: 10.1016/j.eswa.2024.124850. Online publication date: Jul-2024
  • (2023) HBay: Predicting Human Mobility via Hyperspherical Bayesian Learning. Knowledge Science, Engineering and Management (249-262). DOI: 10.1007/978-3-031-40286-9_21. Online publication date: 16-Aug-2023

Published In

ACM Transactions on the Web, Volume 18, Issue 2
May 2024
378 pages
EISSN:1559-114X
DOI:10.1145/3613666
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 January 2024
Online AM: 31 January 2023
Accepted: 03 December 2022
Revised: 03 November 2022
Received: 01 February 2022
Published in TWEB Volume 18, Issue 2

Author Tags

  1. Graph embedding
  2. graph clustering
  3. vMF

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China
  • Strategic Priority Research Program of CAS
  • Chinese Academy of Sciences Network Security and Informatization Special Application Demonstration
