Abstract
Clustering analysis is a data analysis technique that groups a set of data points into clusters so that points within the same cluster are similar to one another. However, clustering high-dimensional data remains a difficult task. To make it more tractable, hypergraphs are often used to represent the complex relationships among high-dimensional data points. In this paper, a hypergraph is used to improve the representation of complex high-dimensional data, and a multi-stage hierarchical clustering method based on hypergraph partitioning and the Chameleon algorithm is proposed. The proposed method first constructs a hypergraph on the shared-nearest-neighbor (SNN) graph built from the dataset, then applies the hypergraph partitioning method hMETIS to obtain a series of subgraphs, and finally merges these subgraphs to obtain the final clusters. Experiments on four UCI datasets show that the proposed method outperforms the Chameleon algorithm and four other clustering methods.
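The abstract names the stages of the pipeline but not the exact construction rules. The following is a minimal Python sketch of the first stage only, under stated assumptions: two points are linked in the SNN graph only when each is among the other's k nearest neighbours, the edge weight is the number of neighbours they share, and every maximal clique of that graph with at least two vertices becomes one hyperedge (found with the Bron–Kerbosch algorithm with pivoting). These rules and all function names (`knn_lists`, `snn_graph`, `maximal_cliques`, `to_hgr`) are illustrative assumptions, not the authors' definitions. The last function only serialises the hypergraph in hMETIS's plain `.hgr` text layout (a header line with the hyperedge and vertex counts, then one line of 1-based vertex ids per hyperedge); running hMETIS and the Chameleon-style merging of the resulting subgraphs are not shown.

```python
import math
from itertools import combinations


def knn_lists(points, k):
    """For each point, the set of indices of its k nearest neighbours (Euclidean, excluding itself)."""
    n = len(points)
    result = []
    for i in range(n):
        # Sort by (distance, index) so ties are broken deterministically.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        result.append({j for _, j in dists[:k]})
    return result


def snn_graph(points, k):
    """Assumed SNN rule: link i and j only if they are mutual k-NN; weight = shared neighbour count."""
    nn = knn_lists(points, k)
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        if j in nn[i] and i in nn[j]:
            edges[(i, j)] = len(nn[i] & nn[j])
    return edges


def maximal_cliques(n, edges):
    """Bron-Kerbosch with pivoting; returns every maximal clique as a sorted list of vertices."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(range(n)), set())
    return cliques


def to_hgr(n, hyperedges):
    """Unweighted hMETIS .hgr text: header '|E| |V|', then 1-based vertex ids per hyperedge."""
    lines = [f"{len(hyperedges)} {n}"]
    lines += [" ".join(str(v + 1) for v in he) for he in hyperedges]
    return "\n".join(lines) + "\n"


# Usage: two well-separated squares of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
edges = snn_graph(points, k=3)
# Hypothetical choice: singleton cliques (isolated points) are not useful hyperedges, so drop them.
hyperedges = [c for c in maximal_cliques(len(points), edges) if len(c) >= 2]
hgr = to_hgr(len(points), hyperedges)
print(hgr)
```

The brute-force neighbour search is quadratic in the number of points and is meant only for illustration; a spatial index or approximate nearest-neighbour search would be the usual choice on real high-dimensional data. Maximal-clique enumeration can be exponential in the worst case, but SNN graphs built with a small k tend to be sparse.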
Acknowledgments
This work is supported by the National Key R&D Program of China (Grant Nos. 2017YFE0111900 and 2018YFB1003205).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xi, Y., Lu, Y. (2020). Multi-stage Hierarchical Clustering Method Based on Hypergraph. In: Huang, DS., Premaratne, P. (eds) Intelligent Computing Methodologies. ICIC 2020. Lecture Notes in Computer Science, vol 12465. Springer, Cham. https://doi.org/10.1007/978-3-030-60796-8_37
DOI: https://doi.org/10.1007/978-3-030-60796-8_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60795-1
Online ISBN: 978-3-030-60796-8
eBook Packages: Computer Science, Computer Science (R0)