Abstract
The Chameleon algorithm is a hierarchical clustering algorithm based on dynamic modeling. It can find high-quality clusters of different shapes, sizes, and densities. However, Chameleon requires a user-specified k when constructing the sparse graph, and this choice directly influences clustering performance. In addition, hMetis, the graph-partitioning tool used in the original algorithm, is difficult to set up and requires the number of partitions to be specified. Both parameters are hard to determine without prior knowledge. To overcome the first problem, this paper introduces an improved natural neighbor method for constructing the sparse graph, which reflects the initial sparseness of the data. To address the second problem, this paper proposes a new method of generating sub-clusters in the sparse graph, which is simple and objective. Together, these yield the Chameleon Algorithm Based on Improved Natural Neighbor Graph Generating Sub-clusters (INNGS-Chameleon). The algorithm is tested on 8 synthetic data sets and 10 UCI data sets, and the results are compared with those of the original Chameleon algorithm, its improved version, and several classic algorithms. The experimental results show that INNGS-Chameleon is feasible and effective.
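To make the idea of a sparse graph built without a user-specified k more concrete, the following is a minimal sketch of a basic natural neighbor search. It is not the improved method proposed in this paper, whose details the abstract does not give. The function name `natural_neighbor_graph`, the stopping rule (stop once every point has at least one reverse neighbor, or once the number of such "orphan" points stops changing), and the toy data are assumptions made for illustration only.

```python
from math import dist


def natural_neighbor_graph(points):
    """Simplified natural-neighbor search (illustrative, hypothetical helper).

    Grows each point's neighborhood one rank at a time and stops by itself,
    so no neighborhood size k has to be supplied by the user.
    Returns (rounds, edges), where edges holds mutual-neighbor pairs (i, j), i < j.
    """
    n = len(points)
    # For every point, the other points' indices sorted by increasing distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # neighbors collected so far
    reverse_count = [0] * n           # how many points list i as a neighbor
    prev_orphans = None
    rounds = 0
    for r in range(n - 1):
        # Add the (r + 1)-th nearest neighbor of every point.
        for i in range(n):
            j = order[i][r]
            knn[i].add(j)
            reverse_count[j] += 1
        rounds = r + 1
        orphans = sum(1 for c in reverse_count if c == 0)
        # Assumed stopping rule: no orphans left, or orphan count has stalled.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Keep an edge only when both endpoints list each other as neighbors.
    edges = {(i, j) for i in range(n) for j in knn[i] if i < j and i in knn[j]}
    return rounds, edges


# Toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rounds, edges = natural_neighbor_graph(toy_points)
print(rounds, sorted(edges))
```

On this toy data the search stops on its own after a small number of rounds and the resulting graph contains no edge between the two groups, which is the behavior a sparse graph needs before sub-clusters are generated. The brute-force neighbor ranking is quadratic in the number of points and is used here only to keep the sketch self-contained.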
Acknowledgements
This work is supported by the National Natural Science Foundation of China (nos. 61976216 and 61672522).
About this article
Cite this article
Zhang, Y., Ding, S., Wang, Y. et al. Chameleon algorithm based on improved natural neighbor graph generating sub-clusters. Appl Intell 51, 8399–8415 (2021). https://doi.org/10.1007/s10489-021-02389-0
DOI: https://doi.org/10.1007/s10489-021-02389-0