Abstract
Clustering by fast search and find of density peaks (DPC) is a well-known algorithm, valued for its simple structure and high extensibility. It requires neither iteration nor additional parameters. However, DPC and most of its improved variants still face challenges such as parameter dependence, an unreasonable metric, and difficulty in determining cluster centers. To address these issues, we propose a novel clustering algorithm based on a cluster center selection model (CA-CSM). First, we calculate the density parameter automatically from the local information of each object, which reduces parameter dependence. Next, we introduce the concept of boundary degree to distinguish core objects from non-core objects. Using the local density metric, we then establish a highly extensible model (CSM) that automatically detects cluster centers among the core objects. We test CA-CSM on 21 datasets using five benchmarks and compare it with seven state-of-the-art algorithms. Extensive experiments and analysis show that our algorithm is feasible and effective.
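For orientation, the baseline DPC idea that the abstract builds on can be sketched as follows. This is a minimal illustration of plain DPC (local density, distance to the nearest denser point, and ranking by their product), not of CA-CSM itself; the abstract does not give the formulas for the automatic density parameter, the boundary degree, or the CSM. The function names `dpc_quantities` and `pick_centers`, the hard cutoff `d_c`, and the number of centers `k` are assumptions chosen for this example, and both `d_c` and `k` are supplied by hand here.

```python
import math


def dpc_quantities(points, d_c):
    """Return (rho, delta) for baseline DPC with a hard cutoff distance d_c.

    rho[i]   : number of other points strictly closer than d_c to point i.
    delta[i] : distance from point i to the nearest point of higher rho.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser point (including ties for the maximum)
        # gets its largest distance to any point, by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Rank points by gamma = rho * delta and return the indices of the top k."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]


if __name__ == "__main__":
    # Two small, well-separated groups (hypothetical toy data).
    pts = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
           (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
    rho, delta = dpc_quantities(pts, d_c=1.0)
    print(rho)                                   # → [1, 1, 1, 3, 1, 1, 1, 3]
    print(sorted(pick_centers(rho, delta, k=2)))  # → [3, 7]
```

In this sketch the result depends directly on the hand-picked `d_c` and `k`, which illustrates the kind of parameter dependence and center-determination difficulty the abstract aims to reduce.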

Acknowledgements
This study was funded by the Shenzhen Fundamental Research Plan (No. JCYJ20160226201453085).
Ethics declarations
Conflict of interest
We declare that we have no conflict of interest.
Research involving human participants and/or animals
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Cite this article
Zhang, R., Song, X., Ying, S. et al. CA-CSM: a novel clustering algorithm based on cluster center selection model. Soft Comput 25, 8015–8033 (2021). https://doi.org/10.1007/s00500-021-05835-w