CA-CSM: a novel clustering algorithm based on cluster center selection model

  • Foundations
Soft Computing

Abstract

Clustering by fast search and find of density peaks (DPC) is well known for its simple structure and high extensibility, and it requires no iteration. However, DPC and most of its improved variants still face several challenges: parameter dependence, unreasonable metrics, and difficulty in determining cluster centers. To address these issues, we propose a novel clustering algorithm based on a cluster center selection model (CA-CSM). First, we calculate the density parameter automatically from the local information of each object, which reduces parameter dependence. Next, we propose the concept of boundary degree to discriminate core objects from non-core objects. Using the local density metric, we then establish a highly extensible model (CSM) that automatically detects the cluster centers among the core objects. We test CA-CSM on 21 datasets using five benchmarks and compare it with seven state-of-the-art algorithms. Extensive experiments and analysis show that our algorithm is feasible and effective.
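
For context, the baseline DPC procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the original cutoff-kernel DPC formulation, not of CA-CSM itself: each object gets a local density rho (the number of neighbors within a cutoff distance) and a distance delta to its nearest higher-density object, and objects with large rho * delta are taken as cluster centers. The function names `dpc_decision_values` and `pick_centers`, and the parameters `d_c` and `k`, are hypothetical names chosen for this sketch. That `d_c` and `k` must be supplied by hand is the parameter dependence and center determination difficulty mentioned above.

```python
import numpy as np


def dpc_decision_values(X, d_c):
    """Return (rho, delta) per row of X for cutoff-kernel DPC (illustrative)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density: number of other objects closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1  # minus 1 excludes the object itself
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            # Distance to the nearest object with strictly higher density.
            delta[i] = dist[i, higher].min()
        else:
            # Convention for a densest object: its largest distance.
            delta[i] = dist[i].max()
    return rho, delta


def pick_centers(rho, delta, k):
    """Indices of the k objects with the largest rho * delta (assumed rule)."""
    gamma = rho * delta
    return np.argsort(gamma)[::-1][:k]
```

On two well-separated groups of points, calling `pick_centers(*dpc_decision_values(X, d_c), k=2)` would be expected to return one index from each group, provided `d_c` is chosen sensibly for the data scale.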

Acknowledgements

This study was funded by Shenzhen Fundamental Research Plan (No. JCYJ20160226201453085).

Author information

Corresponding author

Correspondence to Hongpeng Wang.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Research involving human participants and/or animals

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, R., Song, X., Ying, S. et al. CA-CSM: a novel clustering algorithm based on cluster center selection model. Soft Comput 25, 8015–8033 (2021). https://doi.org/10.1007/s00500-021-05835-w

  • DOI: https://doi.org/10.1007/s00500-021-05835-w
