An efficient clustering algorithm based on searching popularity peaks

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

To address two deficiencies of the density peak clustering algorithm, namely its sensitivity to density kernels and its difficulty with large density differences across clusters, we propose a popularity peak clustering algorithm based on a more robust notion of density called popularity. The popularity of a sample is computed from the number, similarity, and popularity of the points that have the sample among their k-nearest neighbors. This definition has properties that help in identifying cluster centers in sparse regions and in handling large density differences across clusters. Moreover, in the density peak clustering algorithm, assigning each non-center point to the same cluster as its nearest higher-density neighbor can cause error propagation: one wrong assignment is inherited by every point assigned through it. To address this issue, we also propose a new popularity-based label assignment strategy. Our results demonstrate that the proposed algorithm can recognize clusters regardless of their densities and degree of overlap and can often outperform existing density peak clustering algorithms.
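The abstract does not give the popularity formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a PageRank-style iteration in which every point distributes its current score to its k-nearest neighbors in proportion to a Gaussian similarity. The function names, the similarity kernel, the `damping` parameter, and the fixed iteration count are all assumptions introduced for illustration.

```python
import math


def knn_lists(points, k):
    """Brute-force k-nearest-neighbor index lists (excluding the point itself)."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        neighbors.append(order[:k])
    return neighbors


def popularity(points, k=3, iterations=50, damping=0.85):
    """Illustrative popularity score (an assumed formulation, not the paper's).

    A point is popular when many points list it among their k-nearest
    neighbors, when those points are similar to it, and when those
    points are themselves popular.
    """
    n = len(points)
    nbrs = knn_lists(points, k)
    dists = [[math.dist(points[j], points[i]) for i in nbrs[j]] for j in range(n)]

    # Assumed kernel width: mean k-NN distance (falls back to 1.0 if all zero).
    sigma = sum(sum(row) for row in dists) / (n * k) or 1.0

    # Each voter j splits one unit of influence among its k neighbors.
    weights = []
    for j in range(n):
        w = [math.exp(-((d / sigma) ** 2)) for d in dists[j]]
        total = sum(w)
        weights.append([x / total for x in w])

    pop = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n  # baseline score for every point
        for j in range(n):
            for i, w in zip(nbrs[j], weights[j]):
                new[i] += damping * pop[j] * w  # j votes for its neighbor i
        pop = new
    return pop


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    scores = popularity(pts, k=2)
    # The isolated point (10, 10) is in nobody's neighbor list, so it
    # keeps only the baseline score (1 - damping) / n.
    for p, s in zip(pts, scores):
        print(p, round(s, 4))
```

Because each point hands out a fixed total vote regardless of how dense its surroundings are, a score of this kind depends on neighbor rankings rather than on absolute distances. That is one plausible reading of why a popularity-style measure could be less sensitive to density differences across clusters than a kernel density estimate.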

Data availability

The data that support the findings of this study are openly available in the Kaggle and UCI Machine Learning repositories at https://www.kaggle.com/datasets and https://archive.ics.uci.edu/ml/index.php, respectively, and at https://cs.joensuu.fi/sipu/datasets/.

Author information

Corresponding author

Correspondence to Hassan Motallebi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Motallebi, H., Malakoutifar, N. An efficient clustering algorithm based on searching popularity peaks. Pattern Anal Applic 27, 67 (2024). https://doi.org/10.1007/s10044-024-01261-4

  • DOI: https://doi.org/10.1007/s10044-024-01261-4

Keywords