Abstract
The density peak clustering algorithm has two known deficiencies: it is sensitive to the choice of density kernel, and it struggles when densities differ widely across clusters. To address them, we propose a popularity peak clustering algorithm based on a more robust notion of density called popularity. The popularity of a sample is computed from the number, similarity and popularity of the points that have the sample among their k-nearest neighbors. The properties of popularity help to identify cluster centers in sparse regions and to handle large density differences across clusters. Moreover, the density peak clustering strategy of assigning each non-center point to the same cluster as its nearest higher-density neighbor can cause error propagation. To address this issue, we also propose a new popularity-based label assignment strategy. Our results demonstrate that the proposed algorithm can recognize clusters regardless of their densities and degree of overlap, and that it can often outperform existing density peak clustering algorithms.
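The abstract does not give the popularity formula, so the following is only an illustrative sketch of the idea it describes: a sample gains popularity from the points that list it among their k-nearest neighbors, weighted by how similar and how popular those points are. The Gaussian similarity, the bandwidth `sigma`, the `damping` term and the fixed-point iteration are all assumptions made for this example and should not be read as the authors' definition.

```python
import numpy as np


def knn(X, k):
    """Return (n, k) neighbor indices and distances, excluding each point itself."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def popularity(X, k=5, n_iter=50, damping=0.85):
    """Hypothetical popularity score (assumed form, not the paper's formula).

    Point i receives a vote from every point j that has i among its
    k-nearest neighbors. Each vote is weighted by the similarity of
    j and i and by the current popularity of j.
    """
    n = X.shape[0]
    idx, d = knn(X, k)
    sigma = d.mean()                  # assumed bandwidth
    sim = np.exp(-(d / sigma) ** 2)   # assumed Gaussian similarity

    # W[i, j] > 0 exactly when i is one of the k-nearest neighbors of j.
    W = np.zeros((n, n))
    voters = np.repeat(np.arange(n), k)
    W[idx.ravel(), voters] = sim.ravel()

    pop = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        votes = W @ pop
        votes /= votes.sum()
        # Damping (an assumption) keeps every score strictly positive.
        pop = (1.0 - damping) / n + damping * votes
    return pop


if __name__ == "__main__":
    # Five evenly spaced points plus one distant outlier.
    X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [10.0]])
    print(popularity(X, k=2))
```

In this toy data set no point has the outlier among its two nearest neighbors, so it receives no votes and ends up with the lowest score, while the middle point of the group collects votes from all four of its companions. A peak-finding step analogous to density peak clustering could then use these scores in place of local density.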
Data availability
The data that support the findings of this study are openly available in the Kaggle and UCI Machine Learning repositories at https://www.kaggle.com/datasets and https://archive.ics.uci.edu/ml/index.php, respectively, and also at https://cs.joensuu.fi/sipu/datasets/.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Motallebi, H., Malakoutifar, N. An efficient clustering algorithm based on searching popularity peaks. Pattern Anal Applic 27, 67 (2024). https://doi.org/10.1007/s10044-024-01261-4
DOI: https://doi.org/10.1007/s10044-024-01261-4