Abstract
The density peaks clustering (DPC) model searches for density peaks and clusters data with arbitrary shapes in machine learning. However, DPC needs a cut-off distance to calculate the local density of each point, and this distance is difficult to select; DPC also tends to ignore cluster centers with lower density in datasets with variable densities. In addition, for clusters with complex shapes, DPC selects only one cluster center per cluster, so the structure of the whole cluster is not fully reflected. To overcome these drawbacks, this paper presents a novel DPC model that merges microclusters based on k-nearest neighbors (kNN) and self-recommendation, called DPC-MC for short. First, the kNN-based neighbourhood of a point is defined, the mutual neighbour degree of a point is introduced within this neighbourhood, and a new local density based on the mutual neighbour degree is proposed. This local density does not require the cut-off distance to be set manually. Second, to avoid the manual setting of cluster centers, a self-recommendation strategy for local centers is provided. Third, after multiple local centers are selected, the binding degree of microclusters is developed to quantify the degree of combination between a microcluster and its neighbouring clusters. Homogeneous clusters are then found according to the binding degree of microclusters while boundary points are deleted layer by layer. The homologous clusters are merged, the points in the abnormal clusters are reallocated, and the clustering process ends. Finally, the DPC-MC algorithm is designed, and nine synthetic datasets and twenty-seven real-world datasets are used to verify its effectiveness. The experimental results demonstrate that the presented algorithm outperforms the compared algorithms in terms of several evaluation metrics for clustering.
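The abstract does not state the formula for the mutual neighbour degree or for the resulting local density, so the following Python sketch is only an illustration of the general idea, not the DPC-MC definition. It assumes a stand-in density in which each point sums exp(-distance) over its mutual k-nearest neighbours, which needs no cut-off distance, and it pairs that density with the relative distance used in the original DPC model (distance to the nearest point of higher density). The function names `pairwise_distances`, `mutual_knn_density`, and `dpc_delta`, and the toy data, are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X, shape (n, n)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mutual_knn_density(dist, k):
    """Assumed stand-in density: sum of exp(-d) over mutual k-nearest neighbours.

    Two points are mutual neighbours when each lies in the other's kNN set.
    This is NOT the paper's formula; it only shows a cut-off-free density.
    """
    n = dist.shape[0]
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)          # a point is not its own neighbour
    knn = np.argsort(masked, axis=1)[:, :k]   # indices of the k nearest points
    is_nb = np.zeros((n, n), dtype=bool)
    is_nb[np.repeat(np.arange(n), k), knn.ravel()] = True
    mutual = is_nb & is_nb.T                  # keep only reciprocal relations
    return np.where(mutual, np.exp(-dist), 0.0).sum(axis=1)


def dpc_delta(dist, rho):
    """Relative distance of the original DPC model.

    For each point: distance to the nearest point with strictly higher
    density; points with no denser point get their largest distance.
    """
    n = dist.shape[0]
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return delta


# Toy data (hypothetical): two tight groups of four points and one outlier.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
    [50.0, 50.0],
])
dist = pairwise_distances(X)
rho = mutual_knn_density(dist, k=3)
delta = dpc_delta(dist, rho)
```

In this toy run the outlier has no mutual neighbours, so its density is zero, while points inside the two groups share the same positive density. Candidate centers in DPC-style methods are points with both high `rho` and high `delta`; how DPC-MC recommends local centers and merges microclusters is not specified in the abstract and is not reproduced here.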
Acknowledgements
This research was funded by the National Natural Science Foundation of China under Grants 62076089, 61772176, 61976082, 61976120, and 61402153, the Scientific and Technological Project of Henan Province under Grant 212102210136, the Young Scholar Program of Henan Province under Grant 2017GGJS041, the Natural Science Foundation of Jiangsu Province under Grant BK20191445, and the Six Talent Peaks Project of Jiangsu Province under Grant XYDXXJS-048, and was sponsored by the Qing Lan Project of Jiangsu Province.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Sun, L., Qin, X., Ding, W. et al. Density peaks clustering based on k-nearest neighbors and self-recommendation. Int. J. Mach. Learn. & Cyber. 12, 1913–1938 (2021). https://doi.org/10.1007/s13042-021-01284-x
DOI: https://doi.org/10.1007/s13042-021-01284-x