
Density peaks clustering based on k-nearest neighbors and self-recommendation

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

The density peaks clustering (DPC) model searches for density peaks and can cluster data of arbitrary shape. However, it is difficult for DPC to select the cut-off distance used to compute the local density of points, and DPC easily overlooks cluster centers with lower density in datasets of varying density. In addition, for clusters with complex shapes, DPC selects only one cluster center per cluster, so the structure of the whole cluster is not fully reflected. To overcome these drawbacks, this paper presents a novel DPC model that merges microclusters based on k-nearest neighbors (kNN) and self-recommendation, called DPC-MC for short. First, a kNN-based neighbourhood of each point is defined, the mutual neighbour degree of a point within this neighbourhood is presented, and a new local density based on the mutual neighbour degree is proposed; this local density does not require the cut-off distance to be set manually. Second, to avoid setting cluster centers manually, a self-recommendation strategy for local centers is provided. Third, after multiple local centers are selected, the binding degree of microclusters is developed to quantify how strongly a microcluster is combined with its neighbouring clusters. Homogeneous clusters are then identified according to this binding degree while boundary points are deleted layer by layer; these clusters are merged, the points in abnormal clusters are reallocated, and the clustering process ends. Finally, the DPC-MC algorithm is designed, and nine synthetic datasets and twenty-seven real-world datasets are used to verify its effectiveness. The experimental results demonstrate that the presented algorithm outperforms the compared algorithms on several clustering evaluation metrics.
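The abstract contrasts DPC's cut-off-distance density with a kNN-based density that needs no cut-off. The Python sketch below illustrates that contrast under stated assumptions. `cutoff_density` follows the cut-off kernel of the original DPC algorithm (count of points closer than `dc`), and `delta`/`top_centers` implement the standard DPC "distance to a denser point" and ρ·δ decision-graph heuristic. `mutual_knn_density` is a hypothetical stand-in for the paper's mutual-neighbour-degree density, whose exact formula is not given in this abstract; none of these functions implement DPC-MC's self-recommendation of local centers or microcluster merging.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Matrix = List[List[float]]


def pairwise_distances(points: Sequence[Point]) -> Matrix:
    """Full Euclidean distance matrix (O(n^2) memory; fine for a small sketch)."""
    return [[math.dist(p, q) for q in points] for p in points]


def cutoff_density(dist: Matrix, dc: float) -> List[int]:
    """Classic DPC local density: number of other points closer than cut-off dc."""
    n = len(dist)
    return [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]


def knn(dist: Matrix, k: int) -> List[List[int]]:
    """Indices of the k nearest neighbours of each point (ties broken by index)."""
    result = []
    for i in range(len(dist)):
        others = [j for j in range(len(dist)) if j != i]
        others.sort(key=lambda j: (dist[i][j], j))
        result.append(others[:k])
    return result


def mutual_knn_density(dist: Matrix, k: int) -> List[float]:
    """HYPOTHETICAL kNN density (not the paper's formula): the fraction of a
    point's k neighbours that also list it as a neighbour, times
    exp(-mean distance to its k neighbours). No cut-off distance is needed."""
    nbrs = knn(dist, k)
    nbr_sets = [set(nb) for nb in nbrs]
    rho = []
    for i, nb in enumerate(nbrs):
        mutual = sum(1 for j in nb if i in nbr_sets[j])
        mean_d = sum(dist[i][j] for j in nb) / k
        rho.append((mutual / k) * math.exp(-mean_d))
    return rho


def delta(dist: Matrix, rho: Sequence[float]) -> List[float]:
    """DPC separation: distance to the nearest point ranked denser (density
    ties broken by index); the densest point gets its largest distance."""
    order = sorted(range(len(rho)), key=lambda i: (-rho[i], i))
    d = [0.0] * len(rho)
    d[order[0]] = max(dist[order[0]])
    for rank in range(1, len(order)):
        i = order[rank]
        d[i] = min(dist[i][j] for j in order[:rank])
    return d


def top_centers(dist: Matrix, rho: Sequence[float], m: int) -> List[int]:
    """Pick the m points with the largest gamma = rho * delta."""
    dl = delta(dist, rho)
    gamma = [r * s for r, s in zip(rho, dl)]
    return sorted(range(len(rho)), key=lambda i: -gamma[i])[:m]
```

On two small, well-separated square blobs with a central point each, a cut-off that is too small (for example `dc=0.5` when the smallest pairwise distance is about 0.71) gives every point density 0, which is the sensitivity the abstract describes, whereas the kNN-based density ranks each blob's central point highest and `top_centers(D, rho, 2)` returns those two points.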




Acknowledgements

This research was funded by the National Natural Science Foundation of China under Grants 62076089, 61772176, 61976082, 61976120, and 61402153, the Scientific and Technological Project of Henan Province under Grant 212102210136, the Young Scholar Program of Henan Province under Grant 2017GGJS041, the Natural Science Foundation of Jiangsu Province under Grant BK20191445, and the Six Talent Peaks Project of Jiangsu Province under Grant XYDXXJS-048, sponsored by Qing Lan Project of Jiangsu Province.

Author information


Corresponding authors

Correspondence to Weiping Ding or Jiucheng Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sun, L., Qin, X., Ding, W. et al. Density peaks clustering based on k-nearest neighbors and self-recommendation. Int. J. Mach. Learn. & Cyber. 12, 1913–1938 (2021). https://doi.org/10.1007/s13042-021-01284-x


  • DOI: https://doi.org/10.1007/s13042-021-01284-x
