Density peaks clustering based on k-nearest neighbors and self-recommendation

Sun, Lin; Qin, Xiaoying; Ding, Weiping; Xu, Jiucheng; Zhang, Shiguang

doi:10.1007/s13042-021-01284-x

Original Article
Published: 15 March 2021

Volume 12, pages 1913–1938, (2021)

International Journal of Machine Learning and Cybernetics

Lin Sun¹,
Xiaoying Qin¹,
Weiping Ding²,
Jiucheng Xu ORCID: orcid.org/0000-0003-1518-3623¹ &
…
Shiguang Zhang¹


Abstract

The density peaks clustering (DPC) model searches for density peaks and clusters data of arbitrary shape in machine learning. However, it is difficult for DPC to select a cut-off distance when calculating the local density of points, and DPC easily overlooks lower-density cluster centers in datasets with variable densities. In addition, for clusters with complex shapes, DPC selects only one center per cluster, so the structure of the whole cluster is not fully reflected. To overcome these drawbacks, this paper presents a novel DPC model, called DPC-MC for short, that merges microclusters based on k-nearest neighbors (kNN) and self-recommendation. First, the kNN-based neighborhood of a point is defined, the mutual neighbor degree of a point is presented within this neighborhood, and a new local density based on the mutual neighbor degree is proposed. This local density does not require the cut-off distance to be set manually. Second, to avoid setting cluster centers manually, a self-recommendation strategy for local centers is provided. Third, after multiple local centers are selected, the binding degree of microclusters is developed to quantify how strongly a microcluster combines with its neighboring clusters. Homogeneous clusters are then found according to the binding degree of microclusters while boundary points are deleted layer by layer. The homologous clusters are merged, the points in the abnormal clusters are reallocated, and the clustering process ends. Finally, the DPC-MC algorithm is designed, and nine synthetic datasets and twenty-seven real-world datasets are used to verify its effectiveness. The experimental results demonstrate that the presented algorithm outperforms the compared algorithms on several clustering evaluation metrics.
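
The abstract does not state the formula for the mutual neighbor degree, so the following is only a minimal sketch of the general idea, not the DPC-MC definition. It contrasts a cut-off-based density, as used in classic DPC, with an assumed, simplified kNN-based density that counts how many of a point's k nearest neighbors also list that point among their own k nearest neighbors. The function names, the toy data, and the values of `d_c` and `k` are all illustrative assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cutoff_density(D, d_c):
    """Classic DPC-style density: number of other points closer than d_c."""
    return (D < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself

def mutual_knn_density(D, k):
    """Assumed stand-in for a mutual-neighbor-based density.

    Counts, for each point, the neighbors j such that j is among the
    k nearest neighbors of i AND i is among the k nearest neighbors of j.
    This is a simplification, not the paper's mutual neighbor degree.
    """
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)  # a point is never its own neighbor
    nn = np.argsort(D_no_self, axis=1, kind="stable")[:, :k]
    n = D.shape[0]
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], nn] = True  # is_nn[i, j]: j is in kNN(i)
    return (is_nn & is_nn.T).sum(axis=1)

# Toy 1-D data: two tight groups and one far-away point.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [100.0]])
D = pairwise_distances(X)
rho_cut = cutoff_density(D, d_c=1.5)   # depends on the chosen cut-off d_c
rho_knn = mutual_knn_density(D, k=2)   # depends only on the neighbor count k
print(rho_cut, rho_knn)
```

In this sketch the isolated point receives zero density under both measures, but the kNN-based version reaches that result without a distance threshold having to be tuned to the scale of the data.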

Acknowledgements

This research was funded by the National Natural Science Foundation of China under Grants 62076089, 61772176, 61976082, 61976120, and 61402153, the Scientific and Technological Project of Henan Province under Grant 212102210136, the Young Scholar Program of Henan Province under Grant 2017GGJS041, the Natural Science Foundation of Jiangsu Province under Grant BK20191445, and the Six Talent Peaks Project of Jiangsu Province under Grant XYDXXJS-048, and was sponsored by the Qing Lan Project of Jiangsu Province.

Author information

Authors and Affiliations

College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
Lin Sun, Xiaoying Qin, Jiucheng Xu & Shiguang Zhang
School of Information Science and Technology, Nantong University, Nantong, 226019, China
Weiping Ding

Corresponding authors

Correspondence to Weiping Ding or Jiucheng Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, L., Qin, X., Ding, W. et al. Density peaks clustering based on k-nearest neighbors and self-recommendation. Int. J. Mach. Learn. & Cyber. 12, 1913–1938 (2021). https://doi.org/10.1007/s13042-021-01284-x

Received: 31 March 2020
Accepted: 03 February 2021
Published: 15 March 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s13042-021-01284-x
