Abstract
Density peaks clustering (DPC) performs poorly on datasets with certain shapes, such as manifold datasets. The main reason is that the DPC algorithm does not account for differences in sample density between clusters or for uneven densities, which can lead to the selection of wrong cluster centers. In addition, its single-assignment strategy produces a domino effect of assignment errors. To address these problems, this paper proposes an improved density peaks clustering method that uses a similarity assignment strategy with K-nearest neighbors (IDPC-SKNN). First, a new definition of local density is proposed. It takes into account the proportion of the average density inside the region, which allows low-density clusters to be located precisely. Then, a new similarity allocation method is proposed that uses the K-nearest-neighbor information of each sample. This allocation strategy addresses cascading assignment errors and improves the robustness of the algorithm. Finally, in experiments on synthetic, real-world, and Olivetti Faces datasets, our algorithm outperforms all the compared clustering algorithms on four evaluation indicators.
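For orientation, the baseline that the abstract criticizes can be sketched as follows. This is a minimal sketch of the original DPC procedure of Rodriguez and Laio, not the IDPC-SKNN method proposed in this paper. The function name `dpc_baseline`, the Gaussian-kernel density, and the parameters `d_c` (cutoff distance) and `n_centers` are illustrative assumptions. The final loop is the single-assignment step: each point copies the label of its nearest higher-density point, so one wrong label is inherited by every point that follows it in the chain.

```python
import math


def dpc_baseline(points, d_c, n_centers):
    """Baseline density peaks clustering (illustrative sketch, not IDPC-SKNN)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: Gaussian-kernel sum over all other points.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    rank = {i: r for r, i in enumerate(order)}

    # delta_i: distance to the nearest point that precedes i in density order.
    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            # By convention the global density peak gets the largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: the points with the largest rho * delta (ties favor denser points).
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))
    centers = centers[:n_centers]

    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label

    # Single assignment: inherit the label of the nearest higher-density point.
    # A mislabeled parent passes its error on to all of its descendants.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

On two well-separated groups of points, calling `dpc_baseline(points, d_c=1.0, n_centers=2)` should place one center in each group. On clusters with very different densities, the dense cluster can claim several of the largest rho * delta values, which is the center-selection failure the abstract describes.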









Availability of data and materials
The experimental data for this paper are available on GitHub: https://github.com/milaan9/Clustering-Datasets
Funding
This work is supported by the Science and Technology Project of Chongqing Municipal Education Commission (KJQN201800539) and the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJZD-M202300502).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Consent for publication
All authors have agreed to publish in this journal.
Code availability
Researchers who need the code for further study may contact the corresponding author.
About this article
Cite this article
Hu, W., Feng, J. & Yang, D. An improved density peaks clustering algorithm using similarity assignment strategy with K-nearest neighbors. Cluster Comput 27, 12689–12706 (2024). https://doi.org/10.1007/s10586-024-04592-3
DOI: https://doi.org/10.1007/s10586-024-04592-3