An improved density peaks clustering algorithm using similarity assignment strategy with K-nearest neighbors

Cluster Computing

Abstract

Density peaks clustering (DPC) performs poorly on some datasets with particular shapes, such as manifold datasets. The main reason is that the DPC algorithm does not account for uneven densities or for differences in sample density between clusters, which can lead to the wrong cluster centers being selected. Additionally, its single-step assignment method leads to a domino effect of assignment errors. To address these problems, this paper proposes an improved density peaks clustering method that uses a similarity assignment strategy with K-nearest neighbors (IDPC-SKNN). First, a new method for defining local density is proposed. The local density takes into account the proportion of the average density inside the region, which allows low-density clusters to be located precisely. Then, using the K-nearest-neighbor information of the samples, a new similarity-based allocation method is proposed. This allocation strategy successfully addresses cascading assignment errors and improves the algorithm's robustness. Finally, experiments on synthetic datasets, real-world datasets, and the Olivetti Faces dataset show that, on four evaluation indicators, the proposed algorithm outperforms all of the compared clustering algorithms.
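For orientation, the baseline DPC procedure that the abstract sets out to improve can be sketched as follows. This is a minimal illustration and not the IDPC-SKNN method: the abstract does not give the paper's density formula or its similarity rule, so none of that is reproduced here. The function name `dpc_baseline`, the parameter `k`, the KNN-based density, and the use of the product `rho * delta` to pick centers are all assumptions made for this sketch.

```python
import numpy as np

def dpc_baseline(X, n_clusters, k=5):
    """Illustrative baseline density peaks clustering (not IDPC-SKNN)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Assumed local density: exp(-mean distance to the k nearest neighbors).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    rho = np.exp(-knn_dist.mean(axis=1))

    # delta: distance to the nearest sample of higher density.
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    delta[order[0]] = dist[order[0]].max()

    # Centers: samples with the largest rho * delta (a common heuristic).
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the global density peak is always a center
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Single-step assignment: each remaining sample inherits the label of its
    # nearest higher-density neighbor, visited in decreasing density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

The final loop is where the domino effect mentioned in the abstract comes from: every sample copies the label of a single higher-density neighbor, so one wrong label is passed on to every sample that chains back to it. An assignment rule that consults several neighbors, as the abstract describes, is aimed at breaking that chain.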

Availability of data and materials

Readers can access the experimental data for this paper at the following GitHub link: https://github.com/milaan9/Clustering-Datasets

Funding

This work is supported by the Science and Technology Project of Chongqing Municipal Education Commission (KJQN201800539) and the Science and Technology Research Program of Chongqing Municipal Education Commission (No. KJZD-M202300502).

Author information

Corresponding author

Correspondence to Ji Feng.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Consent for publication

All authors have agreed to publish in this journal.

Code availability

Scholars who need the code for further research may contact the corresponding author.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, W., Feng, J. & Yang, D. An improved density peaks clustering algorithm using similarity assignment strategy with K-nearest neighbors. Cluster Comput 27, 12689–12706 (2024). https://doi.org/10.1007/s10586-024-04592-3

  • DOI: https://doi.org/10.1007/s10586-024-04592-3
