GDPC: generalized density peaks clustering algorithm based on order similarity

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method that has received increasing attention in recent years. However, DPC and most of its improved variants still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the domino effect. In general, GDPC can not only discover clusters in datasets of different sizes, dimensions and shapes, but also address the above two issues. Experiments on several datasets, including Lung, COIL20, ORL, USPS, MNIST, Breast and Vote, show that our algorithm is effective in most cases.
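The abstract does not give the formula for the order similarity, so the following is only a minimal sketch of one plausible reading of "order rank of Euclidean distance". The function names `order_rank` and `mutual_order` and the sum-of-ranks score are assumptions made for illustration; they are not taken from the paper and should not be read as the GDPC definition.

```python
import numpy as np

def order_rank(X):
    """rank[i, j] = position of sample j when all samples are sorted by
    Euclidean distance from sample i (0 is i itself, 1 its nearest neighbor)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    order = np.argsort(dist, axis=1, kind="stable")
    rank = np.empty_like(order)
    rows = np.arange(len(X))[:, None]
    # Invert the sort permutation row by row to turn orderings into ranks.
    rank[rows, order] = np.arange(len(X))[None, :]
    return rank

def mutual_order(X):
    """Hypothetical symmetric score (an assumption, not the paper's formula):
    rank of j as seen from i plus rank of i as seen from j.
    Smaller values mean the two samples are closer in rank terms."""
    r = order_rank(X)
    return r + r.T

# A tight group (first three points) and a sparse pair (last two points).
X = [[0.0, 0.0], [0.0, 1.0], [0.0, 1.5], [10.0, 10.0], [10.0, 14.0]]
score = mutual_order(X)
print(score[1, 2], score[3, 4])
```

In this toy data the tight pair (Euclidean distance 0.5) and the sparse pair (Euclidean distance 4) receive the same rank-based score, because each point is the other's nearest neighbor. This scale invariance is one way a rank-based measure could make peaks in sparse regions as visible as peaks in dense ones.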



Acknowledgements

This work is partly funded by the National Natural Science Foundation of China (nos. 11501435, 61772120 and 11771263).

Author information


Corresponding author

Correspondence to William Zhu.


About this article


Cite this article

Yang, X., Cai, Z., Li, R. et al. GDPC: generalized density peaks clustering algorithm based on order similarity. Int. J. Mach. Learn. & Cyber. 12, 719–731 (2021). https://doi.org/10.1007/s13042-020-01198-0


  • DOI: https://doi.org/10.1007/s13042-020-01198-0
