Abstract
Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method and has received increasing attention in recent years. However, DPC and most of its improved variants still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the domino effect. In general, GDPC can not only discover clusters in datasets of different sizes, dimensions and shapes, but also address the above two issues. Experiments on several datasets, including Lung, COIL20, ORL, USPS, MNIST, Breast and Vote, show that our algorithm is effective in most cases.
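The abstract does not give the formula for the order similarity, so the following is only a minimal sketch of the general idea of ranking samples by Euclidean distance. The function names `distance_ranks` and `order_dissimilarity`, the toy points, and the choice to combine the two ranks by summing them are all assumptions made for illustration, not the definition used in GDPC.

```python
import math


def distance_ranks(points):
    """Return ranks[i][j]: position of j when the other points are sorted
    by Euclidean distance from i (1 = nearest; 0 on the diagonal)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort neighbours by distance; the index breaks ties deterministically.
        others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
        for position, j in enumerate(others, start=1):
            ranks[i][j] = position
    return ranks


def order_dissimilarity(ranks, i, j):
    """Hypothetical symmetric score (an assumption, not the paper's formula):
    sum of the two mutual ranks. Smaller means the samples are closer in order."""
    return ranks[i][j] + ranks[j][i]


# Toy data: a dense group (indices 0-2) and a sparse group (indices 3-5).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
    (5.0, 5.0), (9.0, 5.0), (5.0, 10.0),
]
ranks = distance_ranks(points)

dense_pair = order_dissimilarity(ranks, 0, 1)   # Euclidean distance 0.1
sparse_pair = order_dissimilarity(ranks, 3, 4)  # Euclidean distance 4.0
cross_pair = order_dissimilarity(ranks, 0, 3)   # points from different groups
print(dense_pair, sparse_pair, cross_pair)
```

In this toy example the nearest-neighbour pair in the sparse group gets the same rank-based score as the nearest-neighbour pair in the dense group, although their Euclidean distances differ by a factor of 40. This scale invariance is the kind of property that could make peaks in sparse regions easier to detect.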
Acknowledgements
This work is partly funded by the National Natural Science Foundation of China (nos. 11501435, 61772120 and 11771263).
Cite this article
Yang, X., Cai, Z., Li, R. et al. GDPC: generalized density peaks clustering algorithm based on order similarity. Int. J. Mach. Learn. & Cyber. 12, 719–731 (2021). https://doi.org/10.1007/s13042-020-01198-0
DOI: https://doi.org/10.1007/s13042-020-01198-0