GDPC: generalized density peaks clustering algorithm based on order similarity

Yang, Xiaofei; Cai, Zhiling; Li, Ruijia; Zhu, William

doi:10.1007/s13042-020-01198-0

Original Article
Published: 20 September 2020

Volume 12, pages 719–731, (2021)
International Journal of Machine Learning and Cybernetics

Xiaofei Yang¹,², Zhiling Cai¹, Ruijia Li¹ & William Zhu¹

Abstract

Clustering is a fundamental technique for discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method that has received increasing attention in recent years. However, DPC and most of its improved variants still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the domino effect. In general, GDPC can not only discover clusters in datasets of different sizes, dimensions and shapes, but also address the above two issues. Experiments on several datasets, including Lung, COIL20, ORL, USPS, MNIST, breast and Vote, show that our algorithm is effective in most cases.
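
The abstract does not give the formula for the order similarity, so the following Python sketch only illustrates the general idea of replacing raw Euclidean distances with their order ranks. The function names `rank_matrix` and `mutual_rank`, the choice of summing the two ranks, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math

def rank_matrix(points):
    """ranks[i][j] is the position of point j in point i's neighbour list,
    sorted by Euclidean distance (0 means i itself, 1 its nearest neighbour)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        # Point i is forced to the front; ties in distance are broken by index.
        order = sorted(
            range(n),
            key=lambda j: (j != i, math.dist(points[i], points[j]), j),
        )
        for position, j in enumerate(order):
            ranks[i][j] = position
    return ranks

def mutual_rank(ranks, i, j):
    """Assumed symmetric rank-based dissimilarity: the sum of the two ranks.
    Smaller values mean the points are closer in the rank sense."""
    return ranks[i][j] + ranks[j][i]

# Toy 1-D data embedded in 2-D: a dense group (indices 0-2) and a sparse group (3-5).
points = [(0.0, 0.0), (0.1, 0.0), (0.3, 0.0),
          (5.0, 0.0), (7.0, 0.0), (9.5, 0.0)]
ranks = rank_matrix(points)

dense_pair = mutual_rank(ranks, 0, 1)   # Euclidean distance 0.1
sparse_pair = mutual_rank(ranks, 3, 4)  # Euclidean distance 2.0
print(dense_pair, sparse_pair)
```

In this toy example the dense pair and the sparse pair receive the same rank-based value even though their Euclidean distances differ by a factor of 20. This is one plausible reading of why a rank-based measure could make peaks in sparse regions easier to detect.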

Acknowledgements

This work is partly funded by the National Natural Science Foundation of China (nos. 11501435, 61772120 and 11771263).

Author information

Authors and Affiliations

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
Xiaofei Yang, Zhiling Cai, Ruijia Li & William Zhu
School of Science, Xi’an Polytechnic University, Xi’an, China
Xiaofei Yang

Corresponding author

Correspondence to William Zhu.

About this article

Cite this article

Yang, X., Cai, Z., Li, R. et al. GDPC: generalized density peaks clustering algorithm based on order similarity. Int. J. Mach. Learn. & Cyber. 12, 719–731 (2021). https://doi.org/10.1007/s13042-020-01198-0

Received: 11 June 2019
Accepted: 06 September 2020
Published: 20 September 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s13042-020-01198-0
