A two-stage density clustering algorithm

Wang, Min; Zhang, Ying-Yi; Min, Fan; Deng, Li-Ping; Gao, Lei

doi:10.1007/s00500-020-05028-x

Methodologies and Application
Published: 26 May 2020

Volume 24, pages 17797–17819, (2020)
Soft Computing

Min Wang¹,
Ying-Yi Zhang¹,
Fan Min²,³ (ORCID: orcid.org/0000-0002-3290-1036),
Li-Ping Deng⁴ &
Lei Gao²

Abstract

Clustering by fast search and find of density peaks (CFDP) is a popular density-based algorithm. However, it has been criticized for being inefficient, for being applicable only to some types of data, and for requiring manual setting of its key parameter. In this paper, we propose the two-stage density clustering algorithm, which takes advantage of granular computing to address these issues. The new algorithm is highly efficient, adapts to various types of data, and requires minimal parameter setting. The first stage uses the two-round-means algorithm to obtain \(\sqrt{n}\) small blocks, where \(n\) is the number of instances. This stage decreases the data size directly from \(n\) to \(\sqrt{n}\). The second stage constructs the master tree and obtains the final blocks. This stage borrows the structure of CFDP, but does not require the cutoff distance parameter. The time complexity of the algorithm is \(O(mn^{\frac{3}{2}})\), which is lower than the \(O(mn^2)\) of CFDP. We report the results of experiments on 21 datasets from various domains, comparing the new algorithm with several state-of-the-art clustering algorithms. The results demonstrate that the new algorithm is adaptive to different types of datasets and is two or more orders of magnitude faster than CFDP.
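The two stages described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the two-round-means procedure, so plain Lloyd's k-means stands in for it here, and the block density (instance count), the rule for cutting the master tree (largest density times distance-to-master), and the names `kmeans` and `two_stage_cluster` are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for two-round-means)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every instance to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its block; skip empty blocks.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment against the final centers.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)


def two_stage_cluster(X, n_clusters, seed=0):
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Stage 1: shrink n instances to about sqrt(n) small blocks.
    k = max(n_clusters, int(round(np.sqrt(n))))
    centers, block_of = kmeans(X, k, seed=seed)

    # Stage 2: build a CFDP-style master tree over the block centers.
    # Assumption: a block's density is its instance count, so no
    # cutoff distance is needed.
    rho = np.bincount(block_of, minlength=k).astype(float)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    order = np.argsort(-rho, kind="stable")  # densest block first
    master = np.full(k, -1)
    delta = np.zeros(k)
    for pos, i in enumerate(order):
        if pos == 0:
            continue  # the densest block has no master
        denser = order[:pos]
        j = denser[dist[i, denser].argmin()]
        master[i], delta[i] = j, dist[i, j]

    # Cut the tree: the densest block plus the blocks with the largest
    # rho * delta become roots of the final clusters (assumed rule).
    gamma = rho * delta
    gamma[order[0]] = np.inf
    roots = np.argsort(-gamma, kind="stable")[:n_clusters]
    block_label = np.full(k, -1)
    block_label[roots] = np.arange(n_clusters)
    for i in order:  # a master is always labeled before its followers
        if block_label[i] == -1:
            block_label[i] = block_label[master[i]]
    return block_label[block_of]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(10, 0.3, (100, 2))])
    print(np.bincount(two_stage_cluster(X, n_clusters=2)))
```

Under these assumptions, each k-means pass compares \(n\) instances with \(\sqrt{n}\) centers over \(m\) attributes (taking \(m\) to be the number of attributes, which the abstract does not define), which is where a cost of order \(mn^{\frac{3}{2}}\) would come from. The second stage only touches the \(\sqrt{n} \times \sqrt{n}\) matrix of center distances, so it costs on the order of \(mn\).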

Acknowledgements

This work was supported by the National Natural Science Foundation of China (41604114) and the Sichuan Province Youth Science and Technology Innovation Team (2019JDTD0017).

Author information

Authors and Affiliations

School of Electrical Engineering and Information, Southwest Petroleum University, Chengdu, 610500, China
Min Wang & Ying-Yi Zhang
School of Computer Science, Southwest Petroleum University, Chengdu, 610500, China
Fan Min & Lei Gao
Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu, 610500, China
Fan Min
School of Computer Science and Technology, China West Normal University, Nanchong, 637002, China
Li-Ping Deng

Corresponding author

Correspondence to Fan Min.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, M., Zhang, YY., Min, F. et al. A two-stage density clustering algorithm. Soft Comput 24, 17797–17819 (2020). https://doi.org/10.1007/s00500-020-05028-x

Issue Date: December 2020
DOI: https://doi.org/10.1007/s00500-020-05028-x
