Abstract
Clustering by fast search and find of density peaks (CFDP) is a popular density-based algorithm. However, it has been criticized for being inefficient, for being applicable only to some types of data, and for requiring manual setting of a key parameter. In this paper, we propose a two-stage density clustering algorithm that takes advantage of granular computing to address these issues. The new algorithm is highly efficient, adapts to various types of data, and requires minimal parameter setting. The first stage uses the two-round-means algorithm to obtain \(\sqrt{n}\) small blocks, where \(n\) is the number of instances. This stage reduces the data size directly from \(n\) to \(\sqrt{n}\). The second stage constructs the master tree and obtains the final blocks. This stage borrows the structure of CFDP but does not require the cutoff distance parameter. The time complexity of the algorithm is \(O(mn^\frac{3}{2})\), which is lower than the \(O(mn^2)\) of CFDP. We report experiments on 21 datasets from various domains that compare the new algorithm with several state-of-the-art clustering algorithms. The results show that the new algorithm adapts to different types of datasets and is two or more orders of magnitude faster than CFDP.
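The two stages summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the two-round-means procedure or how the master tree is built, so the sketch makes two labeled assumptions: plain k-means with about \(\sqrt{n}\) centers stands in for the first stage, and block size is used as a hypothetical density proxy in the second stage, where each block is linked to its nearest block of higher density in the CFDP manner. The function names `stage_one_blocks` and `stage_two_master_tree` are invented for illustration.

```python
import math
import numpy as np


def stage_one_blocks(X, n_iter=10, seed=0):
    """Group n instances into about sqrt(n) small blocks with plain k-means.

    Assumption: plain k-means is a stand-in for the paper's two-round-means
    step, which the abstract does not describe. Returns (centers, labels).
    """
    n = X.shape[0]
    k = max(1, math.isqrt(n))  # about sqrt(n) blocks
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Squared distance from every instance to every center:
        # n * sqrt(n) pairs, each over all feature columns.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a block is empty
                centers[j] = members.mean(axis=0)
    return centers, labels


def stage_two_master_tree(centers, sizes):
    """Link each block to its nearest block of higher density (CFDP-style).

    Assumption: density is the block size, a hypothetical proxy that needs
    no cutoff distance. Returns master, where master[i] is the parent block
    of block i, or -1 for the single root.
    """
    k = centers.shape[0]
    # Rank blocks by density; ties are broken by index so one root exists.
    order = sorted(range(k), key=lambda i: (-sizes[i], i))
    master = np.full(k, -1, dtype=int)
    for pos in range(1, k):
        i = order[pos]
        higher = np.array(order[:pos])  # blocks ranked above block i
        d = np.linalg.norm(centers[higher] - centers[i], axis=1)
        master[i] = higher[d.argmin()]
    return master


# Toy data: two Gaussian blobs, 400 instances in total, so 20 blocks.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(3, 0.3, (200, 2))])
centers, labels = stage_one_blocks(X)
sizes = np.bincount(labels, minlength=len(centers))
master = stage_two_master_tree(centers, sizes)
```

In this sketch the second stage operates on only \(\sqrt{n}\) block centers, so its pairwise distance work is small compared with the first stage, whose assignment pass compares \(n\) instances with \(\sqrt{n}\) centers. That count is consistent with the form of the \(O(mn^\frac{3}{2})\) bound quoted in the abstract, under the assumption that \(m\) denotes the number of features.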
Acknowledgements
This work was supported by the National Natural Science Foundation of China (41604114) and the Sichuan Province Youth Science and Technology Innovation Team (2019JDTD0017).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest regarding this work.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, M., Zhang, YY., Min, F. et al. A two-stage density clustering algorithm. Soft Comput 24, 17797–17819 (2020). https://doi.org/10.1007/s00500-020-05028-x
DOI: https://doi.org/10.1007/s00500-020-05028-x