A local cores-based hierarchical clustering algorithm for data sets with complex structures

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Hierarchical clustering is of great importance in data analysis. Although there are a number of hierarchical clustering algorithms, including agglomerative, divisive and hybrid methods, most of them are sensitive to noise points, suffer from high computational cost and cannot effectively discover clusters with complex structures. When recognizing patterns in complex structures, humans intuitively tend to discover the obvious clusters in dense regions first and then deal with the objects on the border. Inspired by this idea, we propose a local cores-based hierarchical clustering algorithm called HCLORE. The proposed method works in four steps. It first partitions the data set into several clusters by finding local cores, rather than iteratively optimizing an objective function as K-means does. It then temporarily removes points with lower local density so that the boundaries between clusters become clearer. Next, it merges clusters according to a newly defined similarity between clusters. Finally, it assigns the points with lower local density to the clusters to which their local cores belong. The experimental results on synthetic and real data sets show that our algorithm is more effective and efficient than existing methods when processing data sets with complex structures.
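
The four steps in the abstract can be illustrated with a rough Python sketch. It is not the authors' HCLORE implementation: the abstract does not define local density, local cores or the cluster similarity, so every concrete choice here is an assumption made for illustration. Density is taken as the inverse mean distance to the k nearest neighbours, a point's local core is found by following the densest neighbour until a fixed point is reached, and the "similarity" is replaced by a plain single-linkage gap between the retained points. The names `knn_density`, `find_local_cores`, `hclore_sketch` and the parameter `drop_frac` are hypothetical.

```python
import numpy as np


def knn_density(X, k):
    """Assumed density: inverse of the mean distance to the k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    off = d.copy()
    np.fill_diagonal(off, np.inf)               # exclude each point from its own k-NN
    others = np.argsort(off, axis=1)[:, :k]
    mean_dist = np.take_along_axis(d, others, axis=1).mean(axis=1)
    # Neighbourhood = the point itself (column 0) plus its k nearest neighbours.
    nbrs = np.column_stack([np.arange(len(X)), others])
    return 1.0 / (mean_dist + 1e-12), nbrs, d


def find_local_cores(density, nbrs):
    """Assumed core rule: point to the densest neighbour, then follow pointers."""
    # argmax returns the first maximum, so a point that ties with a neighbour
    # points to itself; every real step strictly increases density (no cycles).
    parent = nbrs[np.arange(len(density)), np.argmax(density[nbrs], axis=1)]
    while True:
        nxt = parent[parent]                    # pointer jumping
        if np.array_equal(nxt, parent):
            return parent
        parent = nxt


def hclore_sketch(X, k=8, n_clusters=2, drop_frac=0.2):
    X = np.asarray(X, dtype=float)
    density, nbrs, d = knn_density(X, k)

    # Step 1: initial partition, one cluster per local core (no iteration
    # over an objective function).
    core_of = find_local_cores(density, nbrs)
    cores, label = np.unique(core_of, return_inverse=True)

    # Step 2: temporarily drop the lowest-density points (cores always stay).
    keep = density >= np.quantile(density, drop_frac)
    keep[cores] = True

    # Step 3: merge clusters. Assumption: smallest distance between retained
    # members stands in for the paper's similarity measure.
    m = len(cores)
    members = [np.flatnonzero(keep & (label == c)) for c in range(m)]
    gap = np.full((m, m), np.inf)
    for a in range(m):
        for b in range(a + 1, m):
            gap[a, b] = gap[b, a] = d[np.ix_(members[a], members[b])].min()
    group = np.arange(m)
    while len(np.unique(group)) > max(n_clusters, 1):
        a, b = np.unravel_index(np.argmin(gap), gap.shape)
        group[group == group[b]] = group[a]
        gap[group[:, None] == group[None, :]] = np.inf   # never re-merge a group

    # Step 4: every point, including the dropped ones, takes the final
    # cluster of its local core.
    _, final = np.unique(group[label], return_inverse=True)
    return final
```

Calling `hclore_sketch(X, k=8, n_clusters=2)` on an `(n, 2)` array returns one integer label per row. The pairwise distance matrix makes this sketch quadratic in memory, so it is only suitable for small data sets.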

Notes

  1. The MATLAB code is available upon request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61502060, 61702060), the Project of Chongqing Education Commission (KJZH17104) and the Science and Technology Project of Chongqing Municipal Education Commission (KJ15012014).

Author information

Corresponding author

Correspondence to Qingsheng Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 8316 kb)

About this article

Cite this article

Cheng, D., Zhu, Q., Huang, J. et al. A local cores-based hierarchical clustering algorithm for data sets with complex structures. Neural Comput & Applic 31, 8051–8068 (2019). https://doi.org/10.1007/s00521-018-3641-8

  • DOI: https://doi.org/10.1007/s00521-018-3641-8
