Abstract
Hierarchical clustering is of great importance in data analysis. Although many hierarchical clustering algorithms exist, including agglomerative, divisive and hybrid methods, most of them are sensitive to noise points, incur high computational cost and cannot effectively discover clusters with complex structures. When recognizing patterns in complex structures, humans intuitively tend to discover the obvious clusters in dense regions first and then deal with the objects on the border. Inspired by this idea, we propose a local cores-based hierarchical clustering algorithm called HCLORE. The proposed method first partitions the data set into several sub-clusters by finding local cores, instead of iteratively optimizing an objective function as K-means does; it then temporarily removes points with lower local density so that the boundaries between clusters become clearer; after that, it merges clusters according to a newly defined similarity between clusters; finally, the points with lower local density are assigned to the clusters to which their local cores belong. Experimental results on synthetic and real data sets show that our algorithm is more effective and efficient than existing methods when processing data sets with complex structures.
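To make the four-stage pipeline described above concrete, the Python sketch below runs the stages end to end on a toy data set. It is only an illustration: the k-nearest-neighbour density, the "follow the densest neighbour" rule for locating local cores, the density quantile used to set border points aside, and the minimum-distance similarity used for merging are all assumptions made for this sketch, not the paper's exact definitions (the authors' MATLAB implementation is available on request, per the Notes below).

```python
import numpy as np
from scipy.spatial.distance import cdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster


def hclore_sketch(X, k=10, n_clusters=2, border_quantile=0.3):
    """Illustrative four-stage pipeline; all concrete definitions are assumptions."""
    n = len(X)
    D = cdist(X, X)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]          # k nearest neighbours
    # Local density: inverse of the mean distance to the k nearest neighbours.
    density = 1.0 / (D[np.arange(n)[:, None], knn].mean(axis=1) + 1e-12)

    # Stage 1: each point follows its densest neighbour; chain ends are the
    # local cores, and the points sharing a core form an initial sub-cluster.
    parent = np.arange(n)
    for i, nbrs in enumerate(knn):
        j = nbrs[np.argmax(density[nbrs])]
        if density[j] > density[i]:
            parent[i] = j
    while not np.array_equal(parent, parent[parent]):
        parent = parent[parent]                       # jump straight to the core
    cores, sub = np.unique(parent, return_inverse=True)
    m = len(cores)

    # Stage 2: temporarily set aside low-density (border) points so that the
    # gaps between sub-clusters are measured on dense points only.
    dense = density >= np.quantile(density, border_quantile)
    members = [np.where((sub == c) & dense)[0] for c in range(m)]
    members = [idx if len(idx) else np.where(sub == c)[0]   # keep tiny sub-clusters
               for c, idx in enumerate(members)]

    # Stage 3: merge sub-clusters hierarchically using an inter-cluster
    # distance over the retained points (minimum pairwise distance here).
    inter = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            inter[a, b] = inter[b, a] = D[np.ix_(members[a], members[b])].min()
    merged = fcluster(linkage(squareform(inter), method='single'),
                      t=n_clusters, criterion='maxclust')

    # Stage 4: every point, including the border points set aside in stage 2,
    # inherits the final cluster of the local core it was attached to.
    return merged[sub]


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
    print(np.bincount(hclore_sketch(X, k=10, n_clusters=2))[1:])  # e.g. [100 100]
```

Measuring inter-cluster distance only over the retained dense points mirrors the "clearer boundary" idea in the abstract: border points cannot create spurious bridges during merging, yet they still receive a label in the final stage through their local cores.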
Notes
The MATLAB code is available upon request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grants 61502060 and 61702060), the Project of Chongqing Education Commission (KJZH17104) and the Science and Technology Project of Chongqing Municipal Education Commission (KJ15012014).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Cheng, D., Zhu, Q., Huang, J. et al. A local cores-based hierarchical clustering algorithm for data sets with complex structures. Neural Comput & Applic 31, 8051–8068 (2019). https://doi.org/10.1007/s00521-018-3641-8