A local cores-based hierarchical clustering algorithm for data sets with complex structures

Cheng, Dongdong; Zhu, Qingsheng; Huang, Jinlong; Wu, Quanwang; Yang, Lijun

doi:10.1007/s00521-018-3641-8

Original Article
Published: 26 July 2018

Volume 31, pages 8051–8068 (2019)
Neural Computing and Applications

Dongdong Cheng¹,
Qingsheng Zhu ORCID: orcid.org/0000-0002-5454-3719¹,
Jinlong Huang²,
Quanwang Wu¹ &
Lijun Yang³

557 Accesses
24 Citations

Abstract

Hierarchical clustering is of great importance in data analysis. Although there are a number of hierarchical clustering algorithms, including agglomerative, divisive and hybrid methods, most of them are sensitive to noise points, suffer from high computational cost and cannot effectively discover clusters with complex structures. When recognizing patterns in complex structures, humans intuitively tend to discover the obvious clusters in dense regions first and then deal with objects on the border. Inspired by this idea, we propose a local cores-based hierarchical clustering algorithm called HCLORE. The proposed method first partitions the data set into several clusters by finding local cores, instead of iteratively optimizing an objective function as K-means does; it then temporarily removes points with lower local density, so that the boundaries between clusters become clearer; after that, it merges clusters according to a newly defined similarity between clusters; and finally it assigns each point with lower local density to the cluster to which its local core belongs. The experimental results on synthetic and real data sets show that our algorithm is more effective and efficient than existing methods when processing data sets with complex structures.
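The four stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HCLORE implementation: the abstract does not give the definitions of local density, local cores or the cluster similarity, so the sketch substitutes simple stand-ins (inverse mean k-nearest-neighbour distance as density, the densest point in a k-neighbourhood as local core, and the smallest gap between retained points as similarity). The function name `local_core_clustering` and the parameters `k`, `drop_frac` and `n_clusters` are hypothetical.

```python
import numpy as np

def local_core_clustering(X, k=8, drop_frac=0.2, n_clusters=2):
    """Illustrative four-stage pipeline; every definition here is an assumed stand-in."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)            # a point is not its own neighbour
    nbrs = np.argsort(masked, axis=1)[:, :k]    # indices of the k nearest neighbours
    # Assumed density: inverse of the mean distance to the k nearest neighbours.
    density = 1.0 / (np.take_along_axis(dist, nbrs, axis=1).mean(axis=1) + 1e-12)

    # Stage 1: each point points to the densest point among itself and its
    # neighbours; following the pointers to a fixed point gives its local core.
    cand = np.hstack([np.arange(n)[:, None], nbrs])
    core = cand[np.arange(n), np.argmax(density[cand], axis=1)]
    while True:
        nxt = core[core]
        if np.array_equal(nxt, core):
            break
        core = nxt
    core_ids, init = np.unique(core, return_inverse=True)  # initial clusters

    # Stage 2: temporarily drop the lowest-density points (cores are always kept).
    keep = density >= np.quantile(density, drop_frac)
    keep[core_ids] = True

    # Stage 3: merge initial clusters, closest pair first, using the smallest
    # distance between retained points as an assumed (single-link) similarity.
    m = len(core_ids)
    members = [np.flatnonzero(keep & (init == c)) for c in range(m)]
    gap = np.full((m, m), np.inf)
    for a in range(m):
        for b in range(a + 1, m):
            gap[a, b] = gap[b, a] = dist[np.ix_(members[a], members[b])].min()
    merged = np.arange(m)
    for _ in range(max(m - n_clusters, 0)):
        a, b = np.unravel_index(np.argmin(gap), gap.shape)
        merged[merged == b] = a                 # cluster b is absorbed into a
        gap[a] = np.minimum(gap[a], gap[b])
        gap[:, a] = gap[a]
        gap[a, a] = np.inf
        gap[b, :] = np.inf
        gap[:, b] = np.inf

    # Stage 4: every point, including the dropped ones, takes the final
    # cluster of its local core.
    _, labels = np.unique(merged[init], return_inverse=True)
    return labels, keep

# Usage on two well-separated synthetic blobs (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(5.0, 0.3, (40, 2))])
labels, keep = local_core_clustering(X, k=8, drop_frac=0.2, n_clusters=2)
```

Because a point and its local core always end up in the same initial cluster, the final stage reduces to a label lookup. The dense pairwise distance matrix keeps the sketch short but costs O(n²) memory, so it says nothing about the efficiency reported for the actual algorithm.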

Notes

The MATLAB code is available upon request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61502060, 61702060), the Project of Chongqing Education Commission (KJZH17104) and the Science and Technology Project of Chongqing Municipal Education Commission (KJ15012014).

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, China
Dongdong Cheng, Qingsheng Zhu & Quanwang Wu
College of Computer Engineering, Yangtze Normal University, Chongqing, China
Jinlong Huang
School of Computer Science and Technology, Southwest Minzu University, Chengdu, China
Lijun Yang

Corresponding author

Correspondence to Qingsheng Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 8316 kb)

About this article

Cite this article

Cheng, D., Zhu, Q., Huang, J. et al. A local cores-based hierarchical clustering algorithm for data sets with complex structures. Neural Comput & Applic 31, 8051–8068 (2019). https://doi.org/10.1007/s00521-018-3641-8

Received: 25 February 2018
Accepted: 13 July 2018
Published: 26 July 2018
Issue Date: November 2019
DOI: https://doi.org/10.1007/s00521-018-3641-8
