A hierarchical clustering algorithm based on noise removal

Cheng, Dongdong; Zhu, Qingsheng; Huang, Jinlong; Wu, Quanwang; Yang, Lijun

doi:10.1007/s13042-018-0836-3

A hierarchical clustering algorithm based on noise removal

Original Article
Published: 29 June 2018

Volume 10, pages 1591–1602, (2019)
Cite this article

International Journal of Machine Learning and Cybernetics Aims and scope Submit manuscript

Dongdong Cheng¹,
Qingsheng Zhu¹,
Jinlong Huang²,
Quanwang Wu¹ &
…
Lijun Yang³

657 Accesses
12 Citations
Explore all metrics

Abstract

Noise is irrelevant or meaningless data and hinders most types of data analysis. The existing clustering algorithms seldom take the noise points into consideration and cannot detect arbitrary-shaped clusters. This paper presents a Hierarchical Clustering algorithm Based on Noise Removal (HCBNR). It is robust against noise points and good at discovering clusters with arbitrary shapes. In this work, natural neighbor-based density is applied to remove noise points in a data set firstly. Then we construct a saturated neighbor graph on the rest points, and a novel modularity-based graph partitioning algorithm is used to divide the graph into small clusters. Finally, the small clusters are repeatedly merged according to a novel similarity metric between clusters until the desired cluster number is obtained. The experimental results on synthetic data sets and real data sets show that our method can accurately identify noise points and obtain better clustering results than existing clustering algorithms when discovering arbitrary-shaped clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A novel stratification clustering algorithm based on a new local density estimation method and an improved local inter-cluster distance measure

Article 23 June 2023

A local cores-based hierarchical clustering algorithm for data sets with complex structures

Article 26 July 2018

A novel clustering algorithm based on the gravity-mass-square ratio and density core with a dynamic denoising radius

Article 11 November 2021

References

Breunig MM, Kriegel HP, Ng RT, Sander J (2000) Lof: identifying density-based local outliers. Acm Sigmod Record 29(2):93–104
Article Google Scholar
Chen WY, Song Y, Bai H, Lin CJ, Chang EY (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33(3):568–586
Article Google Scholar
Cheng D, Zhu Q, Huang J, Yang L, Wu Q (2017) Natural neighbor-based clustering algorithm with local representatives. Knowl Based Syst 123C:238–253
Article Google Scholar
Ester M, Kriegel HP, Xu X (1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. In: International Conference on Knowledge Discovery and Data Mining, pp 226–231
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Article MathSciNet MATH Google Scholar
Guha S, Rastogi R, Shim K (2000) Rock: a robust clustering algorithm for categorical attributes. Inf Syst 25(5):345–366
Article Google Scholar
Guha S, Rastogi R, Shim K (2001) Cure: an efficient clustering algorithm for large databases. Inf Syst 26(1):35–58
Article MATH Google Scholar
Ha J, Seok S, Lee JS (2014) Robust outlier detection using the instability factor. Knowl Based Syst 63(2):15–23
Article Google Scholar
Huang J, Zhu Q, Yang L, Cheng D, Wu Q (2017) Qcc: a novel clustering algorithm based on quasi-cluster centers. Mach Learn 106(3):337–357
Article MathSciNet MATH Google Scholar
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. Acm Comput Surv 31(3):264–323
Article Google Scholar
Karypis G, Aggarwal R, Kumar V, Shekhar S (2002) Multilevel hypergraph partitioning: applications in vlsi domain. IEEE Trans Very Large Scale Integr Syst 7(1):69–79
Article Google Scholar
Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. IEEE Computer Society Press
Kaufman L, Rousseeuw PJ (2009) Finding groups in data: an introduction to cluster analysis. John Wiley, Hoboken
MATH Google Scholar
King B (1967) Step-wise clustering procedures. J Am Stat Assoc 62(317):86–101
Article Google Scholar
Lv Y, Ma T, Tang M, Cao J, Tian Y, Al-Rodhaan M (2015) An efficient and scalable density-based clustering algorithm for datasets with complex structures. Neurocomputing 171C:9–22
Google Scholar
Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley Symposium on Mathematical Statistics and Probability, pp 281–297
Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E Stat Nonlinear Soft Matter Phys . https://doi.org/10.1103/PhysRevE.69.026113
Newman MEJ (2004) Analysis of weighted networks. Phys Rev E Stat Nonlinear Soft Matter Phys 70(5):1–9
Google Scholar
Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577–8582
Article Google Scholar
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492
Article Google Scholar
Sneath PH, Sokal RR (1962) Numerical taxonomy. Nature 193:855–860
Article Google Scholar
Veenman CJ, Reinders MJT, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280
Article Google Scholar
Wang G, Song Q (2016) Automatic clustering via outward statistical testing on density metrics. IEEE Trans Knowl Data Eng 28(8):1971–1985
Article Google Scholar
Wang X, Wang XL, Chen C, Wilkes DM (2013) Enhancing minimum spanning tree-based clustering by removing density-based outliers. Digital Signal Process 23(5):1523–1538
Article MathSciNet Google Scholar
Xie JY, Gao HC, Xie WX, Liu XH, Grant PW (2016) Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf Sci 354:19–40
Article Google Scholar
Xiong H, Pandey G, Steinbach M, Kumar V (2006) Enhancing data analysis with noise removal. IEEE Trans Knowl Data Eng 18(3):304–319
Article Google Scholar
Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. In: ACM SIGMOD International Conference on Management of Data, pp 103–114
Zhu Q, Feng J, Huang J (2016) Natural neighbor: a self-adaptive neighborhood method without parameter k. Pattern Recognit Lett 80:30–36
Article Google Scholar

Download references

Acknowledgements

This work is supported by National Natural Science Foundation of China, No. 61702060 and No. 61502060, Science and Technology Project of Chongqing Municipal Education Commission, No. KJ15012014 and Project of Chongqing Education Commission, No. KJZH17104.

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, China
Dongdong Cheng, Qingsheng Zhu & Quanwang Wu
College of Computer Engineering, Yangtze Normal University, Chongqing, China
Jinlong Huang
School of Computer Science and Technology, Southwest Minzu University, Chengdu, China
Lijun Yang

Authors

Dongdong Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Qingsheng Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Jinlong Huang
View author publications
You can also search for this author in PubMed Google Scholar
Quanwang Wu
View author publications
You can also search for this author in PubMed Google Scholar
Lijun Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Qingsheng Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Cheng, D., Zhu, Q., Huang, J. et al. A hierarchical clustering algorithm based on noise removal. Int. J. Mach. Learn. & Cyber. 10, 1591–1602 (2019). https://doi.org/10.1007/s13042-018-0836-3

Download citation

Received: 20 September 2017
Accepted: 31 May 2018
Published: 29 June 2018
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s13042-018-0836-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A hierarchical clustering algorithm based on noise removal

Abstract

Access this article

Similar content being viewed by others

A novel stratification clustering algorithm based on a new local density estimation method and an improved local inter-cluster distance measure

A local cores-based hierarchical clustering algorithm for data sets with complex structures

A novel clustering algorithm based on the gravity-mass-square ratio and density core with a dynamic denoising radius

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A hierarchical clustering algorithm based on noise removal

Abstract

Access this article

Similar content being viewed by others

A novel stratification clustering algorithm based on a new local density estimation method and an improved local inter-cluster distance measure

A local cores-based hierarchical clustering algorithm for data sets with complex structures

A novel clustering algorithm based on the gravity-mass-square ratio and density core with a dynamic denoising radius

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation