A neighborhood-based three-stage hierarchical clustering algorithm

Wang, Yan; Ma, Yan; Huang, Hui

doi:10.1007/s11042-021-11171-w

Published: 29 July 2021

Volume 80, pages 32379–32407, (2021)

Multimedia Tools and Applications

Abstract

Many neighborhood-based clustering algorithms measure the similarity between data points or subclusters using neighborhood information. However, most of them are sensitive to differences in cluster size, shape, and density. In this paper, we propose a neighborhood-based three-stage hierarchical clustering algorithm (NTHC) that is robust to these differences. Three concepts are defined: the stability of a data point pair, the linked representatives, and the expanded representatives. Furthermore, a new measure of intercluster distance based on representatives is designed. In Stage 1, outliers are detected and removed from the data set using reverse nearest neighbors. In Stage 2, small clusters are formed by merging data points that have a stable connection on the 1-nearest-neighbor graph. In Stage 3, the final partition is obtained by iteratively merging the closest pair of clusters under the new measure of intercluster distance. The proposed algorithm is compared with 15 other clustering algorithms. Experimental results on synthetic and real data sets demonstrate that the proposed method is effective. In addition, we test for statistically significant differences among the sixteen clustering algorithms using the Friedman test. The average rank of the proposed algorithm is 4.19, which is better than the average rank of each of the other algorithms.
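The three-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' NTHC implementation: the abstract does not define the stability of a data point pair, the linked or expanded representatives, or the new intercluster distance, so the code below substitutes simple stand-ins. Stage 1 assumes that a point with no reverse k-nearest neighbors is an outlier, Stage 2 uses connected components of the 1-nearest-neighbor graph in place of the stability criterion, and Stage 3 uses single linkage in place of the representative-based distance. All function names and the parameter `k` are hypothetical.

```python
import numpy as np


def pairwise_dist(X):
    # Euclidean distance matrix; the diagonal is set to infinity so that
    # a point is never counted as its own neighbor.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return d


def stage1_remove_outliers(X, k=3):
    # Assumption: a point that appears in no other point's k-NN list
    # (zero reverse nearest neighbors) is treated as an outlier.
    knn = np.argsort(pairwise_dist(X), axis=1)[:, :k]
    rnn_count = np.bincount(knn.ravel(), minlength=len(X))
    return np.flatnonzero(rnn_count > 0)  # indices of retained points


def stage2_small_clusters(X):
    # Stand-in for "stable connection": connected components of the
    # undirected 1-nearest-neighbor graph, found with union-find.
    nn = pairwise_dist(X).argmin(axis=1)
    parent = list(range(len(X)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nn):
        parent[find(i)] = find(int(j))
    roots = np.array([find(i) for i in range(len(X))])
    return np.unique(roots, return_inverse=True)[1]


def stage3_merge(X, labels, n_clusters):
    # Stand-in for the representative-based distance: single linkage
    # (minimum point-to-point distance between two clusters).
    d = pairwise_dist(X)
    labels = labels.copy()
    while len(np.unique(labels)) > n_clusters:
        ids = np.unique(labels)
        best = (np.inf, None, None)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                dist = d[np.ix_(labels == a, labels == b)].min()
                if dist < best[0]:
                    best = (dist, a, b)
        labels[labels == best[2]] = best[1]  # merge the closest pair
    return np.unique(labels, return_inverse=True)[1]


def three_stage_sketch(X, n_clusters, k=3):
    keep = stage1_remove_outliers(X, k)
    small = stage2_small_clusters(X[keep])
    final = stage3_merge(X[keep], small, n_clusters)
    labels = np.full(len(X), -1)  # -1 marks points removed in Stage 1
    labels[keep] = final
    return labels
```

Calling `three_stage_sketch(X, n_clusters=2)` on an array `X` of shape `(n_samples, n_features)` returns one integer label per point, with `-1` for points discarded as outliers. The brute-force distance matrix keeps the sketch short but costs quadratic memory, so it is only suitable for small data sets.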

Notes

https://github.com/yian158/clustering-analysis.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Grant nos. 61373004 and 61501297).

Author information

Authors and Affiliations

College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, China
Yan Wang, Yan Ma & Hui Huang

Corresponding author

Correspondence to Yan Ma.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Y., Ma, Y. & Huang, H. A neighborhood-based three-stage hierarchical clustering algorithm. Multimed Tools Appl 80, 32379–32407 (2021). https://doi.org/10.1007/s11042-021-11171-w

Received: 22 October 2020
Revised: 29 May 2021
Accepted: 22 June 2021
Published: 29 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11042-021-11171-w
