Abstract
Many neighborhood-based clustering algorithms measure the similarity between data points or subclusters using neighborhood information. However, most of them are sensitive to differences in cluster size, shape, and density. In this paper, we propose a neighborhood-based three-stage hierarchical clustering algorithm (NTHC) that is robust to such differences. Three concepts are defined: the stability of a data point pair, the linked representatives, and the expanded representatives. In addition, a new representative-based measure of intercluster distance is designed. In Stage 1, outliers are detected using reverse nearest neighbors and removed from the data set. In Stage 2, small clusters are formed by merging data points that have a stable connection on the 1-nearest-neighbor graph. In Stage 3, the final partition is obtained by iteratively merging the closest pair of clusters under the new intercluster distance. The proposal is compared with 15 other clustering algorithms. Experimental results on synthetic and real data sets demonstrate that the proposed method is effective. In addition, we use the Friedman test to check for statistically significant differences among the sixteen clustering algorithms. The average rank of the proposed algorithm is 4.19, which is better than that of the other algorithms.
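The three-stage structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the stability of a data point pair, the linked or expanded representatives, or the new intercluster distance, so the sketch substitutes simple stand-ins. It assumes that a point is an outlier when it appears in no other point's k-nearest-neighbor list, that Stage 2 takes plain connected components of the 1-nearest-neighbor graph, and that Stage 3 uses the minimum point-to-point distance between clusters. All function names and the parameters `k`, `min_reverse`, and `n_clusters` are hypothetical.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix with an infinite diagonal (no self-neighbors)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    return D

def keep_mask_by_reverse_neighbors(X, k=3, min_reverse=1):
    """Stage 1 stand-in (assumed rule): keep a point only if it occurs in at
    least `min_reverse` k-nearest-neighbor lists of other points."""
    knn = np.argsort(pairwise_dist(X), axis=1)[:, :k]
    reverse_counts = np.bincount(knn.ravel(), minlength=len(X))
    return reverse_counts >= min_reverse

def one_nn_components(X):
    """Stage 2 stand-in: connected components of the 1-nearest-neighbor graph.
    The paper additionally requires a 'stable' connection, omitted here."""
    nn = pairwise_dist(X).argmin(axis=1)
    parent = list(range(len(X)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nn):
        parent[find(i)] = find(int(j))
    roots = [find(i) for i in range(len(X))]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

def merge_closest(X, labels, n_clusters):
    """Stage 3 stand-in: repeatedly merge the two clusters with the smallest
    minimum point-to-point distance (a placeholder for the paper's
    representative-based intercluster distance)."""
    D = pairwise_dist(X)
    labels = labels.copy()
    while len(np.unique(labels)) > n_clusters:
        ids = np.unique(labels)
        best = (np.inf, None, None)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = D[np.ix_(labels == a, labels == b)].min()
                if d < best[0]:
                    best = (d, a, b)
        labels[labels == best[2]] = best[1]
    _, labels = np.unique(labels, return_inverse=True)
    return labels

def three_stage_sketch(X, k=3, n_clusters=2):
    """Run the three stand-in stages; returns (keep mask, labels of kept points)."""
    keep = keep_mask_by_reverse_neighbors(X, k=k)
    kept = X[keep]
    return keep, merge_closest(kept, one_nn_components(kept), n_clusters)

# Toy data: two well-separated blobs plus one far-away point.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=(10.0, 0.0), scale=0.3, size=(20, 2))
X = np.vstack([blob_a, blob_b, [[50.0, 50.0]]])
keep, labels = three_stage_sketch(X, k=3, n_clusters=2)
print("removed points:", int((~keep).sum()), "clusters:", len(np.unique(labels)))
```

With these stand-ins, some border points of a genuine cluster may also be discarded in Stage 1, because a point on the periphery can be absent from every k-nearest-neighbor list. Refinements such as the stability criterion in the paper presumably address issues of this kind, but the abstract does not give enough detail to reproduce them here.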
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Grant nos. 61373004 and 61501297).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, Y., Ma, Y. & Huang, H. A neighborhood-based three-stage hierarchical clustering algorithm. Multimed Tools Appl 80, 32379–32407 (2021). https://doi.org/10.1007/s11042-021-11171-w
DOI: https://doi.org/10.1007/s11042-021-11171-w