A neighborhood-based three-stage hierarchical clustering algorithm

Multimedia Tools and Applications

Abstract

Many neighborhood-based clustering algorithms measure the similarity between data points or subclusters using neighborhood information. However, most of them are sensitive to differences in cluster size, shape, and density. In this paper, we propose a neighborhood-based three-stage hierarchical clustering algorithm (NTHC) that is robust to these differences. Three concepts are defined: the stability of a data point pair, the linked representatives, and the expanded representatives. Furthermore, a new measure of intercluster distance based on representatives is designed. In Stage 1, outliers are detected and removed from the data set using reverse nearest neighbors. In Stage 2, small clusters are formed by merging data points that have a stable connection on the 1-nearest neighbor graph. In Stage 3, the final partition is obtained by iteratively merging the closest pair of clusters under the new measure of intercluster distance. The proposal is compared with 15 other clustering algorithms. Experimental results on synthetic and real data sets demonstrate that the proposed method is effective. In addition, we use the Friedman test to check for statistically significant differences among the sixteen clustering algorithms. The average rank of the proposed algorithm is 4.19, which is better than that of the other algorithms.
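The three-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the NTHC implementation: the abstract does not define the stability of a data point pair, the representatives, or the new intercluster distance, so the sketch substitutes simple stand-ins. It assumes that Stage 1 drops points with too few reverse k-nearest neighbors, that Stage 2 takes connected components of the 1-nearest neighbor graph, and that Stage 3 uses single linkage. All function names, parameters (`k`, `min_rnn`, `n_clusters`), and the toy data are hypothetical.

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix with an infinite diagonal."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each point (self excluded)."""
    return np.argsort(pairwise(X), axis=1)[:, :k]

def remove_outliers(X, k=2, min_rnn=1):
    """Stage 1 stand-in: keep points that appear in at least
    min_rnn other points' k-nearest-neighbor lists."""
    counts = np.bincount(knn_indices(X, k).ravel(), minlength=len(X))
    keep = counts >= min_rnn
    return X[keep], keep

def one_nn_components(X):
    """Stage 2 stand-in: connected components of the 1-NN graph,
    found with a small union-find (assumption: every 1-NN edge is 'stable')."""
    n = len(X)
    nn = knn_indices(X, 1)[:, 0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        ri, rj = find(i), find(int(nn[i]))
        if ri != rj:
            parent[ri] = rj
    roots = [find(i) for i in range(n)]
    return np.unique(roots, return_inverse=True)[1]

def merge_closest_pairs(X, labels, n_clusters):
    """Stage 3 stand-in: repeatedly merge the closest pair of clusters,
    using single linkage in place of the representative-based distance."""
    labels = labels.copy()
    d = pairwise(X)
    while len(np.unique(labels)) > n_clusters:
        ids = np.unique(labels)
        best = (np.inf, None, None)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                dist = d[np.ix_(labels == a, labels == b)].min()
                if dist < best[0]:
                    best = (dist, a, b)
        labels[labels == best[2]] = best[1]
    return labels

# Toy data: two groups on a line plus one distant point.
X = np.array([[0, 0], [1, 0], [2.5, 0], [3.4, 0],
              [10, 0], [11, 0], [12.5, 0], [13.4, 0],
              [40, 0]], dtype=float)

X_clean, keep = remove_outliers(X, k=2, min_rnn=1)    # Stage 1
small = one_nn_components(X_clean)                     # Stage 2
final = merge_closest_pairs(X_clean, small, n_clusters=2)  # Stage 3
print(keep, small, final)
```

On this toy data the distant point has no reverse neighbors and is dropped, the 1-NN graph yields four small clusters, and the merging step joins them into the two groups. Swapping in the paper's stability criterion and representative-based distance would change Stages 2 and 3 but not the overall structure.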

Notes

  1. https://github.com/yian158/clustering-analysis.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Grant nos. 61373004 and 61501297).

Author information

Corresponding author

Correspondence to Yan Ma.

About this article

Cite this article

Wang, Y., Ma, Y. & Huang, H. A neighborhood-based three-stage hierarchical clustering algorithm. Multimed Tools Appl 80, 32379–32407 (2021). https://doi.org/10.1007/s11042-021-11171-w

  • DOI: https://doi.org/10.1007/s11042-021-11171-w
