Abstract
Data clustering is a commonly used data processing technique in many fields; it divides objects into clusters according to some similarity measure between data points. Compared with partitioning clustering methods, which give a flat partition of the data, hierarchical clustering methods give multiple consistent partitions of the same data at different levels without rerunning the clustering, so they can be used to better analyze the complex structure of the data. There are usually two kinds of hierarchical clustering methods: divisive and agglomerative. For divisive clustering, the key issues are how to select a cluster for the next split according to dissimilarity and how to divide the selected cluster. For agglomerative hierarchical clustering, the key issue is the similarity measure used to select the two most similar clusters for the next merge. Although both types of methods produce a dendrogram of the data as output, the clustering results may differ greatly depending on the dissimilarity or similarity measure used, and the type of method should be chosen according to the type of data and the application scenario. In this work, we therefore comprehensively review various hierarchical clustering methods, especially the most recently developed ones. Because the similarity measure plays a crucial role in the hierarchical clustering process, we review different types of similarity measures along with the clustering methods. More specifically, different types of hierarchical clustering methods are reviewed from six aspects, and their advantages and drawbacks are analyzed. The application of some methods in real life is also discussed. Furthermore, we include some recent works that combine deep learning techniques with hierarchical clustering, a direction that deserves serious attention and may significantly improve hierarchical clustering in the future.
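The agglomerative procedure described above (repeatedly merging the two most similar clusters, then reading off partitions at different levels without rerunning the clustering) can be illustrated with a minimal sketch. It is not taken from any of the reviewed methods: the one-dimensional points, the single-linkage dissimilarity, and the function names `single_linkage`, `agglomerate`, and `cut` are all assumptions made for illustration.

```python
from itertools import combinations


def single_linkage(a, b, points):
    """Dissimilarity of two clusters: the smallest pairwise distance.

    Single linkage is an assumed choice; other measures would slot in here.
    """
    return min(abs(points[i] - points[j]) for i in a for j in b)


def agglomerate(points):
    """Merge the two closest clusters until only one cluster remains.

    Returns every intermediate partition, from n singletons down to a
    single cluster. Clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    levels = [sorted(clusters)]
    while len(clusters) > 1:
        # Select the pair of clusters with the smallest dissimilarity.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        levels.append(sorted(clusters))
    return levels


def cut(levels, k):
    """Read off the partition with k clusters from the stored hierarchy."""
    return levels[len(levels) - k]


if __name__ == "__main__":
    data = [1.0, 1.2, 5.0, 5.1, 9.0]  # toy 1-D data, purely illustrative
    hierarchy = agglomerate(data)
    for k in (3, 2, 1):
        print(k, cut(hierarchy, k))
```

The hierarchy is built once; `cut` only indexes into the stored levels, which is what makes the partitions at different levels mutually consistent. This naive version re-evaluates every pair of clusters at each step and is meant only to show the idea, not to be efficient.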


Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Abdi H, Valentin D (2007) Multiple correspondence analysis. Encycl Meas Stat 2(4):651–657
Ackerman M, Dasgupta S (2014) Incremental clustering: the case for extra clusters. In: Advances in neural information processing systems 27: annual conference on neural information processing systems 2014, December 8–13 2014, Montreal, QC. pp 307–315
Agarwal S, Lim J, Zelnik-Manor L et al (2005) Beyond pairwise clustering. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), 20–26 June 2005, San Diego, CA. IEEE Computer Society, pp 838–845. https://doi.org/10.1109/CVPR.2005.89
Agha GA (1990) ACTORS—a model of concurrent computation in distributed systems. MIT Press series in artificial intelligence. MIT Press, Cambridge
Alalyan F, Zamzami N, Bouguila N (2019) Model-based hierarchical clustering for categorical data. In: 28th IEEE international symposium on industrial electronics, ISIE 2019, Vancouver, BC, June 12–14, 2019. IEEE, pp 1424–1429. https://doi.org/10.1109/ISIE.2019.8781307
Altinigneli MC, Miklautz L, Böhm C et al (2020) Hierarchical quick shift guided recurrent clustering. In: 2020 IEEE 36th international conference on data engineering (ICDE). pp 1842–1845. https://doi.org/10.1109/ICDE48307.2020.00184
Alzate C, Suykens JA (2012) Hierarchical kernel spectral clustering. Neural Netw 35(C):21–30. https://doi.org/10.1016/j.neunet.2012.06.007
Anderberg MR (1973) Chapter 6–hierarchical clustering methods, probability and mathematical statistics: a series of monographs and textbooks, vol 19. Academic Press, Cambridge. https://doi.org/10.1016/B978-0-12-057650-0.50012-0
Averbuch-Elor H, Bar N, Cohen-Or D (2020) Border-peeling clustering. IEEE Trans Pattern Anal Mach Intell 42(7):1791–1797. https://doi.org/10.1109/TPAMI.2019.2924953
Barton T, Bruna T, Kordík P (2016) Mocham: robust hierarchical clustering based on multi-objective optimization. In: IEEE international conference on data mining workshops, ICDM workshops 2016, December 12–15, 2016, Barcelona. IEEE Computer Society, pp 831–838. https://doi.org/10.1109/ICDMW.2016.0123
Barton T, Bruna T, Kordík P (2019) Chameleon 2: an improved graph-based clustering algorithm. ACM Trans Knowl Discov Data 13(1):10. https://doi.org/10.1145/3299876
Batagelj V (1988) Generalized ward and related clustering problems. Classification and related methods of data analysis. Jun: 67–74
Berkhin P (2006) A survey of clustering data mining techniques. In: Kogan J, Nicholas CK, Teboulle M (eds) Grouping multidimensional data–recent advances in clustering. Springer, Berlin, pp 25–71. https://doi.org/10.1007/3-540-28349-8_2
Boley D (1998) Principal direction divisive partitioning. Data Min Knowl Disc 2(4):325–344
Bouguettaya A, Yu Q, Liu X et al (2015) Efficient agglomerative hierarchical clustering. Expert Syst Appl 42(5):2785–2797
Brans JP, Vincke P (1985) Note—a preference ranking organisation method: (the PROMETHEE method for multiple criteria decision-making). Manag Sci 31(6):647–656
Brans JP, Vincke P, Mareschal B (1986) How to select and how to rank projects: the PROMETHEE method. Eur J Oper Res 24(2):228–238
Cai D, Chen X (2015) Large scale spectral clustering via landmark-based sparse representation. IEEE Trans Cybern 45(8):1669–1680. https://doi.org/10.1109/TCYB.2014.2358564
Cai Q, Liu J (2020) Hierarchical clustering of bipartite networks based on multiobjective optimization. IEEE Trans Netw Sci Eng 7(1):421–434. https://doi.org/10.1109/TNSE.2018.2830822
Cai Y, Sun Y (2011) ESPRIT-tree: hierarchical clustering analysis of millions of 16s rRNA pyrosequences in quasilinear computational time. Nucleic Acids Res 39(14):e95–e95. https://doi.org/10.1093/nar/gkr349
Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Pei J, Tseng VS, Cao L et al (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 160–172
Cao X, Su T, Wang P et al (2018) An optimized chameleon algorithm based on local features. In: Proceedings of the 10th international conference on machine learning and computing, ICMLC 2018, Macau, February 26–28, 2018. ACM, pp 184–192
Carpineto C, Romano G (1996) A lattice conceptual clustering system and its application to browsing retrieval. Mach Learn 24(2):95–122. https://doi.org/10.1007/BF00058654
Chakraborty S, Paul D, Das S (2020) Hierarchical clustering with optimal transport. Stat Probab Lett 163(108):781. https://doi.org/10.1016/j.spl.2020.108781
Chen D, Cui DW, Wang CX et al (2006) A rough set-based hierarchical clustering algorithm for categorical data. Int J Inf Technol 12(3):149–159
Chen W, Song Y, Bai H et al (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33(3):568–586. https://doi.org/10.1109/TPAMI.2010.88
Cheng Q, Liu Z, Huang J et al (2012) Hierarchical clustering based on hyper-edge similarity for community detection. In: 2012 IEEE/WIC/ACM international conferences on web intelligence, WI 2012, Macau, December 4–7, 2012. IEEE Computer Society, pp 238–242. https://doi.org/10.1109/WI-IAT.2012.9
Cheng D, Zhu Q, Huang J et al (2019a) A hierarchical clustering algorithm based on noise removal. Int J Mach Learn Cybern 10(7):1591–1602. https://doi.org/10.1007/s13042-018-0836-3
Cheng D, Zhu Q, Huang J et al (2019b) A local cores-based hierarchical clustering algorithm for data sets with complex structures. Neural Comput Appl 31(11):8051–8068. https://doi.org/10.1007/s00521-018-3641-8
Cherng J, Lo M (2001) A hypergraph based clustering algorithm for spatial data sets. In: Proceedings of the 2001 IEEE international conference on data mining, 29 November–2 December 2001, San Jose, CA. IEEE Computer Society, pp 83–90. https://doi.org/10.1109/ICDM.2001.989504
Cheung Y, Zhang Y (2019) Fast and accurate hierarchical clustering based on growing multilayer topology training. IEEE Trans Neural Netw Learn Syst 30(3):876–890. https://doi.org/10.1109/TNNLS.2018.2853407
Cho M, Lee J, Lee KM (2009) Feature correspondence and deformable object matching via agglomerative correspondence clustering. In: IEEE 12th international conference on computer vision, ICCV 2009, Kyoto, September 27–October 4, 2009. IEEE Computer Society, pp 1280–1287. https://doi.org/10.1109/ICCV.2009.5459322
Courty N, Flamary R, Tuia D et al (2017) Optimal transport for domain adaptation. IEEE Trans Pattern Anal Mach Intell 39(9):1853–1865. https://doi.org/10.1109/TPAMI.2016.2615921
Day WH, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classif 1(1):7–24
Dixit V (2022) GCFI++: embedding and frequent itemset based incremental hierarchical clustering with labels and outliers. In: CODS-COMAD 2022: 5th joint international conference on data science & management of data (9th ACM IKDD CODS and 27th COMAD), Bangalore, January 8–10, 2022. ACM, pp 135–143. https://doi.org/10.1145/3493700.3493727
Dong Y, Wang Y, Jiang K (2018) Improvement of partitioning and merging phase in chameleon clustering algorithm. In: 2018 3rd international conference on computer and communication systems (ICCCS). IEEE, pp 29–32
Duran BS, Odell PL (2013) Cluster analysis: a survey, vol 100. Springer Science & Business Media, Berlin
D’Urso P, Vitale V (2020) A robust hierarchical clustering for georeferenced data. Spat Stat 35(100):407. https://doi.org/10.1016/j.spasta.2020.100407
Endo Y, Haruyama H, Okubo T (2004) On some hierarchical clustering algorithms using kernel functions. In: 2004 IEEE international conference on fuzzy systems (IEEE Cat. No.04CH37542), vol 3. pp 1513–1518. https://doi.org/10.1109/FUZZY.2004.1375399
Estivill-Castro V, Lee I (2000) AMOEBA: hierarchical clustering based on spatial proximity using delaunay diagram. In: Proceedings of the 9th international symposium on spatial data handling. Beijing, pp 1–16
Everitt B, Landau S, Leese M (2001) Cluster analysis. Arnold, London
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1007/BF00114265
Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
Fouedjio F (2016) A hierarchical clustering method for multivariate geostatistical data. Spat Stat 18:333–351. https://doi.org/10.1016/j.spasta.2016.07.003
Fränti P, Virmajoki O, Hautamäki V (2006) Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Trans Pattern Anal Mach Intell 28(11):1875–1881. https://doi.org/10.1109/TPAMI.2006.227
Frigui H, Krishnapuram R (1999) A robust competitive clustering algorithm with applications in computer vision. IEEE Trans Pattern Anal Mach Intell 21(5):450–465. https://doi.org/10.1109/34.765656
Galdino SML, Maciel PRM (2019) Hierarchical cluster analysis of interval-valued data using width of range Euclidean distance. In: IEEE Latin American conference on computational intelligence, LA-CCI 2019, Guayaquil, Ecuador, November 11–15, 2019. IEEE, pp 1–6. https://doi.org/10.1109/LA-CCI47412.2019.9036754
Geng H, Ali HH (2005) A new clustering strategy with stochastic merging and removing based on kernel functions. In: Fourth international IEEE computer society computational systems bioinformatics conference workshops & poster abstracts, CSB 2005 workshops, Stanford, CA, August 8–11, 2005. IEEE Computer Society, pp 41–42. https://doi.org/10.1109/CSBW.2005.10
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826
Golub GH, Loan CFV (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Govindu VM (2005) A tensor decomposition for geometric grouping and segmentation. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR 2005), 20–26 June 2005, San Diego, CA. IEEE Computer Society, pp 1150–1157. https://doi.org/10.1109/CVPR.2005.50
Gracia C, Binefa X (2011) On hierarchical clustering for speech phonetic segmentation. In: Proceedings of the 19th European signal processing conference, EUSIPCO 2011, Barcelona, August 29–September 2, 2011. IEEE, pp 2128–2132
Guan X, Du L (1998) Domain identification by clustering sequence alignments. Bioinformatics 14(9):783–788. https://doi.org/10.1093/bioinformatics/14.9.783
Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Haas LM, Tiwary A (eds) SIGMOD 1998, proceedings ACM SIGMOD international conference on management of data, June 2–4, 1998, Seattle, Washington. ACM Press, pp 73–84. https://doi.org/10.1145/276304.276312
Guha S, Rastogi R, Shim K (2000) ROCK: a robust clustering algorithm for categorical attributes. Inf Syst 25(5):345–366. https://doi.org/10.1016/S0306-4379(00)00022-3
Gullo F, Ponti G, Tagarelli A et al (2008) A hierarchical algorithm for clustering uncertain data via an information-theoretic approach. In: Proceedings of the 8th IEEE international conference on data mining (ICDM 2008), December 15–19, 2008, Pisa. IEEE Computer Society, pp 821–826. https://doi.org/10.1109/ICDM.2008.115
Guo JF, Zhao YY, Li J (2007) A multi-relational hierarchical clustering algorithm based on shared nearest neighbor similarity. In: 2007 international conference on machine learning and cybernetics. IEEE, pp 3951–3955. https://doi.org/10.1109/ICMLC.2007.4370836
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intell Inf Syst 17(2–3):107–145. https://doi.org/10.1023/A:1012801612483
Han J, Kamber M, Pei J (2011) Data mining concepts and techniques: third edition. Morgan Kaufmann Ser Data Manag Syst 5(4):83–124
Han X, Zhu Y, Ting KM et al (2022) Streaming hierarchical clustering based on point-set kernel. In: KDD ’22: the 28th ACM SIGKDD conference on knowledge discovery and data mining, Washington, DC, August 14–18, 2022. ACM, pp 525–533. https://doi.org/10.1145/3534678.3539323
He Z, Xu X, Deng S (2002) Squeezer: an efficient algorithm for clustering categorical data. J Comput Sci Technol 17(5):611–624. https://doi.org/10.1007/BF02948829
He L, Ray N, Guan Y et al (2019) Fast large-scale spectral clustering via explicit feature mapping. IEEE Trans Cybern 49(3):1058–1071. https://doi.org/10.1109/TCYB.2018.2794998
Heller KA, Ghahramani Z (2005) Bayesian hierarchical clustering. In: Raedt LD, Wrobel S (eds) Machine learning, proceedings of the twenty-second international conference (ICML 2005), Bonn, August 7–11, 2005, ACM international conference proceeding series, vol 119. ACM, pp 297–304. https://doi.org/10.1145/1102351.1102389
Huang D, Wang C, Wu J et al (2020) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226. https://doi.org/10.1109/TKDE.2019.2903410
Hubert L (1973) Monotone invariant clustering procedures. Psychometrika 38(1):47–62
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Hulot A, Chiquet J, Jaffrézic F et al (2020) Fast tree aggregation for consensus hierarchical clustering. BMC Bioinform 21(1):120. https://doi.org/10.1186/s12859-020-3453-6
Ishizaka A, Lokman B, Tasiou M (2020) A stochastic multi-criteria divisive hierarchical clustering algorithm. Omega 103:102370
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Hoboken
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37. https://doi.org/10.1109/34.824819
Jalalat-evakilkandi M, Mirzaei A (2010) A new hierarchical-clustering combination scheme based on scatter matrices and nearest neighbor criterion. In: 2010 5th international symposium on telecommunications, IEEE, pp 904–908. https://doi.org/10.1109/ISTEL.2010.5734151
Jambu M, Tan SH, Stern D (1989) Exploration informatique et statistique des données. Dunod, Paris
Jeantet I, Miklós Z, Gross-Amblard D (2020) Overlapping hierarchical clustering (OHC). In: Advances in intelligent data analysis XVIII—18th international symposium on intelligent data analysis, IDA 2020, Konstanz, April 27–29, 2020, proceedings, lecture notes in computer science, vol 12080. Springer, pp 261–273. https://doi.org/10.1007/978-3-030-44584-3_21
Jeon Y, Yoon S (2015) Multi-threaded hierarchical clustering by parallel nearest-neighbor chaining. IEEE Trans Parallel Distrib Syst 26(9):2534–2548. https://doi.org/10.1109/TPDS.2014.2355205
Johnson SC (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254
Judd D, McKinley PK, Jain AK (1998) Large-scale parallel data clustering. IEEE Trans Pattern Anal Mach Intell 20(8):871–876. https://doi.org/10.1109/34.709614
Karypis G, Aggarwal R, Kumar V et al (1999a) Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Trans Very Large Scale Integr Syst 7(1):69–79. https://doi.org/10.1109/92.748202
Karypis G, Han E, Kumar V (1999b) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75. https://doi.org/10.1109/2.781637
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, Hoboken. https://doi.org/10.1002/9780470316801
Kaufman L, Rousseeuw PJ (2009) Finding groups in data: an introduction to cluster analysis, vol 344. Wiley, Hoboken
Kaur PJ et al (2015) Cluster quality based performance evaluation of hierarchical clustering method. In: 2015 1st international conference on next generation computing technologies (NGCT). IEEE, pp 649–653
Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2):291–307. https://doi.org/10.1002/j.1538-7305.1970.tb01770.x
Kobren A, Monath N, Krishnamurthy A et al (2017) A hierarchical algorithm for extreme clustering. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, Halifax, NS, August 13–17, 2017. ACM, pp 255–264. https://doi.org/10.1145/3097983.3098079
Kohonen T (2001) Self-organizing maps, third edition. Springer series in information sciences. Springer, Cham. https://doi.org/10.1007/978-3-642-56927-2
Kotsiantis S, Pintelas P (2004) Recent advances in clustering: a brief survey. WSEAS Trans Inf Sci Appl 1(1):73–81
Kumar P, Tripathy B (2009) MMeR: an algorithm for clustering heterogeneous data using rough set theory. Int J Rapid Manuf 1(2):189–207
Kumar T, Vaidyanathan S, Ananthapadmanabhan H et al (2018) Hypergraph clustering: a modularity maximization approach. CoRR. arXiv:abs/1812.10869
Lance GN, Williams WT (1967) A general theory of classificatory sorting strategies: 1. Hierarchical systems. Comput J 9(4):373–380. https://doi.org/10.1093/comjnl/9.4.373
Lerato L, Niesler T (2015) Clustering acoustic segments using multi-stage agglomerative hierarchical clustering. PLoS ONE 10(10):e0141756
Lewis-Beck M, Bryman AE, Liao TF (2003) The Sage encyclopedia of social science research methods. Sage Publications, Thousand Oaks
Li M, Deng S, Wang L et al (2014) Hierarchical clustering algorithm for categorical data using a probabilistic rough set model. Knowl Based Syst 65:60–71. https://doi.org/10.1016/j.knosys.2014.04.008
Li S, Li W, Qiu J (2017) A novel divisive hierarchical clustering algorithm for geospatial analysis. ISPRS Int J Geo-Inf 6(1):30. https://doi.org/10.3390/ijgi6010030
Li Y, Hong Z, Feng W et al (2019) A hierarchical clustering based feature word extraction method. In: 2019 IEEE 3rd advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE, pp 883–887
Lin Y, Dong X, Zheng L et al (2019) A bottom-up clustering approach to unsupervised person re-identification. In: The thirty-third AAAI conference on artificial intelligence, AAAI 2019, the thirty-first innovative applications of artificial intelligence conference, IAAI 2019, the ninth AAAI symposium on educational advances in artificial intelligence, EAAI 2019, Honolulu, Hawaii, January 27–February 1, 2019. AAAI Press, pp 8738–8745. https://doi.org/10.1609/aaai.v33i01.33018738
Liu H, Latecki LJ, Yan S (2015) Dense subgraph partition of positive hypergraphs. IEEE Trans Pattern Anal Mach Intell 37(3):541–554. https://doi.org/10.1109/TPAMI.2014.2346173
Liu J, Liu X, Yang Y et al (2021) Hierarchical multiple kernel clustering. In: Thirty-fifth AAAI conference on artificial intelligence. AAAI, pp 2–9
Lu Y, Wan Y (2013) PHA: a fast potential-based hierarchical agglomerative clustering method. Pattern Recognit 46(5):1227–1239. https://doi.org/10.1016/j.patcog.2012.11.017
Lu Y, Hou X, Chen X (2016) A novel travel-time based similarity measure for hierarchical clustering. Neurocomputing 173:3–8. https://doi.org/10.1016/j.neucom.2015.01.090
Ma X, Dhavala S (2018) Hierarchical clustering with prior knowledge. CoRR. arXiv:abs/1806.03432
Macnaughton-Smith P, Williams W, Dale M et al (1964) Dissimilarity analysis: a new technique of hierarchical sub-division. Nature 202(4936):1034–1035
Mao Q, Zheng W, Wang L et al (2015) Parallel hierarchical clustering in linearithmic time for large-scale sequence analysis. In: 2015 IEEE international conference on data mining, ICDM 2015, Atlantic City, NJ, November 14–17, 2015. IEEE Computer Society, pp 310–319. https://doi.org/10.1109/ICDM.2015.90
Monath N, Kobren A, Krishnamurthy A et al (2019) Scalable hierarchical clustering with tree grafting. pp 1438–1448. https://doi.org/10.1145/3292500.3330929
Monath N, Dubey KA, Guruganesh G et al (2021) Scalable hierarchical agglomerative clustering. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. Association for Computing Machinery, New York, NY, KDD ’21, pp 1245–1255. https://doi.org/10.1145/3447548.3467404
Muhr M, Sabol V, Granitzer M (2010) Scalable recursive top-down hierarchical clustering approach with implicit model selection for textual data sets. In: Database and expert systems applications, DEXA, international workshops, Bilbao, August 30–September 3, 2010. IEEE Computer Society, pp 15–19. https://doi.org/10.1109/DEXA.2010.25
Mulinka P, Casas P, Fukuda K et al (2020) HUMAN—hierarchical clustering for unsupervised anomaly detection & interpretation. In: 11th international conference on network of the future, NoF 2020, Bordeaux, October 12–14, 2020. IEEE, pp 132–140. https://doi.org/10.1109/NoF50125.2020.9249194
Müllner D (2011) Modern hierarchical, agglomerative clustering algorithms. CoRR. arXiv:abs/1109.2378
Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. Comput J 26(4):354–359. https://doi.org/10.1093/comjnl/26.4.354
Murtagh F, Contreras P (2012) Algorithms for hierarchical clustering: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 2(1):86–97. https://doi.org/10.1002/widm.53
Murtagh F, Contreras P (2017) Algorithms for hierarchical clustering: an overview, II. Wiley Interdiscip Rev Data Min Knowl Discov. https://doi.org/10.1002/widm.1219
Murtagh F, Legendre P (2014) Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J Classif 31(3):274–295. https://doi.org/10.1007/s00357-014-9161-z
Myers C, Rabiner L, Rosenberg A (1980) Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Trans Acoust Speech Signal Process 28(6):623–635
Narita K, Hochin T, Nomiya H (2018) Incremental clustering for hierarchical clustering. In: 5th international conference on computational science/intelligence and applied informatics, CSII 2018, Yonago, July 10–12, 2018. IEEE, pp 102–107. https://doi.org/10.1109/CSII.2018.00025
Narita K, Hochin T, Hayashi Y et al (2020) Incremental hierarchical clustering for data insertion and its evaluation. Int J Softw Innov 8(2):1–22. https://doi.org/10.4018/IJSI.2020040101
Nasiriani N, Squicciarini AC, Saldanha Z et al (2019) Hierarchical clustering for discrimination discovery: a top-down approach. In: 2nd IEEE international conference on artificial intelligence and knowledge engineering, AIKE 2019, Sardinia, June 3–5, 2019. IEEE, pp 187–194. https://doi.org/10.1109/AIKE.2019.00041
Nazari Z, Kang D (2018) A new hierarchical clustering algorithm with intersection points. In: 2018 5th IEEE Uttar Pradesh section international conference on electrical, electronics and computer engineering (UPCON). IEEE, pp 1–5
Neto ACA, Sander J, Campello RJGB et al (2021) Efficient computation and visualization of multiple density-based clustering hierarchies. IEEE Trans Knowl Data Eng 33(8):3075–3089. https://doi.org/10.1109/TKDE.2019.2962412
Nikpour S, Asadi S (2022) A dynamic hierarchical incremental learning-based supervised clustering for data stream with considering concept drift. J Ambient Intell Humaniz Comput 13(6):2983–3003. https://doi.org/10.1007/s12652-021-03673-0
Núñez-Valdéz ER, Solanki VK, Balakrishna S et al (2020) Incremental hierarchical clustering driven automatic annotations for unifying IoT streaming data. Int J Interact Multim Artif Intell 6(2):1–15. https://doi.org/10.9781/ijimai.2020.03.001
Omran MGH, Engelbrecht AP, Salman AA (2007) An overview of clustering methods. Intell Data Anal 11(6):583–605
Pang N, Zhang J, Zhang C et al (2019) Parallel hierarchical subspace clustering of categorical data. IEEE Trans Comput 68(4):542–555. https://doi.org/10.1109/TC.2018.2879332
Parmar D, Wu T, Blackhurst J (2007) MMR: an algorithm for clustering categorical data using rough set theory. Data Knowl Eng 63(3):879–893. https://doi.org/10.1016/j.datak.2007.05.005
Qin J, Lewis DP, Noble WS (2003) Kernel hierarchical gene clustering from microarray expression data. Bioinformatics 19(16):2097–2104. https://doi.org/10.1093/bioinformatics/btg288
Qin H, Ma X, Herawan T et al (2014) MGR: an information theory based hierarchical divisive clustering algorithm for categorical data. Knowl Based Syst 67:401–411. https://doi.org/10.1016/j.knosys.2014.03.013
Rabin J, Ferradans S, Papadakis N (2014) Adaptive color transfer with relaxed optimal transport. In: 2014 IEEE international conference on image processing, ICIP 2014, Paris, October 27–30, 2014. IEEE, pp 4852–4856. https://doi.org/10.1109/ICIP.2014.7025983
Rahman MA, Rahman MM, Mollah MNH et al (2018) Robust hierarchical clustering for metabolomics data analysis in presence of cell-wise and case-wise outliers. In: 2018 international conference on computer, communication, chemical, material and electronic engineering (IC4ME2). IEEE, pp 1–4. https://doi.org/10.1109/IC4ME2.2018.8465616
Reddy CK, Vinzamuri B (2013) A survey of partitional and hierarchical clustering algorithms. In: Aggarwal CC, Reddy CK (eds) Data clustering: algorithms and applications. CRC Press, Boca Raton, pp 87–110
Redner RA, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26(2):195–239
Rocha C, Dias LC (2013) MPOC: an agglomerative algorithm for multicriteria partially ordered clustering. 4OR 11(3):253–273. https://doi.org/10.1007/s10288-013-0228-1
Ros F, Guillaume S (2019) A hierarchical clustering algorithm and an improvement of the single linkage criterion to deal with noise. Expert Syst Appl 128:96–108
Ros F, Guillaume S, Hajji ME et al (2020) KdMutual: a novel clustering algorithm combining mutual neighboring and hierarchical approaches using a new selection criterion. Knowl Based Syst 204(106):220. https://doi.org/10.1016/j.knosys.2020.106220
Roux M (2018) A comparative study of divisive and agglomerative hierarchical clustering algorithms. J Classif 35(2):345–366. https://doi.org/10.1007/s00357-018-9259-9
Sabarish B, Karthi R, Kumar TG (2020) Graph similarity-based hierarchical clustering of trajectory data. Procedia Comput Sci 171:32–41. https://doi.org/10.1016/j.procs.2020.04.004
Sahoo N, Callan J, Krishnan R et al (2006) Incremental hierarchical clustering of text documents. In: Proceedings of the 2006 ACM CIKM international conference on information and knowledge management, Arlington, VA, November 6–11, 2006. ACM, pp 357–366. https://doi.org/10.1145/1183614.1183667
Salton G (1975) A vector space model for information retrieval. J ASIS 18(11): 613–620
Sander J, Qin X, Lu Z et al (2003) Automatic extraction of clusters from hierarchical clustering representations. In: Whang KY, Jeon J, Shim K et al (eds) Advances in knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 75–87
Saunders A, Ashlock DA, Houghten SK (2018) Hierarchical clustering and tree stability. In: 2018 IEEE conference on computational intelligence in bioinformatics and computational biology, CIBCB 2018, Saint Louis, MO, May 30–June 2, 2018. IEEE, pp 1–8. https://doi.org/10.1109/CIBCB.2018.8404978
Sharan R, Shamir R (2000) Center CLICK: a clustering algorithm with applications to gene expression analysis. In: Proceedings of the eighth international conference on intelligent systems for molecular biology, August 19–23, 2000, La Jolla/San Diego, CA. AAAI, pp 307–316
Sharma S, Batra N et al (2019) Comparative study of single linkage, complete linkage, and ward method of agglomerative clustering. In: 2019 international conference on machine learning, big data, cloud and parallel computing (COMITCon). IEEE, pp 568–573
Shimizu T, Sakurai K (2018) Comprehensive data tree by actor messaging for incremental hierarchical clustering. In: 2018 IEEE 42nd annual computer software and applications conference, COMPSAC 2018, Tokyo, 23–27 July 2018, vol 1. IEEE Computer Society, pp 801–802. https://doi.org/10.1109/COMPSAC.2018.00127
Sisodia D, Singh L, Sisodia S et al (2012) Clustering techniques: a brief survey of different clustering algorithms. Int J Latest Trends Eng Technol 1(3):82–87
Sneath PH, Sokal RR (1975) Numerical taxonomy. The principles and practice of numerical classification, vol 50. Williams WT published in association with Stony Brook University. https://doi.org/10.1086/408956
Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. https://hdl.handle.net/11299/215421, May 23, 2000
Székely GJ, Rizzo ML (2005) Hierarchical clustering via joint between-within distances: extending Ward’s minimum variance method. J Classif 22(2):151–183. https://doi.org/10.1007/s00357-005-0012-9
Takumi S, Miyamoto S (2012) Top-down vs bottom-up methods of linkage for asymmetric agglomerative hierarchical clustering. In: 2012 IEEE international conference on granular computing, GrC 2012, Hangzhou, August 11–13, 2012. IEEE Computer Society, pp 459–464. https://doi.org/10.1109/GrC.2012.6468689
Tan P, Steinbach M, Karpatne A et al (2019) Introduction to data mining, Second Edition. Pearson, Harlow
Toujani R, Akaichi J (2018) GHHP: genetic hybrid hierarchical partitioning for community structure in social medias networks. In: 2018 IEEE smartWorld, ubiquitous intelligence & computing, advanced & trusted computing, scalable computing & communications, cloud & big data computing, internet of people and smart city innovation, SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI 2018, Guangzhou, October 8–12, 2018. IEEE, pp 1146–1153. https://doi.org/10.1109/SmartWorld.2018.00199
Tripathy B, Ghosh A (2011a) SDR: an algorithm for clustering categorical data using rough set theory. In: 2011 IEEE recent advances in intelligent computational systems. IEEE, pp 867–872
Tripathy B, Ghosh A (2011b) SSDR: an algorithm for clustering categorical data using rough set theory. Adv Appl Sci Res 2(3):314–326
Tripathy B, Goyal A, Chowdhury R et al (2017) MMeMeR: an algorithm for clustering heterogeneous data using rough set theory. Int J Intell Syst Appl 9(8):25
Tsekouras G, Kotoulas P, Tsirekis C et al (2008) A pattern recognition methodology for evaluation of load profiles and typical days of large electricity customers. Electr Power Syst Res 78(9):1494–1510
Turi R (2001) Clustering-based colour image segmentation. PhD Thesis, Monash University
Varshney AK, Muhuri PK, Lohani QMD (2022) PIFHC: the probabilistic intuitionistic fuzzy hierarchical clustering algorithm. Appl Soft Comput 120(108):584. https://doi.org/10.1016/j.asoc.2022.108584
Veldt N, Benson AR, Kleinberg JM (2020) Localized flow-based clustering in hypergraphs. arXiv:2002.09441
Vidal E, Granitto PM, Bayá A (2014) Discussing a new divisive hierarchical clustering algorithm. In: XLIII Jornadas Argentinas de Informática e Investigación Operativa (43 JAIIO), XV Argentine symposium on artificial intelligence (ASAI), Buenos Aires, 2014
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. https://doi.org/10.1007/s11222-007-9033-z
Wang T, Lu Y, Han Y (2017) Clustering of high dimensional handwritten data by an improved hypergraph partition method. In: Intelligent computing methodologies—13th international conference, ICIC 2017, Liverpool, August 7–10, 2017, proceedings, part III, lecture notes in computer science, vol 10363. Springer, pp 323–334. https://doi.org/10.1007/978-3-319-63315-2_28
Wishart D (1969) An algorithm for hierarchical classifications. Biometrics 25:165–170
Xi Y, Lu Y (2020) Multi-stage hierarchical clustering method based on hypergraph. In: Intelligent computing methodologies—16th international conference, ICIC 2020, Bari, October 2–5, 2020, proceedings, part III, lecture notes in computer science, vol 12465. Springer, pp 432–443. https://doi.org/10.1007/978-3-030-60796-8_37
Xiong T, Wang S, Mayers A et al (2012) DHCC: divisive hierarchical clustering of categorical data. Data Min Knowl Discov 24(1):103–135. https://doi.org/10.1007/s10618-011-0221-2
Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165–193
Xu R, Wunsch DC II (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678. https://doi.org/10.1109/TNN.2005.845141
Yamada Y, Masuyama N, Amako N et al (2020) Divisive hierarchical clustering based on adaptive resonance theory. In: International symposium on community-centric systems, CcS 2020, Hachioji, Tokyo, September 23–26, 2020. IEEE, pp 1–6. https://doi.org/10.1109/CcS49175.2020.9231474
Yang J, Grunsky E, Cheng Q (2019) A novel hierarchical clustering analysis method based on Kullback–Leibler divergence and application on Dalaimiao geochemical exploration data. Comput Geosci 123:10–19. https://doi.org/10.1016/j.cageo.2018.11.003
Yu F, Dong K, Chen F et al (2007) Clustering time series with granular dynamic time warping method. In: 2007 IEEE international conference on granular computing, GrC 2007, San Jose, CA, 2–4 November 2007. IEEE Computer Society, pp 393–398. https://doi.org/10.1109/GrC.2007.34
Yu M, Hillebrand A, Tewarie P et al (2015) Hierarchical clustering in minimum spanning trees. Chaos 25(2):023107
Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans Comput 20(1):68–86. https://doi.org/10.1109/T-C.1971.223083
Zeng J, Gong L, Wang Q et al (2009) Hierarchical clustering for topic analysis based on variable feature selection. In: 2009 sixth international conference on fuzzy systems and knowledge discovery. IEEE, pp 477–481
Zeng K, Ning M, Wang Y et al (2020) Hierarchical clustering with hard-batch triplet loss for person re-identification. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, June 13–19, 2020. IEEE, pp 13654–13662. https://doi.org/10.1109/CVPR42600.2020.01367
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data, Montreal, QC, June 4–6, 1996. ACM Press, pp 103–114. https://doi.org/10.1145/233269.233324
Zhang W, Wang X, Zhao D et al (2012) Graph degree linkage: agglomerative clustering on a directed graph. In: Computer vision—ECCV 2012—12th European conference on computer vision, Florence, October 7–13, 2012, proceedings, part I, lecture notes in computer science, vol 7572. Springer, pp 428–441. https://doi.org/10.1007/978-3-642-33718-5_31
Zhang W, Zhao D, Wang X (2013) Agglomerative clustering via maximum incremental path integral. Pattern Recogn 46(11):3056–3065. https://doi.org/10.1016/j.patcog.2013.04.013
Zhao Y, Karypis G (2002) Evaluation of hierarchical clustering algorithms for document datasets. In: Proceedings of the 2002 ACM CIKM international conference on information and knowledge management, McLean, VA, November 4–9, 2002. ACM, pp 515–524. https://doi.org/10.1145/584792.584877
Zhao H, Qi Z (2010) Hierarchical agglomerative clustering with ordering constraints. In: Third international conference on knowledge discovery and data mining, WKDD 2010, Phuket, 9–10 January 2010. IEEE Computer Society, pp 195–199. https://doi.org/10.1109/WKDD.2010.123
Zhao D, Tang X (2008) Cyclizing clusters via zeta function of a graph. In: Advances in neural information processing systems 21, proceedings of the twenty-second annual conference on neural information processing systems, Vancouver, BC, December 8–11, 2008. Curran Associates, Inc., pp 1953–1960
Zhao Y, Karypis G, Fayyad UM (2005) Hierarchical clustering algorithms for document datasets. Data Min Knowl Discov 10(2):141–168. https://doi.org/10.1007/s10618-005-0361-3
Zhao W, Li B, Gu Q et al (2020) Improved hierarchical clustering with non-locally enhanced features for unsupervised person re-identification. In: 2020 international joint conference on neural networks, IJCNN 2020, Glasgow, July 19–24, 2020. IEEE, pp 1–8. https://doi.org/10.1109/IJCNN48605.2020.9206722
Zhou D, Huang J, Schölkopf B (2006) Learning with hypergraphs: clustering, classification, and embedding. In: Advances in neural information processing systems 19, proceedings of the twentieth annual conference on neural information processing systems, Vancouver, BC, December 4–7, 2006. MIT Press, pp 1601–1608
Zhou R, Zhang Y, Feng S et al (2018) A novel hierarchical clustering algorithm based on density peaks for complex datasets. Complex. https://doi.org/10.1155/2018/2032461
Zhu Y, Ting KM, Jin Y et al (2022) Hierarchical clustering that takes advantage of both density-peak and density-connectivity. Inf Syst 103:101871. https://doi.org/10.1016/j.is.2021.101871
Funding
This work was supported by the National Key R&D Program of China (Grant Nos. 2017YFE0111900 and 2018YFB1003205), the Higher Education Innovation Fund of Gansu Province (Grant No. 2020B-214), and the Natural Science Foundation of Gansu Province (Grant No. 20JR10RG304).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Ran, X., Xi, Y., Lu, Y. et al. Comprehensive survey on hierarchical clustering algorithms and the recent developments. Artif Intell Rev 56, 8219–8264 (2023). https://doi.org/10.1007/s10462-022-10366-3
DOI: https://doi.org/10.1007/s10462-022-10366-3