Abstract
Hierarchical clustering algorithms that produce tree-shaped results can be regarded as a form of data summarization and thus play an important role in knowledge discovery and data mining. However, such structured results also bring a challenge: a difficult trade-off between complexity (time and space) and quality. To tackle this issue, we propose a new agglomerative algorithm for hierarchical clustering, which merges data points into tree-shaped sub-clusters via nearest-neighbor chain searching and determines the proxy of each sub-cluster by local density peak detection. Extensive experiments on real-world and synthetic datasets show that our method outperforms the baselines in accuracy, response time, and memory footprint. Moreover, our method scales to half a million data points on a personal computer, further verifying its cost-effectiveness.
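The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the brute-force distance scans, and the cutoff-based density count (with its hypothetical `cutoff` parameter) are all assumptions made for illustration. A nearest-neighbor chain is followed until two points are each other's nearest neighbor, and the proxy of a sub-cluster is taken to be its member with the highest local density.

```python
import math

def nearest(i, points, active, prev=None):
    """Nearest active neighbor of point i (brute force).
    Ties prefer the previous chain element, then the lower index,
    so distances strictly decrease along a chain and it must end."""
    candidates = [j for j in active if j != i]
    return min(candidates,
               key=lambda j: (math.dist(points[i], points[j]), j != prev, j))

def reciprocal_pair(start, points, active):
    """Follow the nearest-neighbor chain from `start` until two points
    are mutual nearest neighbors; return that pair."""
    prev, cur = None, start
    while True:
        nxt = nearest(cur, points, active, prev)
        if nxt == prev:          # cur and prev point at each other
            return prev, cur
        prev, cur = cur, nxt

def local_density(i, points, cutoff):
    """Assumed density definition: number of other points within `cutoff`."""
    return sum(1 for j, q in enumerate(points)
               if j != i and math.dist(points[i], q) <= cutoff)

def density_peak(members, points, cutoff):
    """Proxy of a sub-cluster: the member with the highest local density
    (lowest index wins ties)."""
    return max(members, key=lambda i: (local_density(i, points, cutoff), -i))

# Toy one-dimensional layout embedded in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
active = set(range(len(points)))

pair_a = reciprocal_pair(2, points, active)   # chain 2 -> 1 -> 0 -> 1
pair_b = reciprocal_pair(3, points, active)   # chain 3 -> 4 -> 3
proxy = density_peak([0, 1, 2], points, cutoff=2.0)
```

In this toy layout the chain started at point 2 ends at the mutual pair formed by points 0 and 1, and point 1 becomes the proxy of the sub-cluster {0, 1, 2} because it has two neighbors within the cutoff while the others have one. A scalable version would replace the brute-force scans with an index structure; that choice is outside what the abstract specifies.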
Corresponding author at: School of Computer Science, Southwest Petroleum University, Chengdu 610500, China. E-mail: wenboxie@swpu.edu.cn (Wen-Bo Xie). This work is supported by the Young Scholars Development Fund of SWPU under Grant No. 202199010142.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xie, WB., Chen, B., Shi, JH., Lee, YL., Wang, X., Fu, X. (2023). Cost-Effective Clustering by Aggregating Local Density Peaks. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13946. Springer, Cham. https://doi.org/10.1007/978-3-031-30678-5_5
DOI: https://doi.org/10.1007/978-3-031-30678-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30677-8
Online ISBN: 978-3-031-30678-5
eBook Packages: Computer Science, Computer Science (R0)