Cost-Effective Clustering by Aggregating Local Density Peaks

Xie, Wen-Bo; Chen, Bin; Shi, Jun-Hao; Lee, Yan-Li; Wang, Xin; Fu, Xun

doi:10.1007/978-3-031-30678-5_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13946)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Hierarchical clustering algorithms that provide tree-shaped results can be regarded as data summarization and thus play an important role in knowledge discovery and data mining. However, such structured results also bring a challenge: a difficult trade-off between complexity (time and space) and quality. To tackle this issue, we propose a new agglomerative algorithm for hierarchical clustering, which merges data points into tree-shaped sub-clusters via nearest-neighbor chain searching and determines the proxy of each sub-cluster via local density peak detection. Extensive experiments on real-world and synthetic datasets show that our method outperforms the baselines in accuracy, response time, and memory footprint. Moreover, our method scales to half a million data points on a personal computer, further verifying its cost-effectiveness.
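
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`nearest_neighbor`, `nn_chain`, `density_peak`), the Euclidean distance, the brute-force search, and the cutoff-count density with a `radius` parameter are all assumptions made for illustration, since the abstract does not define them.

```python
import math

def nearest_neighbor(points, i):
    """Index of the point closest to points[i]; ties go to the lowest index."""
    best, best_d = -1, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)  # Euclidean distance (assumed metric)
        if d < best_d:
            best, best_d = j, d
    return best

def nn_chain(points, start):
    """Follow nearest-neighbor links from `start` until the chain reaches
    a pair of points that are each other's nearest neighbor."""
    chain = [start]
    while True:
        nxt = nearest_neighbor(points, chain[-1])
        if len(chain) >= 2 and nxt == chain[-2]:
            return chain  # the last two entries are reciprocal nearest neighbors
        chain.append(nxt)

def density_peak(points, members, radius):
    """Pick a proxy for a sub-cluster: the member with the most other members
    within `radius` (a simple cutoff-count density, assumed for illustration)."""
    def density(i):
        return sum(
            1 for j in members
            if j != i and math.dist(points[i], points[j]) <= radius
        )
    return max(members, key=density)

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.5, 0.0), (10.0, 0.0)]
chain = nn_chain(points, 4)                       # starts at the outlying point
proxy = density_peak(points, [0, 1, 2, 3], 2.6)   # hypothetical sub-cluster
```

In this toy example the chain started at the outlying point ends at the closest pair, points 2 and 3, which an agglomerative method would merge first. The brute-force neighbor search here is quadratic per chain and is meant only to show the idea, not the complexity the paper reports.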

Corresponding author at: School of Computer Science, Southwest Petroleum University, Chengdu 610500, China. E-mail: wenboxie@swpu.edu.cn (Wen-Bo Xie). This work is supported by the Young Scholars Development Fund of SWPU under Grant No. 202199010142.

Author information

Authors and Affiliations

School of Computer Science, Southwest Petroleum University, Chengdu, 610500, China
Wen-Bo Xie, Bin Chen, Jun-Hao Shi, Xin Wang & Xun Fu
School of Computer and Software Engineering, Xihua University, Chengdu, 610039, China
Yan-Li Lee

Corresponding author

Correspondence to Wen-Bo Xie.

Editor information

Editors and Affiliations

Tianjin University, Tianjin, China
Xin Wang
University of Torino, Turin, Italy
Maria Luisa Sapino
POSTECH, Pohang, Korea (Republic of)
Wook-Shin Han
University of California Santa Barbara, Santa Barbara, CA, USA
Amr El Abbadi
University of Auckland, Auckland, New Zealand
Gill Dobbie
Tianjin University, Tianjin, China
Zhiyong Feng
Beijing University of Posts and Telecommunications, Beijing, China
Yingxiao Shao
The University of Queensland, Brisbane, QLD, Australia
Hongzhi Yin

Cite this paper

Xie, WB., Chen, B., Shi, JH., Lee, YL., Wang, X., Fu, X. (2023). Cost-Effective Clustering by Aggregating Local Density Peaks. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13946. Springer, Cham. https://doi.org/10.1007/978-3-031-30678-5_5

DOI: https://doi.org/10.1007/978-3-031-30678-5_5
Published: 14 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30677-8
Online ISBN: 978-3-031-30678-5
eBook Packages: Computer Science (R0)
