FastDEC: Clustering by Fast Dominance Estimation

Yang, Geping; Lv, Hongzhang; Yang, Yiyang; Gong, Zhiguo; Chen, Xiang; Hao, Zhifeng

doi:10.1007/978-3-031-26387-3_9


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13713)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

The k-nearest neighbors (k-NN) graph is essential to many graph mining tasks. In this work, we study density-based clustering on the k-NN graph and propose FastDEC, a clustering framework based on fast dominance estimation. The nearest density higher (NDH) relation and the dominance component (DC), and more specifically their integration with the k-NN graph, are formally defined and theoretically analyzed. FastDEC includes two extensions for different clustering scenarios: FastDEC\(_D\), which partitions data into clusters of arbitrary shape, and FastDEC\(_K\), which produces a K-way partition. First, FastDEC\(_D\) segments the given k-NN graph and returns the detected set of DCs as its result. Then, FastDEC\(_K\) generates the K-way partition by selecting the top-K DCs in terms of inter-dominance (ID) as seeds and assigning the remaining DCs to their nearest dominators.
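The first stage described above can be illustrated with a minimal sketch. It is not the released FastDEC code: the density estimate (inverse distance to the k-th neighbor), the brute-force k-NN search, and the names `knn_graph` and `dominance_components` are all assumptions made for illustration. Each point is linked to its nearest k-NN neighbor of strictly higher density, and the trees of the resulting forest are treated as dominance components. The inter-dominance score used for the K-way stage is not defined in this abstract, so that stage is left out.

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force k-NN search; returns (indices, distances), each of shape (n, k)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]        # neighbors sorted nearest first
    return idx, np.take_along_axis(d, idx, axis=1)

def dominance_components(X, k):
    """Link each point to its nearest higher-density k-NN neighbor and label trees."""
    idx, dist = knn_graph(X, k)
    # Assumed density estimate: inverse of the distance to the k-th neighbor.
    density = 1.0 / (dist[:, -1] + 1e-12)
    n = len(X)
    parent = np.arange(n)                     # a root points to itself
    for i in range(n):
        for j in idx[i]:
            if density[j] > density[i]:       # strictly higher, so no cycles
                parent[i] = j
                break
    labels = np.empty(n, dtype=int)
    for i in range(n):
        r = i
        while parent[r] != r:                 # follow links up to the root
            r = parent[r]
        labels[i] = r                         # component id = index of its root
    return labels, parent, density

# Usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(100, 1, (30, 2))])
labels, parent, density = dominance_components(X, k=5)
print(len(np.unique(labels)), "components found")
```

Because links only follow k-NN edges, a component can never span two groups whose points share no k-NN edge. A single dense region may still split into several components when k is small, which is the situation a merging stage such as the K-way partition is meant to handle.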

FastDEC can be viewed as a much faster, more robust, k-NN-based variant of the classical density-based clustering algorithm Density Peak Clustering (DPC). DPC estimates the significance of data points from density and geometric distance, while FastDEC additionally uses the global rank of the dominator as a factor in the significance estimation. FastDEC has several important characteristics: (1) excellent clustering performance; (2) ease of interpretation and implementation; (3) efficiency and robustness. Experiments on both artificial and real datasets demonstrate that FastDEC outperforms state-of-the-art density-based methods, including DPC.
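For context, the two DPC factors mentioned above can be sketched as follows. This is a minimal illustration of the scoring in Rodriguez and Laio's DPC, not of FastDEC: the Gaussian-kernel density, the cutoff parameter `d_c`, and the function name `dpc_scores` are choices made for this sketch. The dominator-rank factor that FastDEC adds is not defined in this abstract and is therefore not shown.

```python
import numpy as np

def dpc_scores(X, d_c):
    """Return (rho, delta, gamma) for every point, following the DPC idea."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Density factor: Gaussian-kernel density with cutoff d_c, excluding self.
    rho = np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # Distance factor: distance to the nearest point of higher density;
        # the densest point gets its largest distance to any point instead.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho, delta, rho * delta            # gamma = rho * delta ranks candidates

# Usage: the points with the largest gamma are candidate cluster centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(100, 1, (30, 2))])
rho, delta, gamma = dpc_scores(X, d_c=1.0)
centers = np.argsort(gamma)[-2:]
```

Computing `delta` this way needs all pairwise distances, which is the quadratic cost that k-NN-based variants aim to avoid.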

G. Yang and H. Lv—Equal Contribution.


Notes

1.
FastDEC is released at https://github.com/gepingyang/FastDEC.
2.
Density-Reachable (DR) in DBSCAN [15] is equivalent to a \(\tau \)-based flat kernel. For the sake of comparison, we use a k-NN-based one.
3.
https://numpy.org/.
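The distinction drawn in note 2 can be made concrete with a small sketch. Both functions are illustrative assumptions: the flat kernel counts the neighbors within radius \(\tau \), and the k-NN-based estimate is taken here to be the inverse of the distance to the k-th nearest neighbor, which may differ from the exact form used in the paper.

```python
import numpy as np

def flat_kernel_density(X, tau):
    """Number of other points within radius tau of each point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (d <= tau).sum(axis=1) - 1         # subtract 1 to exclude the point itself

def knn_density(X, k):
    """Assumed k-NN density: inverse distance to the k-th nearest neighbor."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # ignore self-distances
    kth = np.partition(d, k - 1, axis=1)[:, k - 1]
    return 1.0 / kth                          # undefined if the k nearest points coincide

# Usage on four points on a line.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
flat = flat_kernel_density(X, tau=1.5)
knn = knn_density(X, k=2)
```

The flat kernel depends on a fixed radius, so an isolated point gets density zero, whereas the k-NN-based estimate adapts its radius to the local spacing and assigns every point a positive value.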

References

1. Amagata, D., Hara, T.: Fast density-peaks clustering: multicore-based parallelization approach. In: SIGMOD 2021: International Conference on Management of Data, Virtual Event, China, 20–25 Jun 2021, pp. 49–61. ACM (2021)
2. Angelino, C.V., Debreuve, E., Barlaud, M.: Image restoration using a kNN-variant of the mean-shift. In: 2008 15th IEEE International Conference on Image Processing (ICIP), pp. 573–576. IEEE (2008)
3. Cai, J., Wei, H., Yang, H., Zhao, X.: A novel clustering algorithm based on DPC and PSO. IEEE Access 8, 88200–88214 (2020)
4. Carreira-Perpiñán, M.Á., Wang, W.: The k-modes algorithm for clustering. arXiv preprint arXiv:1304.6478 (2013)
5. Chang, H., Yeung, D.: Robust path-based spectral clustering. Pattern Recognit. 41(1), 191–203 (2008)
6. Chaudhuri, K., Dasgupta, S.: Rates of convergence for the cluster tree. In: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems (NIPS), pp. 343–351. Curran Associates, Inc. (2010)
7. Chaudhuri, K., Dasgupta, S., Kpotufe, S., von Luxburg, U.: Consistent procedures for cluster tree estimation and pruning. IEEE Trans. Inf. Theory 60(12), 7900–7912 (2014)
8. Cheng, Y.: Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17(8), 790–799 (1995)
9. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
10. Dasgupta, S., Freund, Y.: Random projection trees and low dimensional manifolds. In: Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), pp. 537–546 (2008)
11. Davidson, I., Ravi, S.S.: Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 59–70. Springer, Heidelberg (2005). https://doi.org/10.1007/11564126_11
12. Dong, W., Charikar, M., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th International Conference on World Wide Web (WWW), pp. 577–586. ACM (2011)
13. Du, M., Ding, S., Jia, H.: Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl. Based Syst. 99, 135–145 (2016)
14. Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
15. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Knowledge Discovery and Data Mining (KDD), pp. 226–231 (1996)
16. Fränti, P., Virmajoki, O.: Iterative shrinking method for clustering problems. Pattern Recognit. 39(5), 761–775 (2006)
17. Fu, L., Medico, E.: FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinform. 8, 3 (2007)
18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS, Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
19. Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise. In: Knowledge Discovery and Data Mining (KDD), pp. 58–65 (1998)
20. Jiang, H., Jang, J., Kpotufe, S.: Quickshift++: Provably good initializations for sample-based mean shift. In: International Conference on Machine Learning (ICML), vol. 80, pp. 2299–2308. PMLR (2018)
21. Jiang, H., Kpotufe, S.: Modal-set estimation with an application to clustering. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 54, pp. 1197–1206. PMLR (2017)
22. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
23. Liu, R., Wang, H., Yu, X.: Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf. Sci. 450, 200–226 (2018)
24. Myhre, J.N., Mikalsen, K.Ø., Løkse, S., Jenssen, R.: Robust clustering using a kNN mode seeking ensemble. Pattern Recognit. 76, 491–505 (2018)
25. Rasool, Z., Zhou, R., Chen, L., Liu, C., Xu, J.: Index-based solutions for efficient density peak clustering (extended abstract). In: 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, 19–22 Apr 2021, pp. 2342–2343. IEEE (2021)
26. Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
27. Sarfraz, M.S., Sharma, V., Stiefelhagen, R.: Efficient parameter-free clustering using first neighbor relations. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943 (2019)
28. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
29. Vedaldi, A., Soatto, S.: Quick shift and kernel methods for mode seeking. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 705–718. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88693-8_52
30. Veenman, C.J., Reinders, M.J.T., Backer, E.: A maximum variance cluster algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 24(9), 1273–1280 (2002)
31. Wang, W., Carreira-Perpiñán, M.Á.: The Laplacian k-modes algorithm for clustering. arXiv preprint arXiv:1406.3895 (2014)
32. Xie, J., Gao, H., Xie, W., Liu, X., Grant, P.W.: Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf. Sci. 354, 19–40 (2016)
33. Yang, Y., et al.: GraphLSHC: towards large scale spectral hypergraph clustering. Inf. Sci. 544, 117–134 (2021)
34. Yang, Y., Gong, Z., Li, Q., U, L.H., Cai, R., Hao, Z.: A robust noise resistant algorithm for POI identification from flickr data. In: Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pp. 3294–3300. ijcai.org (2017)
35. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very large databases. In: SIGMOD, pp. 103–114. ACM Press, New York (1996)
36. Zheng, X., Ren, C., Yang, Y., Gong, Z., Chen, X., Hao, Z.: QuickDSC: clustering by quick density subgraph estimation. Inf. Sci. 581, 403–427 (2021)

Acknowledgment

We thank the anonymous reviewers for their constructive comments and thoughtful suggestions. This work was supported in part by the National Key R&D Program of China (019YFB1600704, 2021ZD0111501), NSFC (61603101, 61876043, 61976052), NSF of Guangdong Province (2021A1515011941), State’s Key Project of Research and Development Plan (2019YFE0196400), NSF for Excellent Young Scholars (62122022), Guangzhou STIC (EF005/FST-GZG/2019/GSTIC), NSFC-Guangdong Joint Fund (U1501254), the Science and Technology Development Fund, Macau SAR (0068/2020/AGJ, 0045/2019/A1, SKL-IOTSC(UM)-2021-2023), and GDST (2020B1212030003).

Author information

Authors and Affiliations

Faculty of Computer, Guangdong University of Technology, Guangzhou, China
Geping Yang, Hongzhang Lv & Yiyang Yang
State Key Laboratory of Internet of Things for Smart City and Department of Computer and Information Science, University of Macau, Macau, China
Zhiguo Gong
School of Electronics and Information Technology, Sun Yat-Sen University, Guangzhou, China
Xiang Chen
College of Engineering, Shantou University, Shantou, China
Zhifeng Hao

Corresponding authors

Correspondence to Yiyang Yang or Zhiguo Gong.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d’Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 149 KB)

About this paper

Cite this paper

Yang, G., Lv, H., Yang, Y., Gong, Z., Chen, X., Hao, Z. (2023). FastDEC: Clustering by Fast Dominance Estimation. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13713. Springer, Cham. https://doi.org/10.1007/978-3-031-26387-3_9

DOI: https://doi.org/10.1007/978-3-031-26387-3_9
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26386-6
Online ISBN: 978-3-031-26387-3
eBook Packages: Computer Science, Computer Science (R0)
