FastDEC: Clustering by Fast Dominance Estimation

  • Conference paper

Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13713)

Abstract

The k-Nearest Neighbors (k-NN) graph is essential for various graph mining tasks. In this work, we study density-based clustering on the k-NN graph and propose FastDEC, a clustering framework based on fast dominance estimation. The nearest density-higher (NDH) relation and the dominance component (DC), and in particular their integration with the k-NN graph, are formally defined and theoretically analyzed. FastDEC includes two extensions for different clustering scenarios: FastDEC\(_D\) for partitioning data into clusters with arbitrary shapes, and FastDEC\(_K\) for K-way partitioning. First, FastDEC\(_D\) detects a set of DCs by segmenting the given k-NN graph. Then, the K-way partition is generated by selecting the top-K DCs in terms of inter-dominance (ID) as seeds and assigning the remaining DCs to their nearest dominators.
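
As a rough illustration of this pipeline, the sketch below organises the steps (k-NN search, NDH linking, DC extraction, top-K seeding) in plain Python. It is a sketch under stated assumptions, not the released FastDEC implementation: the k-NN search uses scikit-learn, the density estimate is a simple k-NN-radius proxy, seed selection uses dominator density as a stand-in for the inter-dominance score defined in the paper, and the function name ndh_kway_sketch is hypothetical.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def ndh_kway_sketch(X, k=10, K=3):
        """Illustrative NDH/DC-style pipeline; not the official FastDEC code."""
        X = np.asarray(X, dtype=float)
        n = len(X)
        dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        density = 1.0 / (dist[:, -1] + 1e-12)    # simple k-NN-radius density proxy

        # Nearest density-higher (NDH) neighbour within each point's k-NN list;
        # points with no denser k-NN neighbour become dominators.
        ndh = np.full(n, -1)
        for i in range(n):
            for j in idx[i, 1:]:                 # neighbours ordered by distance
                if density[j] > density[i]:
                    ndh[i] = j
                    break

        def dominator(i):                        # follow NDH links to a local mode
            while ndh[i] != -1:
                i = ndh[i]
            return i

        dc = np.array([dominator(i) for i in range(n)])   # dominance components
        doms = np.unique(dc)

        # K-way partition: keep the K densest dominators as seeds, then attach each
        # remaining DC to the seed reached via its nearest denser dominator.
        seeds = set(doms[np.argsort(-density[doms])[:K]])

        def seed_of(d):
            while d not in seeds:
                denser = [o for o in doms if density[o] > density[d]]
                if not denser:                   # degenerate tie case
                    break
                d = min(denser, key=lambda o: np.linalg.norm(X[o] - X[d]))
            return d

        return np.array([seed_of(d) for d in dc])

For a FastDEC\(_D\)-style result (clusters with arbitrary shapes, no fixed K), the intermediate dc labels can be returned directly instead of the K-way merge.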

FastDEC can be viewed as a much faster, more robust, k-NN based variant of the classical density-based clustering algorithm Density Peak Clustering (DPC). DPC estimates the significance of data points from density and geometric-distance factors, whereas FastDEC additionally uses the global rank of the dominator as a factor in the significance estimation. FastDEC naturally offers several desirable characteristics: (1) excellent clustering performance; (2) ease of interpretation and implementation; (3) efficiency and robustness. Experiments on both artificial and real-world datasets demonstrate that FastDEC outperforms state-of-the-art density-based methods, including DPC.
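
For context, DPC [26] quantifies these two factors for each point \(x_i\) as a local density \(\rho_i\) and a distance \(\delta_i\) to the nearest point of higher density; a common way to combine them into a single significance score, as in the original DPC formulation, is the product:

\[
\gamma_i = \rho_i \, \delta_i, \qquad \delta_i = \min_{j:\, \rho_j > \rho_i} d(x_i, x_j).
\]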

G. Yang and H. Lv—Equal Contribution.

Notes

  1. FastDEC is released on https://github.com/gepingyang/FastDEC.

  2. Density-Reachable (DR) in DBSCAN [15] is equivalent to a \(\tau\)-based flat kernel. For the sake of comparison, we use a k-NN based one (see the sketch after these notes).

  3. https://numpy.org/.
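
As a rough illustration of the distinction drawn in note 2, the snippet below contrasts a \(\tau\)-based flat-kernel density (a neighbour count within radius \(\tau\), in the spirit of DBSCAN's density-reachability) with a k-NN based estimate. Both estimators are illustrative stand-ins using scikit-learn's neighbour search, not the exact definitions used in the paper.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def flat_kernel_density(X, tau):
        """Count neighbours within radius tau (tau-based flat kernel)."""
        neigh = NearestNeighbors(radius=tau).fit(X)
        ind = neigh.radius_neighbors(X, return_distance=False)
        return np.array([len(i) - 1 for i in ind])       # exclude the point itself

    def knn_density(X, k):
        """k-NN based alternative: inverse of the distance to the k-th neighbour."""
        dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        return 1.0 / (dist[:, -1] + 1e-12)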

References

  1. Amagata, D., Hara, T.: Fast density-peaks clustering: multicore-based parallelization approach. In: SIGMOD 2021: International Conference on Management of Data, Virtual Event, China, 20–25 Jun 2021, pp. 49–61. ACM (2021)

  2. Angelino, C.V., Debreuve, E., Barlaud, M.: Image restoration using a kNN-variant of the mean-shift. In: 2008 15th IEEE International Conference on Image Processing (ICIP), pp. 573–576. IEEE (2008)

  3. Cai, J., Wei, H., Yang, H., Zhao, X.: A novel clustering algorithm based on DPC and PSO. IEEE Access 8, 88200–88214 (2020)

  4. Carreira-Perpiñán, M.Á., Wang, W.: The k-modes algorithm for clustering. arXiv preprint arXiv:1304.6478 (2013)

  5. Chang, H., Yeung, D.: Robust path-based spectral clustering. Pattern Recognit. 41(1), 191–203 (2008)

  6. Chaudhuri, K., Dasgupta, S.: Rates of convergence for the cluster tree. In: Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems (NIPS), pp. 343–351. Curran Associates, Inc. (2010)

  7. Chaudhuri, K., Dasgupta, S., Kpotufe, S., von Luxburg, U.: Consistent procedures for cluster tree estimation and pruning. IEEE Trans. Inf. Theory 60(12), 7900–7912 (2014)

  8. Cheng, Y.: Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17(8), 790–799 (1995)

  9. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)

  10. Dasgupta, S., Freund, Y.: Random projection trees and low dimensional manifolds. In: Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), pp. 537–546 (2008)

  11. Davidson, I., Ravi, S.S.: Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 59–70. Springer, Heidelberg (2005). https://doi.org/10.1007/11564126_11

  12. Dong, W., Charikar, M., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th International Conference on World Wide Web (WWW), pp. 577–586. ACM (2011)

  13. Du, M., Ding, S., Jia, H.: Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl. Based Syst. 99, 135–145 (2016)

  14. Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml

  15. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Knowledge Discovery and Data Mining (KDD), pp. 226–231 (1996)

  16. Fränti, P., Virmajoki, O.: Iterative shrinking method for clustering problems. Pattern Recognit. 39(5), 761–775 (2006)

  17. Fu, L., Medico, E.: FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinform. 8, 3 (2007)

  18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. SSS, Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7

  19. Hinneburg, A., Keim, D.A.: An efficient approach to clustering in large multimedia databases with noise. In: Knowledge Discovery and Data Mining (KDD), pp. 58–65 (1998)

  20. Jiang, H., Jang, J., Kpotufe, S.: Quickshift++: Provably good initializations for sample-based mean shift. In: International Conference on Machine Learning (ICML), vol. 80, pp. 2299–2308. PMLR (2018)

  21. Jiang, H., Kpotufe, S.: Modal-set estimation with an application to clustering. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 54, pp. 1197–1206. PMLR (2017)

  22. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  23. Liu, R., Wang, H., Yu, X.: Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf. Sci. 450, 200–226 (2018)

  24. Myhre, J.N., Mikalsen, K.Ø., Løkse, S., Jenssen, R.: Robust clustering using a kNN mode seeking ensemble. Pattern Recognit. 76, 491–505 (2018)

  25. Rasool, Z., Zhou, R., Chen, L., Liu, C., Xu, J.: Index-based solutions for efficient density peak clustering (extended abstract). In: 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, 19–22 Apr 2021, pp. 2342–2343. IEEE (2021)

  26. Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)

  27. Sarfraz, M.S., Sharma, V., Stiefelhagen, R.: Efficient parameter-free clustering using first neighbor relations. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8934–8943 (2019)

  28. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

  29. Vedaldi, A., Soatto, S.: Quick shift and kernel methods for mode seeking. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5305, pp. 705–718. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88693-8_52

  30. Veenman, C.J., Reinders, M.J.T., Backer, E.: A maximum variance cluster algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 24(9), 1273–1280 (2002)

  31. Wang, W., Carreira-Perpiñán, M.Á.: The laplacian k-modes algorithm for clustering. arXiv preprint arXiv:1406.3895 (2014)

  32. Xie, J., Gao, H., Xie, W., Liu, X., Grant, P.W.: Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors. Inf. Sci. 354, 19–40 (2016)

  33. Yang, Y., et al.: GraphLSHC: towards large scale spectral hypergraph clustering. Inf. Sci. 544, 117–134 (2021)

  34. Yang, Y., Gong, Z., Li, Q., U, L.H., Cai, R., Hao, Z.: A robust noise resistant algorithm for POI identification from flickr data. In: Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pp. 3294–3300. ijcai.org (2017)

  35. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: an efficient data clustering method for very large databases. In: SIGMOD, pp. 103–114. ACM Press, New York (1996)

  36. Zheng, X., Ren, C., Yang, Y., Gong, Z., Chen, X., Hao, Z.: QuickDSC: clustering by quick density subgraph estimation. Inf. Sci. 581, 403–427 (2021)

Acknowledgment

We thank the anonymous reviewers for their constructive comments and thoughtful suggestions. This work was supported in part by: the National Key R&D Program of China (2019YFB1600704, 2021ZD0111501), NSFC (61603101, 61876043, 61976052), NSF of Guangdong Province (2021A1515011941), the State's Key Project of Research and Development Plan (2019YFE0196400), NSF for Excellent Young Scholars (62122022), Guangzhou STIC (EF005/FST-GZG/2019/GSTIC), the NSFC-Guangdong Joint Fund (U1501254), the Science and Technology Development Fund, Macau SAR (0068/2020/AGJ, 0045/2019/A1, SKL-IOTSC(UM)-2021-2023), and GDST (2020B1212030003).

Author information

Corresponding authors

Correspondence to Yiyang Yang or Zhiguo Gong.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 149 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, G., Lv, H., Yang, Y., Gong, Z., Chen, X., Hao, Z. (2023). FastDEC: Clustering by Fast Dominance Estimation. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13713. Springer, Cham. https://doi.org/10.1007/978-3-031-26387-3_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-26387-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26386-6

  • Online ISBN: 978-3-031-26387-3

  • eBook Packages: Computer Science, Computer Science (R0)
