A self-adaptive graph-based clustering method with noise identification

  • Short Paper
  • Pattern Analysis and Applications

Abstract

Graph-based clustering methods offer competitive performance on complex, nonlinear data patterns. Their distinguishing strength is the ability to mine the internal topological structure of a dataset. However, most graph-based clustering algorithms are sensitive to parameter settings. In this paper, we propose a self-adaptive graph-based clustering method (SAGC) with noise identification, built on a directed natural neighbor graph, that automatically identifies the desired number of clusters and simultaneously obtains reliable clustering results without prior knowledge or parameter setting. The method uses a parameter-adaptive process to handle specific data patterns, and it can identify clusters of diverse shapes and detect noise. We use synthetic and UCI real-world datasets to validate the proposed method by comparing it with the k-means, DBSCAN, OPTICS, AP, SC, CutPC, and WC algorithms in terms of clustering Accuracy, Adjusted Rand Index, Normalized Mutual Information, and Fowlkes–Mallows Index. The experimental results confirm that the proposed method contributes to the progress of graph-based clustering algorithms.
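Two of the evaluation measures named in the abstract, the Adjusted Rand Index and the Fowlkes–Mallows Index, are both defined by counting pairs of points that two labelings place together. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function names are illustrative, and treating a noise label such as -1 as an ordinary cluster is an assumption, since the abstract does not say how noise points are scored.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_true, labels_pred):
    """Count point pairs grouped together by each labeling and by both."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    joint = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)
    same_both = sum(comb(c, 2) for c in joint.values())
    same_true = sum(comb(c, 2) for c in true_sizes.values())
    same_pred = sum(comb(c, 2) for c in pred_sizes.values())
    return same_both, same_true, same_pred, comb(len(labels_true), 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance; 1.0 means identical partitions."""
    same_both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0  # fewer than two points: the partitions agree trivially
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows_index(labels_true, labels_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    same_both, same_true, same_pred, _ = pair_counts(labels_true, labels_pred)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


# Assumption: -1 marks noise and is scored like any other cluster label.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, -1, 1, 1, 1]
print(adjusted_rand_index(truth, found), fowlkes_mallows_index(truth, found))
```

Both measures are invariant to how the cluster labels are numbered, which is why they suit comparisons between algorithms that discover different numbers of clusters.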

Data availability

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 41804112), the Youth Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJQN202001143) and the High Quality Development Plan of Graduate Education of Chongqing University of Technology (No. gzlcx20223216).

Funding

The research leading to these results received funding from the Natural Science Foundation of China under Grant Agreement No. 41804112, from the Youth Project of Science and Technology Research Program of Chongqing Education Commission of China under Grant Agreement No. KJQN202001143, and from the High Quality Development Plan of Graduate Education of Chongqing University of Technology under Grant Agreement No. gzlcx20223216.

Author information

Corresponding author

Correspondence to Chengyun Song.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, L., Chen, X. & Song, C. A self-adaptive graph-based clustering method with noise identification. Pattern Anal Applic 26, 907–916 (2023). https://doi.org/10.1007/s10044-023-01160-0

  • DOI: https://doi.org/10.1007/s10044-023-01160-0
