Abstract
Graph-based clustering methods offer competitive performance on complex, nonlinear data patterns. Their outstanding characteristic is the ability to mine the internal topological structure of a dataset. However, most graph-based clustering algorithms are sensitive to parameter settings. In this paper, we propose a self-adaptive graph-based clustering method (SAGC) with noise identification, based on a directed natural neighbor graph, that automatically identifies the desired number of clusters and simultaneously obtains reliable clustering results without prior knowledge or parameter setting. The method adopts a parameter-adaptive process to deal with specific data patterns, and it can identify clusters with diverse shapes and detect noise. We use synthetic and UCI real-world datasets to validate the proposed method by comparing it with the k-means, DBSCAN, OPTICS, AP, SC, CutPC, and WC algorithms in terms of clustering Accuracy, Adjusted Rand index, Normalized Mutual Information, and Fowlkes–Mallows index. The experimental results confirm that the proposed method contributes to the progress of graph-based clustering algorithms.
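To make two of the evaluation measures named above concrete, the following is a minimal sketch of the Adjusted Rand index and the Fowlkes–Mallows index, computed from pair counts using only the Python standard library. It is not the authors' evaluation code. The function names (`pair_counts`, `adjusted_rand_index`, `fowlkes_mallows_index`) are hypothetical, and treating a noise label such as `-1` as an ordinary cluster is an assumption made here for simplicity.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_true, labels_pred):
    """Count point pairs grouped together in both, in truth, in prediction, and overall."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Pairs that share a cluster in both partitions (contingency-table cells).
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster in the ground truth, and in the prediction.
    true_pairs = sum(comb(c, 2) for c in Counter(labels_true).values())
    pred_pairs = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return both, true_pairs, pred_pairs, comb(n, 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance; 1.0 means identical partitions."""
    both, true_pairs, pred_pairs, total = pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    expected = true_pairs * pred_pairs / total
    max_index = (true_pairs + pred_pairs) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions are trivial
        return 1.0
    return (both - expected) / (max_index - expected)


def fowlkes_mallows_index(labels_true, labels_pred):
    """Geometric mean of pairwise precision and recall."""
    both, true_pairs, pred_pairs, _ = pair_counts(labels_true, labels_pred)
    if true_pairs == 0 or pred_pairs == 0:
        return 0.0
    return both / sqrt(true_pairs * pred_pairs)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))    # same partition, labels renamed
    print(fowlkes_mallows_index(truth, [0, 1, 0, 1]))  # no pair agrees
```

Both measures depend only on which points are grouped together, so they are unaffected by how cluster labels are numbered. That property is what allows algorithms with different labeling conventions to be compared against the same ground truth.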
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work is supported by the Natural Science Foundation of China (No. 41804112), the Youth Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJQN202001143) and the High Quality Development Plan of Graduate Education of Chongqing University of Technology (No. gzlcx20223216).
Funding
The research leading to these results received funding from the Natural Science Foundation of China under Grant Agreement No. 41804112, from the Youth Project of Science and Technology Research Program of Chongqing Education Commission of China under Grant Agreement No. KJQN202001143, and from the High Quality Development Plan of Graduate Education of Chongqing University of Technology under Grant Agreement No. gzlcx20223216.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Li, L., Chen, X. & Song, C. A self-adaptive graph-based clustering method with noise identification. Pattern Anal Applic 26, 907–916 (2023). https://doi.org/10.1007/s10044-023-01160-0