Abstract
Clustering analysis has been widely used in image segmentation, face recognition, protein identification, intrusion detection, document clustering and other applications. Most previous clustering algorithms are not suited to complex data with manifold structure and large variations in density. Clustering by density core (DCore) has proved to be a very effective method for data with complex structure. However, DCore requires many parameters to be tuned to obtain good results, and it often fails when the shape of the data is complex and the density varies greatly. Inspired by universal gravitation, we propose a novel clustering algorithm based on density core and local resultant force, called DCLRF. In this algorithm, each data point is viewed as an object subject to a local resultant force (LRF) generated by its neighbors, and a local measure named centrality (CE) is proposed based on LRF and natural neighbors. First, we extract core points using their CE values. Then, we use the natural neighbor structure of the core points to obtain the final clustering result. With the influence of noise excluded, the core points represent the structure of the clusters well. Therefore, DCLRF can obtain the optimal number of clusters for datasets containing clusters of arbitrary shape. Both synthetic and real datasets are used in experiments to verify the efficiency and accuracy of DCLRF.
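The abstract relies on natural neighbors, which remove the need to choose a fixed neighborhood size k. As a rough illustration only, the sketch below grows a search round r and, at each round, links every point to its r-th nearest neighbor; two points become natural neighbors once each appears in the other's list. The stopping rule (stop when every point has been chosen as someone's neighbor, or when the number of never-chosen points stops changing) and the function name `natural_neighbors` are assumptions for this sketch, not code from the DCLRF authors.

```python
import numpy as np


def natural_neighbors(X):
    """Simplified natural-neighbour search (hypothetical sketch, not DCLRF's code).

    Returns (lam, nan_sets): lam is the final search round r, and nan_sets[i]
    is the set of points j such that i and j appear in each other's
    r-nearest-neighbour lists. Stopping rule is an assumption (see lead-in).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1, kind="stable")[:, : n - 1]

    reverse_count = np.zeros(n, dtype=int)   # times each point was chosen as a neighbour
    knn = [set() for _ in range(n)]          # neighbours found so far, per point
    nan_sets = [set() for _ in range(n)]     # mutual (natural) neighbours, per point
    prev_zero, r = -1, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            j = int(order[i, r - 1])         # r-th nearest neighbour of i
            reverse_count[j] += 1
            knn[i].add(j)
            if i in knn[j]:                  # i and j now list each other
                nan_sets[i].add(j)
                nan_sets[j].add(i)
        zero = int(np.sum(reverse_count == 0))
        if zero == 0 or zero == prev_zero:   # assumed stopping rule
            break
        prev_zero = zero
    return r, nan_sets


# Usage: two groups on a line; the search stops after two rounds.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
lam, nan = natural_neighbors(X)
print(lam, nan)  # 2 [{1, 2}, {0, 2}, {0, 1}, {4}, {3}]
```

The neighborhood size adapts to the data: the three-point group ends up fully connected, while the two-point group only links its pair, without any user-supplied k.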
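The abstract does not state the formulas for LRF or CE, so the following is only a toy pipeline under assumed definitions loosely in the spirit of gravitation-based clustering. The LRF of a point is taken as the vector sum of displacements to its k nearest neighbors. The centrality CE is taken as the mean cosine between each neighbor's force and the direction from that neighbor back to the point, so interior points, which their neighbors "pull toward", score high. Core points are those with CE above a threshold. Core points that are mutual k-nearest neighbors are linked into clusters, and the remaining points join their nearest core point. The fixed `k`, the `ce_threshold` parameter, mutual-kNN linking (standing in for the natural-neighbor structure), and the name `lrf_core_clustering` are all hypothetical choices, not the published DCLRF method.

```python
import numpy as np


def lrf_core_clustering(X, k=4, ce_threshold=0.0):
    """Toy LRF/centrality core-point clustering (hypothetical, not DCLRF itself).

    Assumed definitions:
      LRF  F_i = sum over the k nearest neighbours j of (x_j - x_i)
      CE_i     = mean over those j of cos(F_j, x_i - x_j)
    Returns (labels, ce); labels are -1 only if no core point exists.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1, kind="stable")[:, :k]    # (n, k) neighbour indices

    # Local resultant force: vector sum of displacements towards the neighbours.
    F = (X[knn] - X[:, None, :]).sum(axis=1)             # (n, dim)

    # Centrality: do the neighbours' forces point back towards x_i?
    to_i = X[:, None, :] - X[knn]                        # (n, k, dim): x_i - x_j
    Fj = F[knn]                                          # (n, k, dim)
    num = (Fj * to_i).sum(axis=2)
    den = np.linalg.norm(Fj, axis=2) * np.linalg.norm(to_i, axis=2)
    cos = np.divide(num, den, out=np.zeros_like(num), where=den > 0)  # 0 if a force is zero
    ce = cos.mean(axis=1)

    labels = np.full(n, -1, dtype=int)
    is_core = ce > ce_threshold
    core = np.flatnonzero(is_core)
    if core.size == 0:
        return labels, ce

    # Flood-fill clusters over mutual-kNN links between core points.
    knn_sets = [set(row.tolist()) for row in knn]
    cluster = 0
    for c in core:
        if labels[c] != -1:
            continue
        labels[c] = cluster
        stack = [c]
        while stack:
            i = stack.pop()
            for j in knn_sets[i]:
                if is_core[j] and labels[j] == -1 and i in knn_sets[j]:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1

    # Non-core points take the label of their nearest core point.
    rest = np.flatnonzero(~is_core)
    if rest.size:
        nearest = core[np.argmin(d[np.ix_(rest, core)], axis=1)]
        labels[rest] = labels[nearest]
    return labels, ce


# Usage: two well-separated plus-shaped groups of five points each.
blob = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
X = np.vstack([blob, blob + 100.0])
labels, ce = lrf_core_clustering(X, k=4, ce_threshold=0.8)
print(labels.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this run only the two centers reach CE = 1 (every neighbor's force points straight at them), so they are the only core points; the arm points score about 0.60 and are attached to their nearest center.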
Acknowledgements
The authors would like to thank the Associate Editor and anonymous reviewers for their valuable comments and suggestions. This work is partly funded by the National Natural Science Foundation of China (No. 51608070).
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by A. Di Nola.
About this article
Cite this article
Wang, XX., Zhang, YF., Xie, J. et al. A density-core-based clustering algorithm with local resultant force. Soft Comput 24, 6571–6590 (2020). https://doi.org/10.1007/s00500-020-04777-z
DOI: https://doi.org/10.1007/s00500-020-04777-z