Abstract
Unlike general spherical datasets, manifold datasets have a more complex spatial structure, which makes it difficult to separate sample points lying on different manifolds by Euclidean distance alone. The density peak clustering (DPC) algorithm, which takes two parameters (the cut-off ratio \(\mathrm{dc}\) and the number of class centers \(C\)), can search for density peaks and assign sample points quickly, but because its sample similarity measure relies only on Euclidean distance, it cannot effectively identify clusters with complex manifold structures. To solve these problems, this paper proposes Manifold Clustering optimized by Adaptive Aggregation Strategy (MC-AAS), which also takes two parameters: the number of nearest neighbors \(k\) and the threshold ratio of core points \(p\). First, it introduces a novel manifold similarity measure based on shared nearest neighbors and redefines the local density of a sample point as the sum of its manifold similarities. Second, the core points are determined from the statistical characteristics of local density, and the local sub-clusters of manifold-structured datasets are obtained by connecting the core points through their nearest neighbors. Third, the initial clusters are merged on the basis of a statistical test of boundary density and the silhouette coefficient of adjacent sub-clusters, which completes the identification of manifold-structured datasets. Finally, we conduct extensive experiments on synthetic and real-world datasets using three evaluation metrics: Adjusted Mutual Information, Adjusted Rand Index, and Fowlkes-Mallows Index. The experimental results indicate that, compared with current methods, MC-AAS achieves better clustering results on complex manifold datasets and is more robust.
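The first two steps of the abstract (shared-nearest-neighbor similarity, summed local density, core-point selection, and nearest-neighbor connection of core points) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the similarity is taken to be the fraction of shared k-nearest neighbors, the core rule is taken to be "density at least \(p\) times the mean density", and the function names (`knn_sets`, `snn_similarity`, `local_density`, `core_points`, `connect_cores`) are hypothetical. The merging step based on boundary density and the silhouette coefficient is not shown.

```python
import math


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    n = len(points)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Assumed similarity: fraction of k-nearest neighbors that i and j share."""
    return len(neighbors[i] & neighbors[j]) / len(neighbors[i])


def local_density(neighbors):
    """Density of a point: summed similarity to each of its own k nearest neighbors."""
    return [
        sum(snn_similarity(neighbors, i, j) for j in neighbors[i])
        for i in range(len(neighbors))
    ]


def core_points(density, p):
    """Assumed rule: a point is core if its density is at least p times the mean."""
    threshold = p * sum(density) / len(density)
    return {i for i, d in enumerate(density) if d >= threshold}


def connect_cores(neighbors, cores):
    """Link core points that are k-nearest neighbors of each other.

    Returns one label per point; non-core points are left unassigned (-1).
    """
    parent = {i: i for i in cores}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in cores:
        for j in neighbors[i] & cores:
            parent[find(i)] = find(j)

    roots = sorted({find(i) for i in cores})
    label_of = {root: label for label, root in enumerate(roots)}
    return [label_of[find(i)] if i in cores else -1 for i in range(len(neighbors))]


# Toy data (not from the paper): two well-separated 3x3 grids.
grid = [(x, y) for x in range(3) for y in range(3)]
points = grid + [(x + 100, y) for x, y in grid]
neighbors = knn_sets(points, k=4)
density = local_density(neighbors)
labels = connect_cores(neighbors, core_points(density, p=1.0))
print(labels)
```

On this toy input no sub-cluster spans both grids, because every point's four nearest neighbors lie in its own grid. The sketch requires at least one neighbor per point (so \(k \ge 1\) and more than one point), and its brute-force neighbor search is only suitable for small inputs.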








Funding
This work is supported by the National Key Research and Development Program of China (No. 2021YFC3300602).
Author information
Contributions
YZ: Conceptualization, Methodology, Code, Writing (original draft). XW: Supervision, Writing (review and editing), Funding acquisition. CL: Supervision, Writing (review and editing). All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethics approval
Not Applicable.
Consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Availability of data and material
The experimental datasets in this paper fall into two categories: six artificially constructed two-dimensional manifold datasets and six real datasets from the UCI Machine Learning Repository. These datasets are available from the corresponding author upon request.
Code availability
The Python code written to produce the simulation results is available from the corresponding author upon request.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Wei, X. & Li, C. Manifold clustering optimized by adaptive aggregation strategy. Knowl Inf Syst 65, 379–408 (2023). https://doi.org/10.1007/s10115-022-01769-3
DOI: https://doi.org/10.1007/s10115-022-01769-3