Abstract
The classic k-means clustering algorithm selects its initial centroids at random, which can produce unstable clustering results; random initialization also makes the results hard to reproduce. The spectral clustering algorithm follows a two-step strategy: it first constructs a similarity matrix and then performs eigenvalue decomposition on the Laplacian matrix of that similarity matrix to obtain a spectral representation. However, because the first step is optimized separately from the clustering itself, it does not guarantee the best clustering result. To address these issues, this paper proposes an Initialization-Similarity (IS) algorithm, which learns the similarity matrix and the new representation in a unified way and fixes the initialization using sum-of-norms regularization to make the clustering more robust. Experimental results on ten real-world benchmark datasets demonstrate that our IS clustering algorithm outperforms the comparison clustering algorithms in terms of three clustering evaluation metrics: accuracy (ACC), normalized mutual information (NMI), and Purity.
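The abstract does not state the IS objective itself, so the following is only a minimal sketch of two ingredients it names, under stated assumptions. It is a generic sum-of-norms clustering routine, not the IS algorithm: it does not learn the similarity matrix jointly with the representation. The Gaussian pair weights `w_ij`, the parameters `lam`, `gamma`, `eps`, and `tol`, the smoothing of the norm by `eps`, and the use of SciPy's L-BFGS-B solver are all illustrative assumptions. Each centroid starts at its own data point, which illustrates how this kind of regularizer removes the dependence on random initialization. A small `purity` helper, following the standard definition of the Purity metric, is included for evaluation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def sum_of_norms_clustering(X, lam=1.0, gamma=1.0, eps=1e-4, tol=1e-2):
    """Sketch of sum-of-norms clustering with a smoothed penalty (assumed setup).

    Minimizes 0.5 * ||X - U||_F^2 + lam * sum_{i<j} w_ij * sqrt(||u_i - u_j||^2 + eps),
    where w_ij = exp(-gamma * ||x_i - x_j||^2) is an assumed Gaussian weighting.
    Each centroid u_i starts at its own data point x_i, so no random
    initialization is involved and repeated runs give the same result.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)  # no self-pairs in the penalty

    def objective_and_grad(u_flat):
        U = u_flat.reshape(n, d)
        diff = U[:, None, :] - U[None, :, :]  # diff[i, j] = u_i - u_j
        norms = np.sqrt(np.sum(diff ** 2, axis=-1) + eps)
        # Summing over all ordered pairs counts each pair twice, hence the 0.5.
        f = 0.5 * np.sum((U - X) ** 2) + lam * 0.5 * np.sum(W * norms)
        grad = (U - X) + lam * np.sum((W / norms)[:, :, None] * diff, axis=1)
        return f, grad.ravel()

    # Deterministic start: the centroids are initialized at the data points.
    result = minimize(objective_and_grad, X.ravel(), jac=True, method="L-BFGS-B")
    U = result.x.reshape(n, d)

    # Points whose centroids (nearly) coincide are grouped into one cluster.
    centroid_dists = np.sqrt(np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1))
    adjacency = csr_matrix((centroid_dists < tol).astype(float))
    n_clusters, labels = connected_components(adjacency, directed=False)
    return n_clusters, labels, U


def purity(pred, true):
    """Fraction of points whose true label is the majority label of their cluster."""
    pred = np.asarray(pred)
    true = np.asarray(true)
    total = 0
    for k in np.unique(pred):
        _, counts = np.unique(true[pred == k], return_counts=True)
        total += counts.max()
    return total / len(true)


# Usage on a toy dataset with two well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
n_clusters, labels, U = sum_of_norms_clustering(X)
print(n_clusters, labels, purity(labels, [0, 0, 0, 1, 1, 1]))
```

Because the smoothed objective is strictly convex and the starting point is the data itself, rerunning the function on the same input returns the same centroids. Increasing `lam` fuses more centroids and so typically yields fewer clusters.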
Funding
This work was partially supported by the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-M-01); the Natural Science Foundation of China (Grant Nos. 61876046 and 61573270); the Guangxi High Institutions Program of Introducing 100 High-Level Overseas Talents; the Strategic Research Excellence Fund at Massey University; and the Marsden Fund of New Zealand (Grant No. MAU1721).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, T., Zhu, J., Zhou, J. et al. Initialization-similarity clustering algorithm. Multimed Tools Appl 78, 33279–33296 (2019). https://doi.org/10.1007/s11042-019-7663-8
DOI: https://doi.org/10.1007/s11042-019-7663-8