
Initialization-similarity clustering algorithm

Published in Multimedia Tools and Applications

Abstract

The classic k-means algorithm selects its initial centroids at random, which can yield unstable clustering results and makes those results hard to reproduce. Spectral clustering follows a two-step strategy: it first builds a similarity matrix and then performs an eigenvalue decomposition of the Laplacian of that matrix to obtain a spectral representation. However, the objective of the first step does not guarantee the best final clustering. To address these issues, this paper proposes an Initialization-Similarity (IS) algorithm that learns the similarity matrix and the new representation in a unified way, and fixes the initialization using sum-of-norms regularization to make the clustering more robust. Experimental results on ten real-world benchmark datasets demonstrate that the IS algorithm outperforms the comparison clustering algorithms on three evaluation metrics: accuracy (ACC), normalized mutual information (NMI), and purity.
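The two-step spectral pipeline that the abstract contrasts with can be sketched as follows. This is a minimal NumPy illustration of standard spectral clustering, not the paper's IS algorithm; the Gaussian-kernel bandwidth and the deterministic farthest-point initialization are illustrative choices (plain k-means would draw its seeds at random, which is exactly the instability the abstract describes).

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    # Step 1: build a Gaussian similarity matrix. Note that this step is
    # fixed in advance, independent of the final clustering objective --
    # the limitation the abstract points out.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    # Step 2: eigen-decompose the symmetric normalized Laplacian and keep
    # the eigenvectors of the k smallest eigenvalues as the representation.
    deg = S.sum(axis=1)
    L = np.eye(len(X)) - S / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(L)  # eigh returns ascending eigenvalues
    return vecs[:, :k]

def lloyd_kmeans(Z, k, iters=50):
    # Deterministic farthest-point initialization keeps this sketch
    # reproducible; random seeding is the usual (unstable) choice.
    seeds = [0, int(np.argmax(((Z - Z[0]) ** 2).sum(-1)))] if k == 2 else list(range(k))
    C = Z[seeds]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        C = np.vstack([Z[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Two well-separated 2-D blobs: the pipeline should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels = lloyd_kmeans(spectral_embedding(X, k=2), k=2)
```

For context, the sum-of-norms (convex-clustering) regularizer the abstract invokes for fixing the initialization has the general form min_U (1/2) Σ_i ||x_i − u_i||² + λ Σ_{i<j} ||u_i − u_j||₂, under which points whose learned representatives u_i coincide are assigned to the same cluster; the IS algorithm's specific unified formulation is given in the full article.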


Figures 1–4 appear in the full article.


Funding

This work was partially supported by the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security (MIMS18-M-01), the Natural Science Foundation of China (Grants No: 61876046 and 61573270); the Guangxi High Institutions Program of Introducing 100 High-Level Overseas Talents; the Strategic Research Excellence Fund at Massey University, and the Marsden Fund of New Zealand (Grant No: MAU1721).

Author information

Corresponding author

Correspondence to Xiaofeng Zhu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, T., Zhu, J., Zhou, J. et al. Initialization-similarity clustering algorithm. Multimed Tools Appl 78, 33279–33296 (2019). https://doi.org/10.1007/s11042-019-7663-8

