Seed selection algorithm through K-means on optimal number of clusters

Multimedia Tools and Applications


Abstract

Clustering is an important unsupervised learning technique in data mining for grouping similar features. The point from which a cluster grows is known as a seed, and selecting an appropriate seed for each cluster is an important criterion in any seed-based clustering technique. The performance of seed-based algorithms depends on the selection of the initial cluster centers and on the optimal number of clusters in an unknown data set. Cluster quality and the optimal number of clusters are important issues in cluster analysis. In this paper, the proposed seed point selection algorithm is applied to 3-band image data and 2D discrete data. The algorithm selects seed points by maximizing the joint probability of pixel intensities subject to a distance restriction criterion. The optimal number of clusters is decided on the basis of a combination of seven different cluster validity indices. We compare the results of the proposed seed selection algorithm, run through K-means clustering on the optimal number of clusters, with those of other classical seed selection algorithms applied through K-means clustering in terms of seed generation time (SGT), cluster building time (CBT), segmentation entropy and the number of iterations (NOTKmeans). We also analyze the CPU time and number of iterations of the proposed seed selection method against other clustering algorithms.
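
The abstract names the idea behind the seed selection (maximize the joint probability of pixel intensities under a distance restriction) but does not give the procedure. The following is only a minimal sketch of one possible reading, not the authors' algorithm. The function name `select_seeds`, the parameter `min_dist`, the sample pixel values, and the choice to estimate the joint probability as a product of per-band relative frequencies are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def select_seeds(points, k, min_dist):
    """Pick up to k seeds: most probable points first, each at least
    min_dist (Euclidean) away from every seed chosen before it."""
    n = len(points)
    dims = len(points[0])
    # Empirical frequency of each value in each band (dimension).
    marginals = [Counter(p[d] for p in points) for d in range(dims)]

    def joint_probability(p):
        # Assumption: bands are treated as independent, so the joint
        # probability is the product of per-band relative frequencies.
        prob = 1.0
        for d in range(dims):
            prob *= marginals[d][p[d]] / n
        return prob

    # Distinct candidates, most probable first; ties broken by value
    # so the result is deterministic.
    candidates = sorted(set(points), key=lambda p: (-joint_probability(p), p))

    seeds = []
    for c in candidates:
        # Distance restriction: skip candidates too close to a chosen seed.
        if all(dist(c, s) >= min_dist for s in seeds):
            seeds.append(c)
            if len(seeds) == k:
                break
    return seeds


# Hypothetical 3-band pixel data (tuples of band intensities).
pixels = (
    [(10, 10, 10)] * 5
    + [(11, 10, 10)] * 3
    + [(200, 200, 200)] * 4
    + [(90, 95, 100)] * 2
)
seeds = select_seeds(pixels, k=3, min_dist=20.0)
print(seeds)
```

In this toy data the pixel `(11, 10, 10)` is the second most probable value, but it lies within `min_dist` of the first seed and is skipped, which is the role the distance restriction plays. The resulting seeds could then be passed as initial centers to any K-means implementation.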



Author information

Corresponding author

Correspondence to Kuntal Chowdhury.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chowdhury, K., Chaudhuri, D., Pal, A.K. et al. Seed selection algorithm through K-means on optimal number of clusters. Multimed Tools Appl 78, 18617–18651 (2019). https://doi.org/10.1007/s11042-018-7100-4

  • DOI: https://doi.org/10.1007/s11042-018-7100-4

Keywords