Abstract
Clustering is an important unsupervised learning technique in data mining that groups similar features. The growing point of a cluster is known as its seed, and selecting an appropriate seed is critical for any seed-based clustering technique. The performance of seed-based algorithms depends on the initial cluster center selection and on the optimal number of clusters in an unknown data set. Cluster quality and the optimal number of clusters are therefore important issues in cluster analysis. In this paper, the proposed seed point selection algorithm is applied to 3-band image data and 2D discrete data. The algorithm selects seed points by maximizing the joint probability of pixel intensities subject to a distance restriction criterion. The optimal number of clusters is decided on the basis of a combination of seven different cluster validity indices. Using K-Means clustering with the optimal number of clusters, we compare the proposed seed selection algorithm with other classical seed selection algorithms in terms of seed generation time (SGT), cluster building time (CBT), segmentation entropy, and the number of K-Means iterations (NOT_K-means). We also analyze the CPU time and number of iterations of the proposed seed selection method against other clustering algorithms.
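The abstract does not give the algorithm's details, so the following is only a minimal sketch of one plausible reading of "maximization of the joint probability of pixel intensities with the distance restriction criteria". It assumes the joint probability is the empirical frequency of each distinct intensity vector, and that the distance restriction is a Euclidean minimum separation between seeds. The function name `select_seeds` and the parameters `k` and `min_dist` are hypothetical and do not come from the paper.

```python
import numpy as np

def select_seeds(pixels, k, min_dist):
    """Pick up to k seeds: the most frequent intensity vectors that are
    mutually at least min_dist apart (illustrative assumption, not the
    paper's exact procedure)."""
    pixels = np.asarray(pixels)
    # Empirical joint probability of each distinct intensity vector.
    values, counts = np.unique(pixels, axis=0, return_counts=True)
    probs = counts / counts.sum()
    # Visit candidates from most to least probable; a stable sort keeps
    # ties in a deterministic order.
    order = np.argsort(-probs, kind="stable")
    seeds = []
    for idx in order:
        cand = values[idx].astype(float)
        # Distance restriction: reject candidates too close to a chosen seed.
        if all(np.linalg.norm(cand - s) >= min_dist for s in seeds):
            seeds.append(cand)
            if len(seeds) == k:
                break
    return np.array(seeds)

# Toy 3-band data: two near-duplicate dark colors, one bright, one mid-tone.
toy_pixels = ([[10, 10, 10]] * 5 + [[11, 10, 10]] * 4
              + [[200, 200, 200]] * 3 + [[100, 50, 20]] * 2)
toy_seeds = select_seeds(toy_pixels, k=3, min_dist=20.0)
```

In this toy run the second most frequent vector is skipped because it lies within `min_dist` of the first seed, which is the effect the distance restriction is meant to have. The resulting seeds could then be passed as initial centers to any K-Means implementation that accepts user-supplied centers.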























Cite this article
Chowdhury, K., Chaudhuri, D., Pal, A.K. et al. Seed selection algorithm through K-means on optimal number of clusters. Multimed Tools Appl 78, 18617–18651 (2019). https://doi.org/10.1007/s11042-018-7100-4
DOI: https://doi.org/10.1007/s11042-018-7100-4