Seed selection algorithm through K-means on optimal number of clusters

Chowdhury, Kuntal; Chaudhuri, Debasis; Pal, Arup Kumar; Samal, Ashok

doi:10.1007/s11042-018-7100-4


Published: 30 January 2019

Volume 78, pages 18617–18651 (2019)

Multimedia Tools and Applications

Kuntal Chowdhury¹, Debasis Chaudhuri², Arup Kumar Pal¹ & Ashok Samal³


Abstract

Clustering is one of the important unsupervised learning techniques in data mining for grouping similar features. The growing point of a cluster is known as a seed. Selecting an appropriate seed for each cluster is an important criterion of any seed-based clustering technique. The performance of seed-based algorithms depends on the selection of the initial cluster centers and on the optimal number of clusters in an unknown data set. Cluster quality and the optimal number of clusters are important issues in cluster analysis. In this paper, the proposed seed point selection algorithm has been applied to three-band image data and 2D discrete data. The algorithm selects seed points by maximizing the joint probability of pixel intensities subject to a distance restriction criterion. The optimal number of clusters is decided by combining seven different cluster validity indices. We have also compared the results of our proposed seed selection algorithm, run through K-Means clustering on the optimal number of clusters, with other classical seed selection algorithms applied through K-Means clustering, in terms of seed generation time (SGT), cluster building time (CBT), segmentation entropy, and the number of iterations (NOT_K-means). We have also analyzed the CPU time and the number of iterations of our proposed seed selection method in comparison with other clustering algorithms.
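The abstract describes the method only at a high level, so the following is a minimal Python sketch of one plausible reading, not the authors' algorithm. It assumes that the "joint probability of pixel intensities" is estimated by the relative frequency of each (band 1, band 2, band 3) intensity vector, and that the "distance restriction" means a new seed must lie at least a hypothetical Euclidean distance `min_dist` from every seed already chosen. The seeds then initialize standard K-means, which reports the number of iterations until assignments stop changing (in the spirit of NOT_K-means). The paper combines seven validity indices to fix the number of clusters; here a single index, Davies-Bouldin, stands in for that combination. All function names (`select_seeds`, `kmeans`, `davies_bouldin`, `choose_k`) are hypothetical.

```python
import math
from collections import Counter


def select_seeds(pixels, k, min_dist):
    """Hypothetical seed picker (assumption, not the paper's exact rule):
    most frequent intensity vectors first, skipping any candidate closer
    than min_dist to an already chosen seed. May return fewer than k seeds."""
    freq = Counter(pixels)  # unnormalized empirical joint probability
    # Descending frequency; ties broken by the vector itself for determinism.
    candidates = sorted(freq, key=lambda v: (-freq[v], v))
    seeds = []
    for v in candidates:
        if all(math.dist(v, s) >= min_dist for s in seeds):
            seeds.append(v)
            if len(seeds) == k:
                break
    return seeds


def kmeans(points, seeds, max_iter=100):
    """Standard Lloyd K-means started from the given seeds.
    Returns (centers, labels, iterations)."""
    centers = [tuple(float(x) for x in s) for s in seeds]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            return centers, labels, iteration
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels, max_iter


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index (lower is better); needs at least 2 clusters."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        spread = sum(math.dist(p, centers[j]) for p in members)
        scatter.append(spread / len(members) if members else 0.0)
    worst = []
    for i in range(k):
        ratios = []
        for j in range(k):
            if j != i:
                d = math.dist(centers[i], centers[j])
                ratios.append((scatter[i] + scatter[j]) / d if d > 0 else math.inf)
        worst.append(max(ratios))
    return sum(worst) / k


def choose_k(pixels, k_max, min_dist):
    """Assumption: a single index replaces the paper's seven-index combination.
    Tries k = 2..k_max and keeps the k with the lowest Davies-Bouldin score."""
    best = None
    for k in range(2, k_max + 1):
        seeds = select_seeds(pixels, k, min_dist)
        if len(seeds) < k:  # the distance restriction admits no more seeds
            break
        centers, labels, _ = kmeans(pixels, seeds)
        score = davies_bouldin(pixels, centers, labels)
        if best is None or score < best[1]:
            best = (k, score)
    return best[0] if best else None
```

Ranking candidates by raw counts instead of normalized probabilities gives the same order, so the sketch skips normalization. The distance restriction also caps the number of seeds: on data with only two well-separated intensity groups and a large `min_dist`, no third seed qualifies, and `choose_k` stops at k = 2.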


[Figs. 13–18]

Similar content being viewed by others

Seed Point Selection Algorithm in Clustering of Image Data

Initial Seed Selection for Mixed Data Using Modified K-means Clustering Algorithm (Article, 26 September 2019)

An entropy-based initialization method of K-means clustering on the optimal number of clusters (Article, 10 November 2020)


Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (Indian School of Mines) [IIT(ISM)], Dhanbad, Jharkhand, India
Kuntal Chowdhury & Arup Kumar Pal
DRDO Integration Centre, Panagarh, West Bengal, India
Debasis Chaudhuri
Department of Computer Science and Engineering, University of Nebraska, Lincoln, USA
Ashok Samal


Corresponding author

Correspondence to Kuntal Chowdhury.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Chowdhury, K., Chaudhuri, D., Pal, A.K. et al. Seed selection algorithm through K-means on optimal number of clusters. Multimed Tools Appl 78, 18617–18651 (2019). https://doi.org/10.1007/s11042-018-7100-4


Received: 08 March 2018
Revised: 06 December 2018
Accepted: 18 December 2018
Published: 30 January 2019
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11042-018-7100-4
