Abstract
This article proposes an elitist evolutionary approach for determining the optimal number of clusters in a data set. The method optimizes the number of clusters and, at the same time, finds the potential cluster seeds. It can be used to initialize the k-means algorithm or directly as a clustering algorithm, without prior knowledge of the number of clusters. In this approach, the elitist population is composed of the individuals with potential cluster seeds. We introduce a new mutation strategy based on neighborhood search, together with new evaluation criteria. This strategy allows us to find a globally optimal or near-optimal solution to the clustering task, that is, the optimal cluster seeds. The experimental results show that our algorithm performs well on multi-class and large data sets.
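To make the general idea concrete, the following is a minimal sketch of an elitist evolutionary search over sets of cluster seeds. It is not the authors' algorithm: the abstract does not specify the mutation strategy or the evaluation criteria, so the Calinski-Harabasz index is used here as a stand-in fitness, and the three mutation operators (move a seed to a nearby data point, add a seed, drop a seed) as well as all function names and parameter values are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, seeds):
    """Assign each point to its nearest seed; return the non-empty clusters."""
    clusters = [[] for _ in seeds]
    for p in points:
        j = min(range(len(seeds)), key=lambda i: sq_dist(p, seeds[i]))
        clusters[j].append(p)
    return [c for c in clusters if c]

def mean(pts):
    """Component-wise mean of a list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

def ch_index(points, seeds):
    """Calinski-Harabasz index of the partition induced by the seeds.
    Assumed stand-in for the paper's evaluation criteria; higher is better."""
    clusters = assign(points, seeds)
    k, n = len(clusters), len(points)
    if k < 2 or k >= n:
        return float("-inf")
    g = mean(points)
    cents = [mean(c) for c in clusters]
    between = sum(len(c) * sq_dist(m, g) for c, m in zip(clusters, cents))
    within = sum(sq_dist(p, m) for c, m in zip(clusters, cents) for p in c)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def mutate(points, seeds, k_max, rng):
    """Hypothetical mutation: move one seed to a neighboring data point,
    add a seed, or drop a seed (so the number of clusters can change)."""
    child = list(seeds)
    op = rng.choice(["move", "add", "drop"])
    if op == "move":
        i = rng.randrange(len(child))
        # Neighborhood move: choose among the 5 data points nearest the seed.
        near = sorted(points, key=lambda p: sq_dist(p, child[i]))[:5]
        child[i] = rng.choice(near)
    elif op == "add" and len(child) < k_max:
        child.append(rng.choice(points))
    elif op == "drop" and len(child) > 2:
        child.pop(rng.randrange(len(child)))
    return child

def evolve(points, k_max=6, pop_size=20, elite=5, generations=60, seed=0):
    """Elitist loop: the best individuals survive unchanged each generation,
    and the rest of the population is rebuilt from mutated elites."""
    rng = random.Random(seed)
    pop = [rng.sample(points, rng.randint(2, k_max)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: ch_index(points, s), reverse=True)
        elites = pop[:elite]
        children = [mutate(points, rng.choice(elites), k_max, rng)
                    for _ in range(pop_size - elite)]
        pop = elites + children
    return max(pop, key=lambda s: ch_index(points, s))
```

Calling `evolve(points)` on a list of coordinate tuples returns the best seed set found; `len(assign(points, best))` then gives the estimated number of clusters, and the seeds themselves could be passed to k-means as initial centers. Because each individual carries its own number of seeds, the search covers the cluster count and the seed positions together, which is the property the abstract emphasizes.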
© 2014 Springer International Publishing Switzerland
About this paper
Boudjeloud-Assala, L., Thuy, T.M. (2014). Determine Optimal Number of Clusters with an Elitist Evolutionary Approach. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_27
DOI: https://doi.org/10.1007/978-3-319-06605-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)