Determine Optimal Number of Clusters with an Elitist Evolutionary Approach

Boudjeloud-Assala, Lydia; Thuy, Ta Minh

doi:10.1007/978-3-319-06605-9_27


Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Abstract

This article proposes an elitist evolutionary approach for determining the optimal number of clusters in a data set. The method optimizes the number of clusters and, at the same time, finds potential cluster seeds. It can be used to initialize the k-means algorithm or directly as a clustering algorithm, without prior knowledge of the number of clusters. In this approach, the elitist population is composed of the individuals with potential cluster seeds. We introduce a new mutation strategy based on neighborhood search, together with new evaluation criteria. This strategy allows us to find the globally optimal solution, or a near-optimal one, for clustering tasks; more precisely, it finds the optimal cluster seeds. Experimental results show that our algorithm performs well on multi-class and large data sets.
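The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: an elitist evolutionary search over candidate cluster seeds in which the number of seeds is free to change. Everything specific here is an assumption made for illustration and is not the authors' method: the Calinski-Harabasz index stands in for the paper's "new evaluation criteria", the "move a seed to one of its nearest data points" operator stands in for the neighborhood-search mutation, and all function names and parameters (`evolve`, `mutate`, `ch_index`, `n_elite`, `max_k`, `m`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of points."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def ch_index(points, seeds):
    """Calinski-Harabasz index of the partition induced by the seeds.
    Assumption: used here only as a stand-in fitness criterion."""
    k, n = len(seeds), len(points)
    if k < 2 or k >= n:
        return 0.0
    # Assign every point to its nearest seed.
    labels = [min(range(k), key=lambda j: sq_dist(p, seeds[j])) for p in points]
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    if any(not g for g in groups):
        return 0.0
    overall = mean(points)
    between = sum(len(g) * sq_dist(mean(g), overall) for g in groups)
    within = sum(sq_dist(p, mean(g)) for g in groups for p in g)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)

def mutate(ind, points, rng, max_k, m=5):
    """Return a mutated copy of an individual (a list of point indices).
    Assumes len(points) > max_k so a free point always exists."""
    child = list(ind)
    op = rng.choice(["move", "add", "drop"])
    if op == "add" and len(child) < max_k:
        free = [j for j in range(len(points)) if j not in child]
        child.append(rng.choice(free))
    elif op == "drop" and len(child) > 2:
        child.pop(rng.randrange(len(child)))
    else:
        # Neighborhood move: replace one seed by one of its m nearest points.
        i = rng.randrange(len(child))
        near = sorted(range(len(points)),
                      key=lambda j: sq_dist(points[j], points[child[i]]))
        near = [j for j in near[:m + 1] if j not in child]
        if near:
            child[i] = rng.choice(near)
    return child

def evolve(points, pop_size=12, n_elite=4, generations=30, max_k=6, seed=0):
    """Elitist loop: the best n_elite individuals survive unchanged."""
    rng = random.Random(seed)
    fitness = lambda ind: ch_index(points, [points[j] for j in ind])
    pop = [rng.sample(range(len(points)), rng.randint(2, max_k))
           for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elites = pop[:n_elite]
        children = [mutate(rng.choice(elites), points, rng, max_k)
                    for _ in range(pop_size - n_elite)]
        pop = elites + children
    best = max(pop, key=fitness)
    return [points[j] for j in best], history
```

Calling `evolve(points)` on a list of coordinate tuples returns the selected seed points and the per-generation best fitness. Because the elites are copied unchanged into the next generation, that fitness trace never decreases. The returned seeds could then be passed to a k-means implementation as its initial centres, which is one of the two uses the abstract mentions.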

Author information

Authors and Affiliations

Laboratory of Theoretical and Applied Computer Science, LITA EA 3097, University of Lorraine, Ile du Saulcy, Metz, F-57045, France
Lydia Boudjeloud-Assala & Ta Minh Thuy

Editor information

Editors and Affiliations

National Cheng Kung University, Tainan, Taiwan, R.O.C.
Vincent S. Tseng & Hung-Yu Kao
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Tu Bao Ho
Nanjing University, China
Zhi-Hua Zhou
National Chengchi University, Taipei, Taiwan, R.O.C.
Arbee L. P. Chen

About this paper

Cite this paper

Boudjeloud-Assala, L., Thuy, T.M. (2014). Determine Optimal Number of Clusters with an Elitist Evolutionary Approach. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_27

DOI: https://doi.org/10.1007/978-3-319-06605-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)
