Determine Optimal Number of Clusters with an Elitist Evolutionary Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8444)

Abstract

This article proposes an elitist evolutionary approach to determine the optimal number of clusters for clustering data sets. The proposed method optimizes the number of clusters and, at the same time, finds the potential cluster seeds. It can be used to initialize the k-means algorithm or directly as a clustering algorithm that requires no prior knowledge of the number of clusters. In this approach, the elitist population is composed of the individuals with potential cluster seeds. We introduce a new mutation strategy based on neighborhood search, together with new evaluation criteria. This strategy allows us to find the globally optimal or a near-optimal solution for clustering tasks, specifically the optimal cluster seeds. The experimental results show that our algorithm performs well on multi-class and large data sets.
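
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea (an elitist population of candidate seed sets, mutated by a neighborhood search, with the number of seeds free to vary). It is not the authors' method. Everything specific here is an assumption made for illustration: the Calinski-Harabasz index as the evaluation criterion, seeds restricted to actual data points, the three mutation moves, and all function names and parameters (`ch_score`, `mutate`, `elitist_search`, `k_max`, `n_elite`, `n_neighbors`).

```python
import numpy as np

def ch_score(X, seed_idx):
    """Calinski-Harabasz index of the partition induced by nearest-seed assignment.
    (Assumed fitness; the paper's own evaluation criteria are not given here.)"""
    n, k = len(X), len(seed_idx)
    dists = np.linalg.norm(X[:, None, :] - X[seed_idx][None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    overall_mean = X.mean(axis=0)
    within = between = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:          # degenerate individual: empty cluster
            return -np.inf
        centroid = members.mean(axis=0)
        within += ((members - centroid) ** 2).sum()
        between += len(members) * ((centroid - overall_mean) ** 2).sum()
    if within <= 0.0:
        return -np.inf
    return (between / (k - 1)) / (within / (n - k))

def mutate(X, seed_idx, k_max, rng, n_neighbors=10):
    """Hypothetical mutation: shift one seed to a nearby point, or add/drop a seed."""
    seeds = list(seed_idx)
    move = rng.choice(["shift", "add", "drop"])
    if move == "add" and len(seeds) < k_max:
        candidates = np.setdiff1d(np.arange(len(X)), seeds)
        seeds.append(int(rng.choice(candidates)))
    elif move == "drop" and len(seeds) > 2:
        seeds.pop(int(rng.integers(len(seeds))))
    else:
        # Neighborhood search: replace a seed with one of its nearest data points.
        pos = int(rng.integers(len(seeds)))
        d = np.linalg.norm(X - X[seeds[pos]], axis=1)
        near = [i for i in np.argsort(d)[1:n_neighbors + 1] if i not in seeds]
        if near:
            seeds[pos] = int(rng.choice(near))
    return np.array(sorted(seeds))

def elitist_search(X, k_max=8, pop_size=20, n_elite=5, generations=40, seed=0):
    """Evolve sets of seed indices; the n_elite best always survive unchanged."""
    rng = np.random.default_rng(seed)
    population = [
        np.sort(rng.choice(len(X), size=int(rng.integers(2, k_max + 1)), replace=False))
        for _ in range(pop_size)
    ]
    history = []                       # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: ch_score(X, ind), reverse=True)
        elite = ranked[:n_elite]
        history.append(ch_score(X, elite[0]))
        children = [mutate(X, elite[int(rng.integers(n_elite))], k_max, rng)
                    for _ in range(pop_size - n_elite)]
        population = elite + children
    best = max(population, key=lambda ind: ch_score(X, ind))
    history.append(ch_score(X, best))
    return X[best], history

# Toy data: three well-separated Gaussian blobs in 2D.
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * data_rng.standard_normal((50, 2)) for c in centers])

seeds, history = elitist_search(X)
print("estimated number of clusters:", len(seeds))
```

The returned `seeds` array could be handed to a k-means implementation as its initial centers, or the nearest-seed assignment could be used directly as the clustering. Because the elite individuals are copied unchanged into the next generation, the best fitness in `history` never decreases.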

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Boudjeloud-Assala, L., Thuy, T.M. (2014). Determine Optimal Number of Clusters with an Elitist Evolutionary Approach. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_27

  • DOI: https://doi.org/10.1007/978-3-319-06605-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06604-2

  • Online ISBN: 978-3-319-06605-9

  • eBook Packages: Computer Science, Computer Science (R0)
