An Empirical Comparison Between K-Means, GAs and EDAs in Partitional Clustering

Chapter in: Estimation of Distribution Algorithms

Part of the book series: Genetic Algorithms and Evolutionary Computation (GENA, volume 2)

Abstract

In this chapter we empirically compare the performance of three different approaches to partitional clustering: iterative hill-climbing algorithms such as K-Means, genetic algorithms (GAs), and estimation of distribution algorithms (EDAs). Emphasis is placed on their ability to avoid local maxima and on how easily good parameter values can be chosen for them.
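As a concrete illustration of the iterative hill-climbing baseline, the short Python sketch below implements a plain Lloyd-style K-Means loop and shows that different random initialisations can settle on different partitions, which is the local-optimum behaviour the comparison is concerned with. The toy data, seeds, and parameter values are illustrative assumptions only and do not reproduce the chapter's experimental setup.

import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd-style K-Means on 2-D points (illustrative sketch only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

# Different seeds (initialisations) can converge to different partitions,
# which is the kind of local-optimum sensitivity discussed in the chapter.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.0, 0.1), (8.8, -0.2)]
for s in (0, 1, 2):
    cents, _ = kmeans(data, k=3, seed=s)
    print(s, [tuple(round(x, 2) for x in c) for c in cents])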

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Roure, J., Larrañaga, P., Sangüesa, R. (2002). An Empirical Comparison Between K-Means, GAs and EDAs in Partitional Clustering. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_17

  • DOI: https://doi.org/10.1007/978-1-4615-1539-5_17

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5604-2

  • Online ISBN: 978-1-4615-1539-5

  • eBook Packages: Springer Book Archive
