An Empirical Comparison Between K-Means, GAs and EDAs in Partitional Clustering

Roure, J.; Larrañaga, P.; Sangüesa, R.

doi:10.1007/978-1-4615-1539-5_17

Part of the book series: Genetic Algorithms and Evolutionary Computation (GENA, volume 2)

Abstract

In this chapter we empirically compare the performance of three different approaches to partitional clustering: iterative hill-climbing algorithms, genetic algorithms, and estimation of distribution algorithms. Emphasis is placed on their ability to avoid local maxima and on how easily good parameter settings can be found for them.
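
As an illustration of the hill-climbing family named in the title, the following is a minimal sketch of plain K-Means with random restarts. It is not the chapter's implementation. The sum-of-squared-errors criterion (minimized here, so the relevant optima are local minima), the initialization from k randomly chosen data points, and all function and variable names are assumptions made for this example.

```python
import random


def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def k_means(points, k, seed=0, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: a local optimum of the criterion
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels, sse(points, centroids, labels)


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Random restarts: keep the run with the lowest criterion value.
best_centroids, best_labels, best_sse = min(
    (k_means(points, 2, seed=s) for s in range(5)),
    key=lambda run: run[2],
)
```

Each run can only improve the criterion from its starting point, so the result depends on the initialization. Restarting from several seeds and keeping the best run is one simple way to reduce that dependence.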

Author information

Authors and Affiliations

Department of Computer Science and Management, Escola Universitária Politécnica de Mataró, Spain
J. Roure
Department of Computer Science and Artificial Intelligence, University of the Basque Country, Spain
P. Larrañaga
Department of Computer Languages and Systems, Technical University of Catalunya, Spain
R. Sangüesa

Editor information

Editors and Affiliations

University of the Basque Country, Spain
Pedro Larrañaga & Jose A. Lozano

About this chapter

Cite this chapter

Roure, J., Larrañaga, P., Sangüesa, R. (2002). An Empirical Comparison Between K-Means, GAs and EDAs in Partitional Clustering. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_17

DOI: https://doi.org/10.1007/978-1-4615-1539-5_17
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5604-2
Online ISBN: 978-1-4615-1539-5
eBook Packages: Springer Book Archive
