Abstract
In this chapter we empirically compare the performance of three approaches to partitional clustering: iterative hill-climbing algorithms, genetic algorithms, and estimation of distribution algorithms. Emphasis is placed on their ability to avoid local maxima and on how easy it is to set good parameters for them.
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Roure, J., Larrañaga, P., Sangüesa, R. (2002). An Empirical Comparison Between K-Means, GAs and EDAs in Partitional Clustering. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_17
DOI: https://doi.org/10.1007/978-1-4615-1539-5_17
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5604-2
Online ISBN: 978-1-4615-1539-5
eBook Packages: Springer Book Archive