A Distributed Genetic Algorithm for Graph-Based Clustering

Buza, Krisztian; Buza, Antal; Kis, Piroska B.

doi:10.1007/978-3-642-23169-8_35

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 103)

Abstract

Clustering is one of the most prominent data analysis techniques for structuring large datasets and producing a human-understandable overview. In this paper, we focus on the case where the data has many categorical attributes and thus cannot be represented faithfully in Euclidean space. We follow the graph-based paradigm and propose a graph-based genetic algorithm for clustering, whose flexibility stems mainly from the possibility of using various kernels. Because our approach parallelizes naturally, we distribute the computations over several CPUs when implementing and testing it. Although the problem is NP-hard, our experiments show that our algorithm scales well on well-clusterable data. We also perform experiments on real medical data.
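
The general idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the chapter's encoding, operators, kernel, or objective, so everything below is an assumption made for illustration only: the names `overlap_kernel`, `similarity_matrix`, `fitness`, and `evolve` are hypothetical, the kernel is a simple attribute-overlap score, the fitness is a thresholded within-cluster versus between-cluster similarity, and the parallel step is modelled by a pluggable `map_fn` that evaluates the population.

```python
import random
from functools import partial


def overlap_kernel(a, b):
    """Assumed kernel: fraction of categorical attributes on which a and b agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(records, kernel=overlap_kernel):
    """Weighted adjacency matrix of the similarity graph over all records."""
    n = len(records)
    return [[kernel(records[i], records[j]) for j in range(n)] for i in range(n)]


def fitness(labels, sim, threshold=0.5):
    """Assumed objective: reward similar pairs placed together and
    dissimilar pairs placed apart, relative to a similarity threshold."""
    score = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            gain = sim[i][j] - threshold
            score += gain if labels[i] == labels[j] else -gain
    return score


def evolve(sim, k=2, pop_size=30, generations=40, mutation_rate=0.1,
           seed=0, map_fn=map):
    """Evolve cluster-label vectors; map_fn evaluates the population and
    can be replaced by a parallel map to spread the work over CPUs."""
    rng = random.Random(seed)
    n = len(sim)
    score = partial(fitness, sim=sim)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = list(map_fn(score, population))  # the parallelizable step
        ranked = sorted(zip(scores, population), key=lambda p: p[0], reverse=True)
        parents = [ind for _, ind in ranked[: pop_size // 2]]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]  # random label mutation
            children.append(child)
        population = parents + children
    scores = list(map_fn(score, population))
    best_score, best = max(zip(scores, population), key=lambda p: p[0])
    return best, best_score


if __name__ == "__main__":
    records = [("a", "x", "p")] * 3 + [("b", "y", "q")] * 3
    labels, value = evolve(similarity_matrix(records))
    print(labels, value)
```

Because each individual's fitness depends only on that individual and the shared similarity matrix, the evaluation step is independent per individual. In this sketch, passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` is one way to run it on several CPUs. Swapping `overlap_kernel` for another similarity function changes the graph without touching the evolutionary loop.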

Author information

Authors and Affiliations

Information Systems and Machine Learning Lab, University of Hildesheim, Marienburger Platz 22, D-31141, Hildesheim, Germany
Krisztian Buza
Department of Information Theory and Computer Science, Budapest Univ. of Technology and Economics, H-1117, Budapest, Magyar tudósok körútja 2., Hungary
Krisztian Buza
College of Dunaujvaros, Tancsics Mihály u. 1/a, H-2400, Dunaujvaros, Hungary
Antal Buza & Piroska B. Kis

Editor information

Editors and Affiliations

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100, Gliwice, Poland
Tadeusz Czachórski
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Stanisław Kozielski & Urszula Stańczyk

Cite this paper

Buza, K., Buza, A., Kis, P.B. (2011). A Distributed Genetic Algorithm for Graph-Based Clustering. In: Czachórski, T., Kozielski, S., Stańczyk, U. (eds) Man-Machine Interactions 2. Advances in Intelligent and Soft Computing, vol 103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23169-8_35

DOI: https://doi.org/10.1007/978-3-642-23169-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23168-1
Online ISBN: 978-3-642-23169-8
eBook Packages: Engineering, Engineering (R0)