A Distributed Genetic Algorithm for Graph-Based Clustering

  • Conference paper
Man-Machine Interactions 2

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 103)

Abstract

Clustering is one of the most prominent data analysis techniques for structuring large datasets and producing a human-understandable overview. In this paper, we focus on the case where the data has many categorical attributes and thus cannot be represented faithfully in Euclidean space. We follow the graph-based paradigm and propose a graph-based genetic algorithm for clustering, whose flexibility stems mainly from the possibility of using various kernels. Because our approach can be parallelized naturally, we distribute the computations over several CPUs in our implementation and tests. Although the problem is NP-hard, our experiments show that the algorithm scales well on well-clusterable data. We also perform experiments on real medical data.
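
The abstract gives no algorithmic detail, so the following Python sketch is only a generic illustration of the ingredients it names, not the authors' method. It assumes a simple overlap kernel for categorical records, a dense similarity graph, a fitness that rewards placing strongly similar records in the same cluster, and an elitist genetic algorithm over label vectors. All names (`overlap_kernel`, `similarity_graph`, `fitness`, `evolve`), the `threshold` parameter, and the genetic operators are hypothetical choices made for this example.

```python
import random
from functools import partial


def overlap_kernel(a, b):
    """Similarity of two categorical records: fraction of matching attributes."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_graph(data, kernel):
    """Dense weighted graph: graph[i][j] is the kernel value of records i and j."""
    return [[kernel(a, b) for b in data] for a in data]


def fitness(labels, graph, threshold=0.5):
    """Sum of (weight - threshold) over all pairs placed in the same cluster.

    The threshold is an assumption of this sketch: pairs more similar than it
    are rewarded for sharing a cluster, less similar pairs are penalised.
    """
    n = len(labels)
    return sum(
        graph[i][j] - threshold
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )


def evolve(graph, k, pop_size=40, generations=100, mutation_rate=0.1,
           seed=0, map_fn=map):
    """Return (best_labels, best_fitness) found by a simple elitist GA."""
    rng = random.Random(seed)
    n = len(graph)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    score = partial(fitness, graph=graph)

    for _ in range(generations):
        # Fitness evaluations are independent, so map_fn may be a parallel map.
        scores = list(map_fn(score, population))
        ranked = [p for _, p in sorted(zip(scores, population),
                                       key=lambda t: t[0], reverse=True)]
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover followed by per-gene random-reset mutation.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children

    scores = list(map_fn(score, population))
    best = max(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


if __name__ == "__main__":
    records = [("red", "small", "round")] * 3 + [("blue", "large", "square")] * 3
    labels, value = evolve(similarity_graph(records, overlap_kernel), k=2)
    print(labels, value)
```

Because each individual is scored independently, the `map_fn` argument is where the work could be spread over several CPUs, for example by passing the `map` method of a `multiprocessing.Pool`. Swapping `overlap_kernel` for another similarity function changes the graph without touching the genetic search, which is one way to read the flexibility the abstract attributes to kernels.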

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Buza, K., Buza, A., Kis, P.B. (2011). A Distributed Genetic Algorithm for Graph-Based Clustering. In: Czachórski, T., Kozielski, S., Stańczyk, U. (eds) Man-Machine Interactions 2. Advances in Intelligent and Soft Computing, vol 103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23169-8_35

  • DOI: https://doi.org/10.1007/978-3-642-23169-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23168-1

  • Online ISBN: 978-3-642-23169-8

  • eBook Packages: Engineering, Engineering (R0)
