Use of symmetry and stability for data clustering

  • Research Paper
Evolutionary Intelligence

Abstract

An important consideration in clustering is the choice of an algorithm appropriate for partitioning a given data set. Once an algorithm is chosen, the correct model order (the number of clusters) must be identified and the corresponding partitioning determined. The first part of this paper shows that the recently developed symmetry-based cluster validity index named Sym-index, which measures the “symmetricity” of different partitionings of a data set, addresses all three issues: identifying the appropriate clustering algorithm, determining the proper model order, and evolving the proper partitioning, provided the clusters possess the property of symmetry. Results are provided that demonstrate the superiority of this validity measure in determining the proper clustering technique and the appropriate model order, as compared to five other recently proposed measures, namely the PS-index, the I-index, the CS-index, the well-known XB-index, and a stability-based index. The comparison covers several clustering methods: two recently developed genetic-algorithm-based clustering techniques, the average linkage clustering algorithm, the self-organizing map, and the expectation maximization clustering algorithm. Five artificial data sets and three real-life data sets are considered for this purpose. In the second part of the paper, a new measure of the stability of clustering solutions over different bootstrap samples of a data set is proposed. A multiobjective-optimization-based clustering technique is then developed that optimizes Sym-index and the stability measure simultaneously, so as to determine automatically the appropriate number of clusters and the appropriate partitioning of data sets having symmetrically shaped clusters. Results on five artificial and five real-life data sets show that the proposed technique is well suited to detecting the number of clusters in data sets having point-symmetric clusters.
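To illustrate what optimizing two objectives simultaneously means in practice, the following minimal sketch filters a set of candidate partitionings down to those that are non-dominated with respect to a symmetry score and a stability score. It is not the technique developed in the paper: the abstract does not define either measure or the optimizer, so the sketch assumes both scores are to be maximized, and the function names `dominates` and `pareto_front` and all numbers are hypothetical.

```python
from typing import List, Tuple

# (symmetry score, stability score); both are ASSUMED to be "higher is better".
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidate partitionings (made-up numbers,
# not results from the paper).
candidates = [(0.90, 0.40), (0.70, 0.80), (0.60, 0.70), (0.90, 0.35)]
front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

In this toy input, the third candidate is worse than the second on both scores and the fourth is no better than the first on either, so only the first two remain as trade-off solutions. A multiobjective method typically returns such a set rather than a single answer, and a final partitioning is then picked from it by some further criterion.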

Author information

Corresponding author

Correspondence to Sriparna Saha.

About this article

Cite this article

Saha, S., Maulik, U. Use of symmetry and stability for data clustering. Evol. Intel. 3, 103–122 (2010). https://doi.org/10.1007/s12065-010-0041-0

  • DOI: https://doi.org/10.1007/s12065-010-0041-0
