Abstract
An important consideration in clustering is choosing an algorithm appropriate for partitioning a given data set. The correct model order (the number of clusters) must then be identified and the corresponding partitioning determined. In the first part of this paper, the recently developed symmetry-based cluster validity index named Sym-index, which provides a measure of "symmetricity" of the different partitionings of a data set, is shown to address all of these issues, viz., identifying the appropriate clustering algorithm, determining the proper model order, and evolving the proper partitioning, as long as the clusters possess the property of symmetry. Results are provided that demonstrate the superiority of this cluster validity measure over five other recently proposed measures, namely the PS-index, the I-index, the CS-index, the well-known XB-index, and a stability-based index, in determining both the proper clustering technique and the appropriate model order. These results cover several clustering methods, viz., two recently developed genetic algorithm based clustering techniques, the average linkage clustering algorithm, the self-organizing map, and the expectation maximization clustering algorithm. Five artificial and three real-life data sets are considered for this purpose. In the second part of the paper, a new measure of the stability of clustering solutions over different bootstrap samples of a data set is proposed. A multiobjective optimization based clustering technique is then developed that optimizes Sym-index and the stability measure simultaneously, automatically determining the appropriate number of clusters and the appropriate partitioning of data sets having symmetrically shaped clusters. Results on five artificial and five real-life data sets show that the proposed technique is well suited to detecting the number of clusters in data sets having point-symmetric clusters.
About this article
Cite this article
Saha, S., Maulik, U. Use of symmetry and stability for data clustering. Evol. Intel. 3, 103–122 (2010). https://doi.org/10.1007/s12065-010-0041-0
DOI: https://doi.org/10.1007/s12065-010-0041-0