Use of symmetry and stability for data clustering

Saha, Sriparna; Maulik, Ujjwal

doi:10.1007/s12065-010-0041-0

Research Paper
Published: 01 August 2010

Volume 3, pages 103–122, (2010)
Evolutionary Intelligence

Sriparna Saha¹ &
Ujjwal Maulik²

Abstract

An important consideration in clustering is choosing an algorithm appropriate for partitioning a given data set. The correct model order (the number of clusters) must then be identified and the corresponding partitioning determined. In the first part of this paper, the recently developed symmetry-based cluster validity index, Sym-index, which measures the “symmetricity” of the different partitionings of a data set, is shown to address all three issues, viz., identifying the appropriate clustering algorithm, determining the proper model order, and evolving the proper partitioning, as long as the clusters possess the property of symmetry. Results are provided that demonstrate the superiority of Sym-index over five other recently proposed measures, namely PS-index, I-index, CS-index, the well-known XB-index, and a stability-based index, in determining the proper clustering technique as well as the appropriate model order. The clustering methods considered are two recently developed genetic algorithm based clustering techniques, the average linkage clustering algorithm, the self-organizing map, and the expectation maximization clustering algorithm. Five artificial and three real-life data sets are used for this purpose. In the second part of the paper, a new measure of the stability of clustering solutions over different bootstrap samples of a data set is proposed. A multiobjective optimization based clustering technique is then developed that optimizes Sym-index and the stability measure simultaneously to determine automatically the appropriate number of clusters and the appropriate partitioning of data sets having symmetrical shaped clusters. Results on five artificial and five real-life data sets show that the proposed technique is well suited to detecting the number of clusters in data sets having point-symmetric clusters.
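The abstract does not define the proposed stability measure, so the following is only a minimal sketch of the general idea of scoring a clustering by how consistent it is across bootstrap samples. Everything in it is an assumption made for illustration: plain k-means stands in for the clustering method, the Rand index stands in for the agreement score, and the names `nearest`, `kmeans`, `rand_index` and `bootstrap_stability` are hypothetical. It is not the measure proposed in the paper.

```python
import random
from itertools import combinations


def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns k centers.

    Assumption: the sample contains at least k distinct points.
    """
    distinct = sorted(set(points))          # avoid duplicate starting centers
    centers = rng.sample(distinct, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Mean of each group; an empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def bootstrap_stability(points, k, n_boot=10, seed=0):
    """Mean pairwise Rand index between clusterings fitted on bootstrap samples."""
    rng = random.Random(seed)
    labelings = []
    for _ in range(n_boot):
        # Bootstrap sample: draw len(points) points with replacement.
        sample = [rng.choice(points) for _ in points]
        centers = kmeans(sample, k, rng)
        # Label the full data set so that all labelings are comparable.
        labelings.append([nearest(p, centers) for p in points])
    scores = [rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)
```

One way to use such a score is to call `bootstrap_stability(points, k)` for several candidate values of `k` and compare the results alongside a validity index, keeping in mind that a trivially stable solution such as `k = 1` always scores 1.0 and carries no information on its own.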

Author information

Authors and Affiliations

Image Processing and Modeling, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Heidelberg, Germany
Sriparna Saha
Department of Theoretical Bioinformatics, DKFZ (Deutsches Krebsforschungszentrum, German Cancer Research Center), Heidelberg, Germany
Ujjwal Maulik

Corresponding author

Correspondence to Sriparna Saha.

About this article

Cite this article

Saha, S., Maulik, U. Use of symmetry and stability for data clustering. Evol. Intel. 3, 103–122 (2010). https://doi.org/10.1007/s12065-010-0041-0

Received: 28 May 2009
Accepted: 09 June 2010
Published: 01 August 2010
Issue Date: December 2010
DOI: https://doi.org/10.1007/s12065-010-0041-0
