A new multiobjective clustering technique based on the concepts of stability and symmetry

Saha, Sriparna; Bandyopadhyay, Sanghamitra

doi:10.1007/s10115-009-0204-4

Regular Paper
Published: 04 April 2009

Volume 23, pages 1–27 (2010)

Knowledge and Information Systems

Abstract

Most clustering algorithms operate by optimizing, either implicitly or explicitly, a single measure of cluster solution quality. Such methods may perform well on some data sets but lack robustness with respect to variations in cluster shape, proximity, evenness and so forth. In this paper, we propose a multiobjective clustering technique that simultaneously optimizes two objectives: one reflecting the total cluster symmetry and the other reflecting the stability of the obtained partitions over different bootstrap samples of the data set. The proposed algorithm uses a recently developed simulated annealing-based multiobjective optimization technique, AMOSA, as the underlying optimization strategy. Points are assigned to clusters using a newly defined point symmetry-based distance rather than the Euclidean distance. On several artificial and real-life data sets, the technique is compared with another multiobjective clustering technique (MOCK), three single-objective genetic algorithm-based automatic clustering techniques (VGAPS, GCUK and HNGA clustering), and several hybrid methods for determining the appropriate number of clusters. The results show that the proposed technique is well suited to automatically detecting both the appropriate number of clusters and the appropriate partitioning in data sets having point-symmetric clusters. The performance of AMOSA as the underlying optimization technique is also compared with that of PESA-II, another evolutionary multiobjective optimization technique.
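The abstract names a point symmetry-based distance but does not define it. The sketch below assumes the definition from the authors' earlier GAPS algorithm, which we believe this paper builds on; whether the "newly defined" distance here matches it exactly is an assumption. Under that definition, a point x is reflected about a center c to give x* = 2c − x. The symmetry distance d_sym(x, c) is then the mean distance from x* to its knear (= 2) nearest data points, and d_ps(x, c) = d_sym(x, c) × d_e(x, c), where d_e is the Euclidean distance. The helper names `reflect`, `sym_distance`, `ps_distance` and `assign`, and the `theta` fallback to Euclidean assignment, are hypothetical illustrations. They are not the paper's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def reflect(x: Sequence[float], c: Sequence[float]) -> Point:
    """Mirror image of x about the center c: x* = 2c - x."""
    return tuple(2.0 * ci - xi for xi, ci in zip(x, c))


def sym_distance(x: Sequence[float], c: Sequence[float],
                 data: Sequence[Sequence[float]], knear: int = 2) -> float:
    """Mean Euclidean distance from the reflected point x* to its knear
    nearest neighbours in `data` (brute-force search; a kd-tree would
    be the usual speed-up for large data sets)."""
    x_star = reflect(x, c)
    dists = sorted(math.dist(x_star, p) for p in data)
    return sum(dists[:knear]) / knear


def ps_distance(x: Sequence[float], c: Sequence[float],
                data: Sequence[Sequence[float]], knear: int = 2) -> float:
    """Assumed point-symmetry distance: d_ps(x, c) = d_sym(x, c) * d_e(x, c)."""
    return sym_distance(x, c, data, knear) * math.dist(x, c)


def assign(data: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
           theta: float = math.inf, knear: int = 2) -> List[int]:
    """Label each point with the center minimising d_ps. If the winning
    center's symmetry distance exceeds the (hypothetical) threshold
    `theta`, fall back to the nearest center in Euclidean distance.
    With the default theta = inf the fallback never triggers."""
    labels = []
    for x in data:
        ps = [ps_distance(x, c, data, knear) for c in centers]
        k = min(range(len(centers)), key=ps.__getitem__)
        if sym_distance(x, centers[k], data, knear) <= theta:
            labels.append(k)
        else:
            labels.append(min(range(len(centers)),
                              key=lambda j: math.dist(x, centers[j])))
    return labels


if __name__ == "__main__":
    # Two small point-symmetric groups centred at (0, 0) and (10, 0).
    data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
            (11.0, 0.0), (9.0, 0.0), (10.0, 1.0), (10.0, -1.0)]
    centers = [(0.0, 0.0), (10.0, 0.0)]
    print(assign(data, centers))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy input, every point's reflection about its own center lands exactly on another data point, so d_sym is small for the correct center and large for the other one. Multiplying by the Euclidean distance is what stops a far-away center from winning on symmetry alone.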

Author information

Authors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India
Sriparna Saha & Sanghamitra Bandyopadhyay

Corresponding author

Correspondence to Sriparna Saha.

About this article

Cite this article

Saha, S., Bandyopadhyay, S. A new multiobjective clustering technique based on the concepts of stability and symmetry. Knowl Inf Syst 23, 1–27 (2010). https://doi.org/10.1007/s10115-009-0204-4

Received: 13 January 2009
Revised: 24 February 2009
Accepted: 01 March 2009
Published: 04 April 2009
Issue Date: April 2010
DOI: https://doi.org/10.1007/s10115-009-0204-4

Similar content being viewed by others

Improved multi-objective clustering with automatic determination of the number of clusters

Evaluation of Relative Indexes for Multi-objective Clustering

A multi-objective clustering approach based on different clustering measures combinations