Abstract
Most clustering algorithms operate by optimizing, either implicitly or explicitly, a single measure of cluster solution quality. Such methods may perform well on some data sets but lack robustness with respect to variations in cluster shape, proximity, evenness and so forth. In this paper, we propose a multiobjective clustering technique that simultaneously optimizes two objectives: one reflecting the total cluster symmetry and the other reflecting the stability of the obtained partitions over different bootstrap samples of the data set. The proposed algorithm uses a recently developed simulated annealing-based multiobjective optimization technique, named AMOSA, as the underlying optimization strategy. Points are assigned to clusters based on a newly defined point symmetry-based distance rather than the Euclidean distance. The technique is evaluated on several artificial and real-life data sets against another multiobjective clustering technique (MOCK), three single-objective genetic algorithm-based automatic clustering techniques (VGAPS, GCUK and HNGA clustering), and several hybrid methods of determining the appropriate number of clusters. The results show that the proposed technique is well suited to automatically detecting both the appropriate number of clusters and the appropriate partitioning for data sets having point-symmetric clusters. The performance of AMOSA as the underlying optimization technique in the proposed clustering algorithm is also compared with that of PESA-II, another evolutionary multiobjective optimization technique.
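To make the idea of optimizing two objectives at once more concrete, the following minimal sketch filters a set of candidate partitions down to those that are Pareto non-dominated. It is not the AMOSA procedure or the authors' implementation. The score values are hypothetical, and the sketch assumes that both objectives (a symmetry score and a stability score) are expressed so that larger is better.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (larger is better: an assumption)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (symmetry, stability) scores for four candidate partitions.
candidates = [(0.9, 0.4), (0.7, 0.8), (0.6, 0.7), (0.9, 0.3)]
front = non_dominated(candidates)
print(front)
```

A multiobjective optimizer typically returns such a set of trade-off solutions rather than a single best partition, so a final partition still has to be selected from that set by some further criterion.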
Cite this article
Saha, S., Bandyopadhyay, S. A new multiobjective clustering technique based on the concepts of stability and symmetry. Knowl Inf Syst 23, 1–27 (2010). https://doi.org/10.1007/s10115-009-0204-4
DOI: https://doi.org/10.1007/s10115-009-0204-4