A new multiobjective clustering technique based on the concepts of stability and symmetry

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Most clustering algorithms operate by optimizing, either implicitly or explicitly, a single measure of cluster quality. Such methods may perform well on some data sets but lack robustness to variations in cluster shape, proximity, evenness and so forth. In this paper, we propose a multiobjective clustering technique that simultaneously optimizes two objectives: one reflecting the total cluster symmetry and the other reflecting the stability of the obtained partitions over different bootstrap samples of the data set. The proposed algorithm uses AMOSA, a recently developed simulated annealing-based multiobjective optimization technique, as the underlying optimization strategy. Points are assigned to clusters based on a newly defined point symmetry-based distance rather than the Euclidean distance. The proposed technique is compared on several artificial and real-life data sets with another multiobjective clustering technique (MOCK), three single-objective genetic algorithm-based automatic clustering techniques (VGAPS, GCUK and HNGA clustering), and several hybrid methods for determining the appropriate number of clusters. The results show that it is well suited to automatically detecting both the appropriate number of clusters and the appropriate partitioning in data sets having point symmetric clusters. The performance of AMOSA as the underlying optimization technique in the proposed clustering algorithm is also compared with that of PESA-II, an evolutionary multiobjective optimization technique.
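
The abstract does not state the formula for the point symmetry-based distance, so the following is only a minimal sketch of one form such a distance could take. It assumes the distance is obtained by reflecting a point through a candidate cluster center, averaging the distances from the reflected point to its `knear` nearest data points, and scaling that average by the Euclidean distance to the center. The function name, the `knear` parameter and its default value are hypothetical and are not taken from the paper.

```python
import math


def point_symmetry_distance(x, center, data, knear=2):
    """Illustrative point symmetry-based distance (assumed form, not the paper's).

    x, center: sequences of floats with the same dimension.
    data: list of all data points (sequences of floats).
    knear: number of nearest neighbours of the reflected point to average over
           (hypothetical parameter name and default).
    """
    # Reflect x through the candidate center: x* = 2c - x.
    reflected = [2 * c - xi for xi, c in zip(x, center)]

    # Distances from the reflected point to its knear nearest data points.
    nearest = sorted(math.dist(reflected, p) for p in data)[:knear]

    # A small average means the data contain points close to the mirror image
    # of x, i.e. x is nearly symmetric with respect to this center.
    d_sym = sum(nearest) / len(nearest)

    # Scale by the Euclidean distance so that far-away centers are penalised.
    return d_sym * math.dist(x, center)


# Four points placed symmetrically around the origin.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
d_true_center = point_symmetry_distance(points[0], (0.0, 0.0), points)
d_shifted_center = point_symmetry_distance(points[0], (3.0, 0.0), points)
```

Under these assumptions, the point `(1, 0)` gets a smaller distance to the true center of symmetry `(0, 0)` than to the shifted center `(3, 0)`, which is the behavior a symmetry-based assignment rule would rely on.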

Author information

Corresponding author

Correspondence to Sriparna Saha.

About this article

Cite this article

Saha, S., Bandyopadhyay, S. A new multiobjective clustering technique based on the concepts of stability and symmetry. Knowl Inf Syst 23, 1–27 (2010). https://doi.org/10.1007/s10115-009-0204-4

  • DOI: https://doi.org/10.1007/s10115-009-0204-4
