Multi-objective selection for collecting cluster alternatives

  • Original Paper
  • Computational Statistics

Abstract

Grouping objects into different categories is a basic means of cognition. In the fields of machine learning and statistics, this subject is addressed by cluster analysis. Yet how to assess the reliability and quality of clusterings is still a matter of debate. In particular, it is hard to determine the optimal number of clusters inherent in the underlying data. Running different cluster algorithms and cluster validation methods usually yields different optimal clusterings. In fact, several clusterings with different numbers of clusters are plausible in many situations, as different methods are specialized in different structural properties. To account for the possibility of multiple plausible clusterings, we employ a multi-objective approach for collecting cluster alternatives (MOCCA) from a combination of cluster algorithms and validation measures. In an application to artificial data as well as microarray data sets, we demonstrate that exploring a Pareto set of optimal partitions rather than a single solution can identify alternative solutions that are overlooked by conventional clustering strategies. Competing solutions are ranked by an impartial criterion, while the ultimate judgement is left to the investigator.
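
The idea of keeping a Pareto set of partitions instead of a single winner can be illustrated with a minimal sketch. This is not the MOCCA implementation; it only shows generic Pareto (non-dominated) selection under stated assumptions: each candidate clustering is identified by a hypothetical (algorithm, number of clusters) pair, each candidate has already been scored on two validation measures, higher scores are better, and all score values below are made up for illustration.

```python
from typing import Dict, Hashable, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[Hashable, Scores]) -> List[Hashable]:
    """Return the candidates that no other candidate dominates,
    in the insertion order of the input dictionary."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Hypothetical candidates: (algorithm, k) -> (measure 1, measure 2).
# The numbers are invented and do not come from the article.
candidates = {
    ("kmeans", 2): (0.90, 0.95),
    ("kmeans", 3): (0.85, 0.97),
    ("kmeans", 4): (0.60, 0.70),
    ("hclust", 3): (0.80, 0.96),
}

front = pareto_front(candidates)
print(front)
```

In this toy example two candidates with different numbers of clusters survive, since neither is better on both measures. A single-objective selection would have reported only one of them, which is the situation the abstract describes as overlooked alternatives.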

Author information

Corresponding author

Correspondence to Hans A. Kestler.

Electronic Supplementary Material

Below is the Electronic Supplementary Material.

ESM 1 (PDF 103 kb)

About this article

Cite this article

Kraus, J.M., Müssel, C., Palm, G. et al. Multi-objective selection for collecting cluster alternatives. Comput Stat 26, 341–353 (2011). https://doi.org/10.1007/s00180-011-0244-6

  • DOI: https://doi.org/10.1007/s00180-011-0244-6
