Bootstrap technique in cluster analysis

https://doi.org/10.1016/0031-3203(87)90081-1

Abstract

We define a method to estimate the number of clusters in a data set E using the bootstrap technique. The approach generates several “fake” data sets by sampling patterns with replacement from E (bootstrapping). For each candidate number of clusters K, a measure of the stability of the K-cluster partitions across the bootstrap samples is used to characterize the significance of the K-cluster partition of the original data set. The value of K that yields the most stable partitions is taken as the estimate of the number of clusters in E. The performance of the technique is demonstrated on both synthetic and real data, and the method is applied to the segmentation of range images.
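As a rough illustration of the procedure summarized above, the following Python sketch estimates K by bootstrap stability. The abstract does not specify the clustering algorithm or the stability measure, so both are assumptions here: each bootstrap sample is clustered with plain k-means (Lloyd iterations with k-means++ seeding), every pattern of the original data set E is then assigned to its nearest bootstrap centroid, and stability is scored as the mean pairwise adjusted Rand index between these induced partitions. The helper names (`kmeans`, `adjusted_rand`, `bootstrap_stability`, `estimate_k`) and the toy data are hypothetical; this is not the authors' implementation.

```python
import math
import random
from collections import Counter
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points stored as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, rng, n_init=5, max_iter=100):
    """Lloyd k-means with k-means++ seeding; keeps the lowest-SSE of n_init runs."""
    best, best_sse = None, math.inf
    for _ in range(n_init):
        # k-means++ seeding: each new seed is drawn with probability ~ D(x)^2.
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            w = [min(sq_dist(p, c) for c in centroids) for p in points]
            if sum(w) > 0:
                centroids.append(rng.choices(points, weights=w)[0])
            else:
                centroids.append(rng.choice(points))
        for _ in range(max_iter):
            labels = assign(points, centroids)
            updated = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # An empty cluster keeps its previous centroid.
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                               if members else centroids[j])
            if updated == centroids:
                break
            centroids = updated
        sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
        if sse < best_sse:
            best, best_sse = centroids, sse
    return best


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points (1.0 = identical)."""
    n = len(labels_a)
    index = sum(math.comb(m, 2) for m in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(math.comb(m, 2) for m in Counter(labels_a).values())
    sum_b = sum(math.comb(m, 2) for m in Counter(labels_b).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


def bootstrap_stability(points, k, n_boot=20, seed=0):
    """Assumed stability score: mean pairwise ARI of the bootstrap-induced partitions of E."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_boot):
        # A "fake" data set of the same size, sampled with replacement from E.
        sample = [rng.choice(points) for _ in points]
        centroids = kmeans(sample, k, rng)
        # Map the bootstrap clustering back onto every pattern of E.
        partitions.append(assign(points, centroids))
    scores = [adjusted_rand(a, b) for a, b in combinations(partitions, 2)]
    return sum(scores) / len(scores)


def estimate_k(points, candidates=range(2, 6), **kwargs):
    """Return the candidate K with the most stable partitions, plus all scores."""
    scores = {k: bootstrap_stability(points, k, **kwargs) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: three well-separated 2-D Gaussian blobs.
gen = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
        for cx, cy in centers for _ in range(30)]
best_k, scores = estimate_k(data)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

K = 1 is left out of the candidates because a single-cluster partition is trivially stable under any resampling. On the three synthetic blobs, K = 3 should come out as the most stable choice, since any other K forces k-means to merge or split blobs in ways that vary from one bootstrap sample to the next.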
