Techniques for measuring the stability of clustering: A comparative study

  • Conference paper
Research and Development in Information Retrieval (SIGIR 1982)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 146)

Abstract

Among the significant factors in assessing the suitability of a clustering technique for a given application is its stability, that is, how sensitive the algorithm is to perturbations in the input data. A number of techniques that appear suitable for measuring the stability of clustering have been published in the literature. Each of these measures is described in detail, including the steps involved in its computation and precisely what it measures. The measures are considered in the context of analyzing the stability characteristics of clustering techniques and are compared using a framework developed for this purpose. The question of generalizing some of these measures is addressed, and the measures are also analyzed to identify conditions under which they can be reduced to one another.
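As a rough illustration of what "sensitivity to perturbations" can mean in practice, the sketch below clusters a data set, perturbs it with small random noise, clusters again, and scores the agreement between the two partitions. It is not taken from the paper: the abstract does not name the measures it studies, so the agreement score used here is the Rand index (a well-known pairwise agreement measure between partitions), the clustering method is a simple distance-threshold (connected components) scheme on one-dimensional points, and the names `rand_index`, `threshold_clusters`, and `stability` are hypothetical.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree, i.e. pairs
    placed together in both partitions or apart in both partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def threshold_clusters(points, threshold):
    """Illustrative clustering (an assumption, not the paper's method):
    connected components of the graph that links two 1-D points whenever
    they lie within `threshold` of each other."""
    parent = list(range(len(points)))  # each point starts as its own root

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two components
    return [find(i) for i in range(len(points))]


def stability(points, threshold, noise, trials=20, seed=0):
    """Mean Rand index between the clustering of the original points and
    the clusterings of `trials` randomly perturbed copies."""
    rng = random.Random(seed)
    base = threshold_clusters(points, threshold)
    scores = []
    for _ in range(trials):
        perturbed = [p + rng.uniform(-noise, noise) for p in points]
        scores.append(rand_index(base, threshold_clusters(perturbed, threshold)))
    return sum(scores) / trials
```

A score near 1 suggests the partition barely changes under perturbations of the chosen size, while lower scores suggest the method is sensitive to them. The choice of perturbation model and of agreement measure both affect the result, which is one reason a comparison of measures is useful.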

This research has been supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada.

Editor information

Gerard Salton Hans-Jochen Schneider

Copyright information

© 1983 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Raghavan, V.V., Ip, M.Y.L. (1983). Techniques for measuring the stability of clustering: A comparative study. In: Salton, G., Schneider, HJ. (eds) Research and Development in Information Retrieval. SIGIR 1982. Lecture Notes in Computer Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0036348

  • DOI: https://doi.org/10.1007/BFb0036348

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-11978-4

  • Online ISBN: 978-3-540-39440-2

  • eBook Packages: Springer Book Archive
