Techniques for measuring the stability of clustering: A comparative study

Raghavan, Vijay V.; Ip, M. Y. L.

doi:10.1007/BFb0036348

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 146)

Included in the following conference series:

International Conference on Research and Development in Information Retrieval

Abstract

Among the significant factors in assessing the suitability of a clustering technique to a given application is its stability; that is, how sensitive the algorithm is to perturbations in the input data. A number of techniques that appear to be suitable for measuring the stability of clustering have been published in the literature. The details about each of these measures, such as a description of the steps involved in their computation and an identification of precisely what they measure, are presented. These measures are considered in the context of analyzing the stability characteristics of clustering techniques and are compared using a framework developed for this purpose. The question of generalizing some of these measures is addressed and the measures are also analyzed to identify conditions under which they can be reduced to one another.
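
The abstract does not name the measures it compares, so the following is only an illustrative sketch of the general idea of stability, not the authors' method. It assumes the pair-counting agreement measure commonly known as the Rand index as the distance between two partitions, and a toy one-dimensional threshold clustering (`threshold_clusters`, a hypothetical helper invented for this example) as the algorithm under test. The data values and the threshold are likewise made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects together, or both put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two objects: trivially identical
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


def threshold_clusters(points, threshold):
    """Toy 1-D clustering (hypothetical, for illustration only).

    Sorts the points and starts a new cluster whenever the gap between
    neighbouring values exceeds `threshold`. Returns one label per
    point, in the original input order.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if points[cur] - points[prev] > threshold:
            cluster += 1
        labels[cur] = cluster
    return labels


# Made-up data: one object (index 3) is moved between the two groups.
original = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
perturbed = [0.0, 0.1, 0.2, 0.65, 1.1, 1.2]

before = threshold_clusters(original, threshold=0.5)
after = threshold_clusters(perturbed, threshold=0.5)
print(before, after, rand_index(before, after))
```

In this made-up case, moving a single object bridges the gap between the two groups, the clustering collapses into one cluster, and the agreement drops to 0.4. A value near 1 after a perturbation would indicate a stable result under this particular measure; other measures may rank the same change differently.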

This research has been supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

Computer Science Dept., University of Regina, Regina, Sask., Canada
Vijay V. Raghavan
Datatron Corp., Lethbridge, Alta., Canada
M. Y. L. Ip

Editor information

Gerard Salton, Hans-Jochen Schneider

Cite this paper

Raghavan, V.V., Ip, M.Y.L. (1983). Techniques for measuring the stability of clustering: A comparative study. In: Salton, G., Schneider, HJ. (eds) Research and Development in Information Retrieval. SIGIR 1982. Lecture Notes in Computer Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0036348

DOI: https://doi.org/10.1007/BFb0036348
Published: 08 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-11978-4
Online ISBN: 978-3-540-39440-2
eBook Packages: Springer Book Archive
