Abstract
One significant factor in assessing the suitability of a clustering technique for a given application is its stability, that is, how sensitive the algorithm is to perturbations in the input data. A number of techniques that appear suitable for measuring the stability of clustering have been published in the literature. The details of each of these measures, including the steps involved in their computation and a precise identification of what they measure, are presented. These measures are considered in the context of analyzing the stability characteristics of clustering techniques and are compared using a framework developed for this purpose. The question of generalizing some of these measures is addressed, and the measures are also analyzed to identify conditions under which they can be reduced to one another.
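As a rough illustration of what "sensitivity to perturbations" can mean in practice, the sketch below clusters a small data set, perturbs one point, clusters again, and scores the agreement between the two partitions with the Rand index, a widely used pair-counting agreement score. This is only an assumed setup for illustration: the toy one-dimensional clustering rule `threshold_clusters`, the `gap` parameter, and the sample data are hypothetical, and the Rand index is used here as one possible stability measure, not necessarily the one the paper favors.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair agrees if both partitions put the two objects together,
    or both put them apart. Cluster label names do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def threshold_clusters(points, gap):
    """Hypothetical toy clustering of 1-D points (assumption, not from the paper).

    Points are sorted; a new cluster starts whenever the distance to the
    previous point exceeds `gap`. Returns one label per input point.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if points[idx] - points[prev] > gap:
            current += 1
        labels[idx] = current
    return labels


# Original data and a perturbed copy in which one value has moved.
original = [0, 1, 2, 10, 11, 12]
perturbed = [0, 1, 5, 10, 11, 12]

before = threshold_clusters(original, gap=3)
after = threshold_clusters(perturbed, gap=3)

# A score of 1.0 means the perturbation left the partition unchanged;
# lower values indicate a less stable clustering on this input.
stability = rand_index(before, after)
print(before, after, round(stability, 4))
```

In this toy case the perturbation splits one object off into its own cluster, so two of the fifteen object pairs change status and the score drops below 1. A real stability study would repeat this over many perturbations and compare several measures rather than rely on a single score.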
This research has been supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada.
© 1983 Springer-Verlag Berlin Heidelberg
Raghavan, V.V., Ip, M.Y.L. (1983). Techniques for measuring the stability of clustering: A comparative study. In: Salton, G., Schneider, HJ. (eds) Research and Development in Information Retrieval. SIGIR 1982. Lecture Notes in Computer Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0036348
DOI: https://doi.org/10.1007/BFb0036348
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-11978-4
Online ISBN: 978-3-540-39440-2
eBook Packages: Springer Book Archive