Abstract
Hierarchical clustering algorithms are typically more effective than partitioning algorithms at detecting the true clustering structure of a data set. However, hierarchical clustering algorithms do not actually create clusters; they compute only a hierarchical representation of the data set. This makes them unsuitable as an automatic pre-processing step for other algorithms that operate on detected clusters. This is true for both dendrograms and reachability plots, two proposed hierarchical clustering representations, each with its own advantages and disadvantages. In this paper we first investigate the relation between dendrograms and reachability plots and introduce methods to convert each into the other, showing that they essentially contain the same information. Based on reachability plots, we then introduce a technique that automatically determines the significant clusters in a hierarchical cluster representation. For the first time, this makes it possible to use hierarchical clustering as an automatic pre-processing step that requires no user interaction to select clusters from a hierarchical cluster representation.
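The claim that dendrograms and reachability plots carry essentially the same information can be illustrated with a small sketch. This is not the authors' published procedure. It assumes a binary dendrogram with monotone merge heights, stored as nested tuples (a hypothetical format), and uses one plausible conversion: list the leaves left to right and give each leaf the height at which it merges with the leaves before it. Converting back splits each segment at its largest value. The function `extract_clusters`, with its `min_size` and `ratio` parameters, is likewise a hypothetical, simplified stand-in for automatic cluster selection, not the paper's significance criterion.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a leaf is a label (str); an internal node is
# (merge_height, left_subtree, right_subtree) with binary merges.
Dendrogram = Union[str, Tuple[float, "Dendrogram", "Dendrogram"]]


def dendrogram_to_reachability(tree: Dendrogram) -> Tuple[List[str], List[float]]:
    """Leaves in left-to-right order, each paired with the height at which it
    is merged with the leaves before it (infinity for the first leaf)."""
    order: List[str] = []
    reach: List[float] = []

    def visit(node: Dendrogram, join_height: float) -> None:
        if isinstance(node, str):
            order.append(node)
            reach.append(join_height)
            return
        height, left, right = node
        visit(left, join_height)  # first leaf of the left part keeps the outer value
        visit(right, height)      # right part meets its left sibling at `height`

    visit(tree, float("inf"))
    return order, reach


def reachability_to_dendrogram(order: List[str], reach: List[float]) -> Dendrogram:
    """Rebuild a binary dendrogram by splitting each segment at its largest
    reachability value. Assumes a non-empty plot; ties go to the leftmost maximum."""

    def build(lo: int, hi: int) -> Dendrogram:  # half-open segment [lo, hi)
        if hi - lo == 1:
            return order[lo]
        # reach[lo] describes the link to the left of the segment, so skip it.
        split = max(range(lo + 1, hi), key=lambda i: reach[i])
        return (reach[split], build(lo, split), build(split, hi))

    return build(0, len(order))


def extract_clusters(reach: List[float], min_size: int = 2,
                     ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Hypothetical simplified extraction: split a segment at its highest
    admissible value if that value clearly exceeds the average inside both
    parts; return the leaf clusters as half-open index ranges."""
    if min_size < 2:
        raise ValueError("min_size must be >= 2 so each part has interior values")
    clusters: List[Tuple[int, int]] = []

    def average(values: List[float]) -> float:
        return sum(values) / len(values)

    def split(lo: int, hi: int) -> None:
        if hi - lo >= 2 * min_size:
            # Only consider split points that leave min_size points per side.
            s = max(range(lo + min_size, hi - min_size + 1), key=lambda i: reach[i])
            parts = [(lo, s), (s, hi)]
            if all(average(reach[a + 1:b]) <= ratio * reach[s] for a, b in parts):
                for a, b in parts:
                    split(a, b)
                return
        clusters.append((lo, hi))  # no significant split: keep as one cluster

    split(0, len(reach))
    return clusters


tree = (5.0, (1.0, (0.5, "a", "b"), "c"), (1.2, "d", (0.4, "e", "f")))
order, reach = dendrogram_to_reachability(tree)
# order → ['a', 'b', 'c', 'd', 'e', 'f'];  reach → [inf, 0.5, 1.0, 5.0, 1.2, 0.4]
assert reachability_to_dendrogram(order, reach) == tree  # lossless round trip here
print([order[a:b] for a, b in extract_clusters(reach)])
# → [['a', 'b', 'c'], ['d', 'e', 'f']]
```

With strictly increasing merge heights the round trip reproduces the tree exactly; with tied heights the rebuilt tree may be a different binary arrangement that still yields the same flat partition at every cut height.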
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sander, J., Qin, X., Lu, Z., Niu, N., Kovarsky, A. (2003). Automatic Extraction of Clusters from Hierarchical Clustering Representations. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_8
DOI: https://doi.org/10.1007/3-540-36175-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive