Summary
In hierarchical cluster analysis, dendrograms are used to visualize how clusters are formed. Because each observation is displayed, dendrograms are impractical when the data set is large. For non-hierarchical cluster algorithms (e.g., Kmeans), no graph comparable to the dendrogram exists. This paper discusses a graph named the “clustergram”, which shows how cluster members are assigned to clusters as the number of clusters increases. The clustergram can also give insight into algorithms. For example, it makes it easy to see that the “single linkage” algorithm tends to form clusters that consist of just one observation. It is also useful in distinguishing between random and deterministic implementations of the Kmeans algorithm. A data set related to asbestos claims and the Thailand Landmine Data are used throughout to illustrate the clustergram.
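The idea of tracking cluster membership as the number of clusters grows can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's Stata implementation: the function names `kmeans` and `clustergram`, the toy data, the deterministic choice of starting centers, and the use of the mean of a center's coordinates as the plotted cluster height are all assumptions made for this example. For each number of clusters k it records a height per cluster and counts how many observations move from each cluster at k to each cluster at k + 1; in a clustergram those counts would set the line widths.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations with a deterministic start (an assumption)."""
    pts = sorted(points)
    # Starting centers: k observations evenly spaced in sorted order.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assign every observation to its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers


def clustergram(points, max_k):
    """Cluster heights for k = 1..max_k and flows between k and k + 1."""
    runs = {k: kmeans(points, k) for k in range(1, max_k + 1)}
    # Height of a cluster: mean of its center's coordinates (an assumption).
    heights = {k: [sum(c) / len(c) for c in centers]
               for k, (_, centers) in runs.items()}
    # flows[k][(a, b)]: observations in cluster a at k and cluster b at k + 1.
    flows = {k: Counter(zip(runs[k][0], runs[k + 1][0]))
             for k in range(1, max_k)}
    return heights, flows


# Toy data: three well-separated groups (hypothetical, for illustration only).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (20, 20), (20, 21), (21, 20)]

heights, flows = clustergram(points, 3)
for k in sorted(flows):
    for (a, b), n in sorted(flows[k].items()):
        print(f"k={k} cluster {a} -> k={k + 1} cluster {b}: {n} observations")
```

With this toy data, the cluster of six observations found at k = 2 splits into two clusters of three at k = 3, while the cluster of four stays intact. Drawing the heights against k, with segment widths proportional to the flow counts, would give the kind of picture the clustergram provides.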










Notes
1 The Stata ado files can be obtained from www.schonlau.net/clustergram.html or by emailing Matthias_Schonlau@rand.org.
Acknowledgement
I am grateful for support from the RAND statistics group. I also thank Brad Efron, members of the RAND statistics group, and participants of the 2002 Augsburg (Germany) workshop on data visualization for discussions, and two anonymous referees for their comments. I am grateful to Steve Carroll at RAND for involving me in the Asbestos project, which prompted this work, and to Aldo Benini, who was part of the Landmine Impact Project and gave me access to the data.
Cite this article
Schonlau, M. Visualizing non-hierarchical and hierarchical cluster analyses with clustergrams. Computational Statistics 19, 95–111 (2004). https://doi.org/10.1007/BF02915278
DOI: https://doi.org/10.1007/BF02915278