Abstract
This paper describes a contribution to the GfKl 2004 Contest. The contest task is to cluster, classify and interpret the 170 districts of the city of Dortmund with respect to their ‘social milieux’. A data set containing 204 variables measured for every district is given.
We apply annealed κ-means clustering to the preprocessed contest data and use superparamagnetic clustering to gain insight into the natural partitions of the data. A stable and interpretable solution is obtained with κ = 3 clusters, dividing Dortmund into three social milieux. A decision tree is derived from this cluster solution and is used for interpretation and rule generation; the tree also makes it possible to monitor and predict future assessments. To gain information about cluster solutions with κ > 3, a stability analysis based on a resampling approach is performed, which yields further insights.
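The abstract does not spell out the annealing schedule or the update rule, so the following is only a minimal sketch of one common reading of annealed κ-means: soft assignments at a temperature that is lowered step by step (in the spirit of deterministic annealing). It is not the authors' implementation. The function name `annealed_kmeans`, all parameter names and defaults, and the toy data are assumptions made for illustration; the contest data are not used.

```python
import math
import random


def annealed_kmeans(points, k, t_start=100.0, t_end=0.05, cooling=0.8,
                    inner_iters=20, jitter=1e-3, seed=0):
    """Soft k-means with a geometrically decreasing temperature.

    Hypothetical sketch: schedule, jitter and defaults are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # All centroids start at the global mean of the data.
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    centroids = [list(mean) for _ in range(k)]

    t = t_start
    while t > t_end:
        # A small perturbation lets coinciding centroids separate
        # once the temperature is low enough.
        centroids = [[c + rng.gauss(0.0, jitter) for c in cen]
                     for cen in centroids]
        for _ in range(inner_iters):
            # Soft assignment: Gibbs weights exp(-distance^2 / t).
            resp = []
            for p in points:
                d2 = [sum((a - b) ** 2 for a, b in zip(p, cen))
                      for cen in centroids]
                d_min = min(d2)  # shift exponents for numerical stability
                w = [math.exp(-(d - d_min) / t) for d in d2]
                total = sum(w)
                resp.append([wi / total for wi in w])
            # Update: each centroid becomes a responsibility-weighted mean.
            for j in range(k):
                mass = sum(r[j] for r in resp)
                if mass > 1e-12:
                    centroids[j] = [
                        sum(r[j] * p[d] for r, p in zip(resp, points)) / mass
                        for d in range(dim)
                    ]
        t *= cooling

    # Hard labels: index of the nearest final centroid.
    labels = [
        min(range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
        for p in points
    ]
    return centroids, labels


# Toy data (assumed): two well-separated groups of five 2-D points each.
offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
points = [(dx, dy) for dx, dy in offsets] + \
         [(10.0 + dx, dy) for dx, dy in offsets]

centroids, labels = annealed_kmeans(points, k=2)
print(labels)
print(centroids)
```

At high temperature every point is shared almost equally among all centroids, so they stay near the global mean; as the temperature drops, the assignments harden and the procedure approaches ordinary κ-means. For the three-cluster solution reported in the abstract, the same function would be called with `k=3` on the preprocessed district data.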
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Schäfer, C., Laub, J. (2005). Annealed κ-Means Clustering and Decision Trees. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_81
DOI: https://doi.org/10.1007/3-540-28084-7_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics (R0)