Abstract
Successful applications of hierarchical cluster analysis in quantitative linguistics were reported in the pioneering works of Goebl (1982, 1984, 1994). Linguistic data are often high-dimensional, so multivariate statistical techniques such as cluster analysis can support the researcher to some degree. However, much room is left for heuristics. Cluster analysis methods can be generalized by taking weights of observations into account, and special choices of weights lead to well-known resampling techniques. Here we offer an automatic validation technique for hierarchical cluster analysis that can be regarded as a built-in validation of both the number of clusters and each individual cluster. Furthermore, this built-in validation can be used to find the appropriate cluster analysis model. As an illustration of an application in linguistics, we present the validation of hierarchical clustering results based on the adjusted Rand measure.
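For readers unfamiliar with the adjusted Rand measure (Hubert and Arabie 1985), the following minimal Python sketch computes it for two partitions of the same observations. It only illustrates the measure itself. The function name `adjusted_rand_index` and the label lists are assumptions made for this example, and the sketch does not reproduce the chapter's validation procedure. In a resampling setting, one partition would typically come from the original data and the other from a reweighted or resampled version.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand measure between two partitions of the same objects.

    labels_a and labels_b are equal-length sequences of cluster labels
    (hypothetical inputs for this illustration).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are required")

    # Pairs of objects placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (row/column sums).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)      # largest attainable agreement

    if max_index == expected:
        # Degenerate case: both partitions are trivial and identical.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same grouping under different label names: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Groupings that split every pair: agreement below chance level.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 indicates identical partitions, values near 0 indicate agreement no better than chance, and negative values indicate agreement below chance.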
References
BANFIELD, J.D. and RAFTERY, A.E. (1993): Model-Based Gaussian and non-Gaussian Clustering. Biometrics, 49, 803–821.
BAUER, R. (2003): Dolomitenladinische Ähnlichkeitsprofile aus dem Gadertal; ein Werkstattbericht zur dialektometrischen Analyse des ALD-I. Ladinia XXVI–XXVII (2002–2003), 209–250.
FRALEY, C. (1996): Algorithms for model-based Gaussian Hierarchical Clustering. Technical Report, 311. Department of Statistics, University of Washington, Seattle.
GOEBL, H. (1982): Dialektometrie; Prinzipien und Methoden des Einsatzes der numerischen Taxonomie im Bereich der Dialektgeographie. Verlag der Öst. Akademie der Wissenschaften, Wien.
GOEBL, H. (1984): Dialektometrische Studien anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF, vol. 1 (vol. 2 and 3 contain maps and tables). Max Niemeyer, Tübingen.
GOEBL, H. (1994): Die Dialektale Gliederung Ladiniens aus der Sicht der Ladiner. Eine Pilotstudie zum Problem der geolinguistischen “Mental Maps”. Ladinia XVII, 59–95.
GOEBL, H. (Ed.) (1998): Atlante linguistico del ladino dolomitico e dei dialetti limitrofi I (ALD I)-Sprachatlas des Dolomitenladinischen und angrenzender Dialekte I. Dr. Ludwig Reichert Verlag, Wiesbaden.
GOWER, J.C. (1971): A General Coefficient of Similarity and some of its Properties. Biometrics, 27, 857–874.
HAIMERL, E. (2004): Das Dialektometrieprojekt der Universität Salzburg. (in German and English). http://ald.sbg.ac.at/dm
HUBERT, L.J. and ARABIE, P. (1985): Comparing Partitions. Journal of Classification, 2, 193–218.
KAUFMAN, L. and ROUSSEEUW, P.J. (1990): Finding Groups in Data. Wiley, New York.
MUCHA, H.-J. (1992): Clusteranalyse mit Mikrocomputern. Akademie Verlag, Berlin.
MUCHA, H.-J., SIMON, U. and BRÜGGEMANN, R. (2002): Model-based Cluster Analysis Applied to Flow Cytometry Data of Phytoplankton. Weierstraß-Institute for Applied Analysis and Stochastics, Technical Report No. 5. http://www.wias-berlin.de/.
RAND, W.M. (1971): Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66, 846–850.
SPÄTH, H. (1985): Cluster Dissection and Analysis. Ellis Horwood, Chichester.
WARD, J.H. (1963): Hierarchical Grouping Methods to Optimise an Objective Function. Journal of the American Statistical Association, 58, 235–244.
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Mucha, HJ., Haimerl, E. (2005). Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_60
DOI: https://doi.org/10.1007/3-540-28084-7_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics (R0)