Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry

Mucha, Hans-Joachim; Haimerl, Edgar

doi:10.1007/3-540-28084-7_60

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Successful applications of hierarchical cluster analysis in quantitative linguistics were reported in the pioneering works of Goebl (1982, 1984, 1994). Linguistic data are often high-dimensional, so multivariate statistical techniques such as cluster analysis can support the researcher to some degree; however, much room is left for heuristics. Cluster analysis methods can be generalized by taking weights of observations into account, and special choices of these weights lead to well-known resampling techniques. Here we offer an automatic validation technique for hierarchical cluster analysis that can be regarded as a built-in validation both of the number of clusters and of each individual cluster. Furthermore, this built-in validation can be used to find the appropriate cluster analysis model. As an application in linguistics, we present the validation of hierarchical clustering results based on the adjusted Rand index.
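The abstract states the idea without formulas: observation weights make resampling part of the clustering itself, and the adjusted Rand index scores the agreement of the resulting partitions. The Python sketch below illustrates that idea under explicit assumptions that are not taken from the paper: Ward's method (Ward, 1963) with observation masses as the hierarchical method, bootstrap counts as the weights (an object drawn twice gets weight 2, an object not drawn gets weight 0), and the adjusted Rand index of Hubert and Arabie (1985) computed between the full-data partition and each bootstrap partition on the drawn objects only. The function names `weighted_ward`, `adjusted_rand_index` and `bootstrap_stability`, the 50 replications and the fixed seed are hypothetical choices for illustration, not the authors' procedure or software.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert and Arabie, 1985) of two labelings of the same objects."""
    n = len(a)
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2) if n > 1 else 0.0
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (index - expected) / (maximum - expected)


def weighted_ward(points, weights, k):
    """Agglomerate with Ward's criterion using observation weights (masses) and
    stop at k clusters. Objects with weight 0 are left out.
    Returns {object index: cluster label} for the objects with positive weight."""
    clusters = [
        {"members": [i], "mass": w, "centre": list(points[i])}
        for i, w in enumerate(weights) if w > 0
    ]

    def cost(p, q):
        # Increase of the weighted within-cluster sum of squares if p and q merge.
        d2 = sum((x - y) ** 2 for x, y in zip(p["centre"], q["centre"]))
        return p["mass"] * q["mass"] / (p["mass"] + q["mass"]) * d2

    while len(clusters) > max(k, 1):
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cost(clusters[ij[0]], clusters[ij[1]]),
        )
        p, q = clusters[i], clusters[j]
        mass = p["mass"] + q["mass"]
        centre = [(p["mass"] * x + q["mass"] * y) / mass
                  for x, y in zip(p["centre"], q["centre"])]
        merged = {"members": p["members"] + q["members"], "mass": mass, "centre": centre}
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return {m: label for label, c in enumerate(clusters) for m in c["members"]}


def bootstrap_stability(points, k, replications=50, seed=0):
    """Mean adjusted Rand index between the k-cluster partition of all objects and
    the k-cluster partitions of bootstrap samples, compared on the drawn objects."""
    rng = random.Random(seed)
    n = len(points)
    reference = weighted_ward(points, [1] * n, k)
    scores = []
    for _ in range(replications):
        # Bootstrap sample expressed as weights: draw n objects with replacement.
        counts = Counter(rng.randrange(n) for _ in range(n))
        labels = weighted_ward(points, [counts[i] for i in range(n)], k)
        drawn = sorted(labels)  # objects with positive weight
        scores.append(adjusted_rand_index([reference[i] for i in drawn],
                                          [labels[i] for i in drawn]))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of four points each.
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for k in (2, 3, 4):
        print(k, round(bootstrap_stability(points, k), 3))
```

Computing `bootstrap_stability(points, k)` for several values of k and preferring those whose mean index stays close to 1 mimics, in simplified form, a validation of the number of clusters. Validating each individual cluster would need a per-cluster agreement measure, which this sketch does not attempt.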

References

BANFIELD, J.D. and RAFTERY, A.E. (1993): Model-Based Gaussian and non-Gaussian Clustering. Biometrics, 49, 803–821.
BAUER, R. (2003): Dolomitenladinische Ähnlichkeitsprofile aus dem Gadertal; ein Werkstattbericht zur dialektometrischen Analyse des ALD-I. Ladinia XXVI–XXVII (2002–2003), 209–250.
FRALEY, C. (1996): Algorithms for model-based Gaussian Hierarchical Clustering. Technical Report, 311. Department of Statistics, University of Washington, Seattle.
GOEBL, H. (1982): Dialektometrie; Prinzipien und Methoden des Einsatzes der numerischen Taxonomie im Bereich der Dialektgeographie. Verlag der Öst. Akademie der Wissenschaften, Wien.
GOEBL, H. (1984): Dialektometrische Studien anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF, vol. 1 (vol. 2 and 3 contain maps and tables). Max Niemeyer, Tübingen.
GOEBL, H. (1994): Die Dialektale Gliederung Ladiniens aus der Sicht der Ladiner. Eine Pilotstudie zum Problem der geolinguistischen “Mental Maps”. Ladinia XVII, 59–95.
GOEBL, H. (Ed.) (1998): Atlante linguistico del ladino dolomitico e dei dialetti limitrofi I (ALD I)-Sprachatlas des Dolomitenladinischen und angrenzender Dialekte I. Dr. Ludwig Reichert Verlag, Wiesbaden.
GOWER, J.C. (1971): A General Coefficient of Similarity and some of its Properties. Biometrics, 27, 857–874.
HAIMERL, E. (2004): Das Dialektometrieprojekt der Universität Salzburg. (in German and English). http://ald.sbg.ac.at/dm
HUBERT, L.J. and ARABIE, P. (1985): Comparing Partitions. Journal of Classification, 2, 193–218.
KAUFMAN, L. and ROUSSEEUW, P.J. (1990): Finding Groups in Data. Wiley, New York.
MUCHA, H.-J. (1992): Clusteranalyse mit Mikrocomputern. Akademie Verlag, Berlin.
MUCHA, H.-J., SIMON, U. and BRÜGGEMANN, R. (2002): Model-based Cluster Analysis Applied to Flow Cytometry Data of Phytoplankton. Weierstraß-Institute for Applied Analysis and Stochastic, Technical Report No. 5. http://www.wias-berlin.de/.
RAND, W.M. (1971): Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66, 846–850.
SPÄTH, H. (1985): Cluster Dissection and Analysis. Ellis Horwood, Chichester.
WARD, J.H. (1963): Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, 58, 235–244.

Author information

Authors and Affiliations

Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS), Mohrenstraße 39, 10117, Berlin, Germany
Hans-Joachim Mucha
Institut für Romanistik, Universität Salzburg, Akademiestraße 24, 5024, Salzburg, Austria
Edgar Haimerl

Editor information

Editors and Affiliations

Fachbereich Statistik, Universität Dortmund, 44221, Dortmund
Claus Weihs
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Mucha, HJ., Haimerl, E. (2005). Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_60

DOI: https://doi.org/10.1007/3-540-28084-7_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics (R0)
