
Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry

Conference paper
Classification — the Ubiquitous Challenge

Abstract

Successful applications of hierarchical cluster analysis in quantitative linguistics were reported in the pioneering works of Goebl (1982, 1984, 1994). Linguistic data are often high-dimensional, so multivariate statistical techniques such as cluster analysis can support the researcher to some degree; however, much room is still left for heuristics. Cluster analysis methods can be generalized by taking weights of observations into account, and special choices of weights lead to well-known resampling techniques. Here we offer an automatic validation technique for hierarchical cluster analysis that can be regarded as a built-in validation both of the number of clusters and of each individual cluster. Furthermore, this built-in validation can be used to find an appropriate cluster analysis model. As an illustration of an application in linguistics, we present the validation of hierarchical clustering results based on the adjusted Rand index.
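The validation idea summarized above (observation weights, resampling obtained through special weights, and comparison of partitions with the adjusted Rand index) can be illustrated with a small sketch. It is not the authors' implementation. It assumes Ward's criterion generalized to weighted observations, plain bootstrap multiplicities as the "special weights", and the Hubert and Arabie (1985) form of the adjusted Rand index; the function names `weighted_ward`, `adjusted_rand_index` and `bootstrap_stability`, as well as the toy data, are hypothetical.

```python
import random
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert-Arabie form) of two labelings of the same objects."""
    n = len(a)
    if n < 2:
        return 1.0
    cells, rows, cols = {}, {}, {}
    for x, y in zip(a, b):
        cells[(x, y)] = cells.get((x, y), 0) + 1
        rows[x] = rows.get(x, 0) + 1
        cols[y] = cols.get(y, 0) + 1
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings are trivial
        return 1.0
    return (index - expected) / (maximum - expected)


def weighted_ward(points, weights, k):
    """Agglomerative Ward clustering in which observation i carries mass weights[i].

    Observations with weight 0 take no part. Returns {observation index: cluster label}.
    """
    clusters = {i: ([i], w, list(p)) for i, (p, w) in enumerate(zip(points, weights)) if w > 0}
    while len(clusters) > k:
        best = None
        for a, b in combinations(clusters, 2):
            _, wa, ca = clusters[a]
            _, wb, cb = clusters[b]
            dist2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
            cost = wa * wb / (wa + wb) * dist2  # increase of the weighted within-cluster SS
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        ma, wa, ca = clusters.pop(a)
        mb, wb, cb = clusters.pop(b)
        centroid = [(wa * u + wb * v) / (wa + wb) for u, v in zip(ca, cb)]
        clusters[a] = (ma + mb, wa + wb, centroid)
    labels = {}
    for label, (members, _, _) in enumerate(clusters.values()):
        for i in members:
            labels[i] = label
    return labels


def bootstrap_stability(points, k, n_boot=50, seed=0):
    """Mean ARI between the full-data partition and partitions of bootstrap samples."""
    rng = random.Random(seed)
    n = len(points)
    reference = weighted_ward(points, [1] * n, k)
    scores = []
    for _ in range(n_boot):
        weights = [0] * n
        for _ in range(n):
            weights[rng.randrange(n)] += 1  # a bootstrap draw expressed as an integer weight
        labels = weighted_ward(points, weights, k)
        drawn = sorted(labels)  # compare only the observations that were drawn
        scores.append(adjusted_rand_index([reference[i] for i in drawn],
                                          [labels[i] for i in drawn]))
    return sum(scores) / len(scores)


# Hypothetical toy data: three tight groups of five points each.
OFFSETS = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1)]
data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in OFFSETS]

if __name__ == "__main__":
    for k in range(2, 6):
        print(k, round(bootstrap_stability(data, k), 3))
```

Running the block prints one mean adjusted Rand index per candidate number of clusters. Under these assumptions, a value close to 1 means the partition is reproduced on resampled data, and the most stable number of clusters is one candidate answer. The validation of each individual cluster mentioned in the abstract is not covered by this sketch.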


References

  • BANFIELD, J.D. and RAFTERY, A.E. (1993): Model-Based Gaussian and non-Gaussian Clustering. Biometrics, 49, 803–821.

  • BAUER, R. (2003): Dolomitenladinische Ähnlichkeitsprofile aus dem Gadertal; ein Werkstattbericht zur dialektometrischen Analyse des ALD-I. Ladinia XXVI–XXVII (2002–2003), 209–250.

  • FRALEY, C. (1996): Algorithms for Model-Based Gaussian Hierarchical Clustering. Technical Report No. 311, Department of Statistics, University of Washington, Seattle.

  • GOEBL, H. (1982): Dialektometrie; Prinzipien und Methoden des Einsatzes der numerischen Taxonomie im Bereich der Dialektgeographie. Verlag der Öst. Akademie der Wissenschaften, Wien.

  • GOEBL, H. (1984): Dialektometrische Studien anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF, vol. 1 (vol. 2 and 3 contain maps and tables). Max Niemeyer, Tübingen.

  • GOEBL, H. (1994): Die dialektale Gliederung Ladiniens aus der Sicht der Ladiner. Eine Pilotstudie zum Problem der geolinguistischen “Mental Maps”. Ladinia XVII, 59–95.

  • GOEBL, H. (Ed.) (1998): Atlante linguistico del ladino dolomitico e dei dialetti limitrofi I (ALD I)-Sprachatlas des Dolomitenladinischen und angrenzender Dialekte I. Dr. Ludwig Reichert Verlag, Wiesbaden.

  • GOWER, J.C. (1971): A General Coefficient of Similarity and some of its Properties. Biometrics, 27, 857–874.

  • HAIMERL, E. (2004): Das Dialektometrieprojekt der Universität Salzburg. (in German and English). http://ald.sbg.ac.at/dm

  • HUBERT, L.J. and ARABIE, P. (1985): Comparing Partitions. Journal of Classification, 2, 193–218.

  • KAUFMAN, L. and ROUSSEEUW, P.J. (1990): Finding Groups in Data. Wiley, New York.

  • MUCHA, H.-J. (1992): Clusteranalyse mit Mikrocomputern. Akademie Verlag, Berlin.

  • MUCHA, H.-J., SIMON, U. and BRÜGGEMANN, R. (2002): Model-based Cluster Analysis Applied to Flow Cytometry Data of Phytoplankton. Technical Report No. 5, Weierstrass Institute for Applied Analysis and Stochastics, Berlin. http://www.wias-berlin.de/

  • RAND, W.M. (1971): Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association, 66, 846–850.

  • SPÄTH, H. (1985): Cluster Dissection and Analysis. Ellis Horwood, Chichester.

  • WARD, J.H. (1963): Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, 58, 235–244.

Copyright information

© 2005 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Mucha, HJ., Haimerl, E. (2005). Automatic Validation of Hierarchical Cluster Analysis with Application in Dialectometry. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_60
