GEVA: geometric variability-based approaches for identifying patterns in data

  • Original Paper
  • Journal: Computational Statistics

Abstract

This paper, arising from population studies, develops clustering algorithms for identifying patterns in data. Based on the concept of geometric variability, we develop one polythetic-divisive and three agglomerative algorithms. The effectiveness of these procedures is shown by relating them to classical clustering algorithms. They are very general: because they impose no constraints on the type of data, they are applicable to studies in many fields (economics, ecology, genetics, ...). Our major contributions include a rigorous formulation of novel clustering algorithms and the discovery of a new relationship between geometric variability and clustering. Finally, these procedures provide a theoretical framework, with an intuitive interpretation, that allows some classical clustering methods to be applied to any type of data, including mixed data. The approaches are illustrated with real data on Drosophila chromosomal inversions.
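
The abstract does not define geometric variability, so the following is only a minimal sketch under a stated assumption: it uses the usual sample form from the distance-based literature, V = (1 / (2 n^2)) * sum over all i, j of d(i, j)^2, computed from a matrix of pairwise distances. The function names are illustrative and do not come from the paper, and the paper's clustering algorithms themselves are not reproduced here.

```python
import numpy as np


def geometric_variability(d):
    """Sample geometric variability of a group, given its n x n distance matrix.

    Assumed definition (not stated in the abstract):
    V = (1 / (2 * n**2)) * sum of squared pairwise distances.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    return (d ** 2).sum() / (2.0 * n ** 2)


def euclidean_distances(x):
    """Pairwise Euclidean distances between the rows of x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Four points on the corners of a square with side 2.
points = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
v = geometric_variability(euclidean_distances(points))

# With Euclidean distances, V equals the total variance (divisor n).
total_variance = ((points - points.mean(axis=0)) ** 2).sum(axis=1).mean()
```

Because only a distance matrix is required, the same function can be applied to any dissimilarity defined on mixed data, which is the generality the abstract emphasizes. In the Euclidean case the quantity reduces to the total variance, which is one way to see a connection to variance-based classical methods.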

Author information

Corresponding author

Correspondence to Concepcion Arenas.

About this article

Cite this article

Irigoien, I., Arenas, C., Fernández, E. et al. GEVA: geometric variability-based approaches for identifying patterns in data. Comput Stat 25, 241–255 (2010). https://doi.org/10.1007/s00180-009-0173-9

  • DOI: https://doi.org/10.1007/s00180-009-0173-9
