
Chimeral Clustering

Journal of Classification

Abstract

Hybrid species tend to exhibit a mixture of parent characteristics; we propose chimeral clusters as exhibiting a mixture of parent parameters, a type of intercluster structure. In the iris dataset, morphometric measurements of the hybrid Iris versicolor are intermediate between those of its parent species Iris setosa and Iris virginica, which motivates our extension of Gaussian mixture models to allow mixing in the parameter space. We propose a mixing mechanism whereby chimeral clusters are parameterized by a convex combination of fully varying prototype cluster parameters and characterize the identifiability of the postulated mixture model. Estimation of chimeral clustering models is described using variations of the expectation-maximization algorithm and the solution to the continuous-time algebraic Riccati equation. The efficacy of chimeral clustering is demonstrated using morphometric datasets describing iris, Cooper’s hawks, and water striders, with comparisons to typical Gaussian mixture models. We evaluate parameter recovery on a synthetic dataset and demonstrate that parsimonious covariance matrices and chimeral clustering capture different kinds of intercluster structure.
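
The mixing mechanism named in the abstract can be illustrated with a minimal sketch. The abstract does not say which parameterization is mixed, so this example assumes the simplest reading: a chimeral cluster's mean and covariance are convex combinations of the prototype means and covariances. The function name `chimeral_parameters` and the toy prototype values are hypothetical and are not taken from the paper.

```python
import numpy as np


def chimeral_parameters(weights, means, covariances):
    """Mix prototype Gaussian parameters with convex weights.

    Assumption (not confirmed by the abstract): the chimeral mean and
    covariance are weighted averages of the prototype means and covariances.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)              # shape (K, d)
    covariances = np.asarray(covariances, dtype=float)  # shape (K, d, d)

    # Convex combination: weights are non-negative and sum to one.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    mean = np.einsum("k,kd->d", weights, means)
    covariance = np.einsum("k,kde->de", weights, covariances)
    return mean, covariance


# Two hypothetical prototype clusters in two dimensions.
proto_means = np.array([[0.0, 0.0], [4.0, 2.0]])
proto_covs = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.5], [0.5, 1.0]],
])

# A chimeral cluster halfway between the two prototypes.
chimera_mean, chimera_cov = chimeral_parameters([0.5, 0.5], proto_means, proto_covs)
```

Under this assumption the mixed covariance is always a valid covariance matrix, because a convex combination of positive definite matrices is itself positive definite.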



Author information

Corresponding author

Correspondence to Jason Hou-Liu.


Cite this article

Hou-Liu, J., Browne, R.P. Chimeral Clustering. J Classif 39, 171–190 (2022). https://doi.org/10.1007/s00357-021-09396-3


  • DOI: https://doi.org/10.1007/s00357-021-09396-3
