Chimeral Clustering

Hou-Liu, Jason; Browne, Ryan P.

doi:10.1007/s00357-021-09396-3

Published: 02 October 2021

Journal of Classification, volume 39, pages 171–190 (2022)

Abstract

Hybrid species tend to exhibit a mixture of parent characteristics; by analogy, we propose chimeral clusters, whose parameters are a mixture of parent cluster parameters, as a type of intercluster structure. In the iris dataset, the morphometric measurements of the hybrid Iris versicolor are intermediate to those of its parent species Iris setosa and Iris virginica, which motivates our extension of Gaussian mixture models to allow mixing in the parameter space. We propose a mixing mechanism whereby chimeral clusters are parameterized by a convex combination of fully varying prototype cluster parameters, and we characterize the identifiability of the postulated mixture model. Estimation of chimeral clustering models is described using variations of the expectation-maximization algorithm and the solution to the continuous-time algebraic Riccati equation. The efficacy of chimeral clustering is demonstrated on morphometric datasets describing iris, Cooper’s hawks, and water striders, with comparisons to typical Gaussian mixture models. We evaluate parameter recovery on a synthetic dataset and demonstrate that parsimonious covariance matrices and chimeral clustering capture different kinds of intercluster structure.
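
The mixing mechanism can be illustrated with a minimal sketch. It assumes the convex combination is applied directly to the prototype means and covariance matrices; the abstract does not state which parameterization the article mixes, so this is only one possible reading. The function name `chimeral_parameters` and all numeric values are hypothetical and chosen for illustration.

```python
import numpy as np


def chimeral_parameters(weights, proto_means, proto_covs):
    """Return the mean and covariance of a chimeral cluster.

    Assumption (not taken from the article): the chimeral cluster's mean and
    covariance are the same convex combination of the prototype means and
    prototype covariance matrices.
    """
    weights = np.asarray(weights, dtype=float)
    proto_means = np.asarray(proto_means, dtype=float)
    proto_covs = np.asarray(proto_covs, dtype=float)

    # A convex combination needs non-negative weights that sum to one.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    mean = np.einsum("k,kd->d", weights, proto_means)
    cov = np.einsum("k,kde->de", weights, proto_covs)
    return mean, cov


# Two hypothetical prototype ("parent") clusters in two dimensions.
proto_means = np.array([[0.0, 0.0], [4.0, 2.0]])
proto_covs = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.5], [0.5, 1.0]],
])

# A hybrid cluster that sits halfway between the two prototypes.
hybrid_mean, hybrid_cov = chimeral_parameters([0.5, 0.5], proto_means, proto_covs)
print(hybrid_mean)
print(hybrid_cov)
```

Under this assumption the hybrid covariance is always a valid covariance matrix, because a convex combination of positive definite matrices is positive definite.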

Author information

Authors and Affiliations

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
Jason Hou-Liu & Ryan P. Browne

Corresponding author

Correspondence to Jason Hou-Liu.

About this article

Hou-Liu, J., Browne, R.P. Chimeral Clustering. J Classif 39, 171–190 (2022). https://doi.org/10.1007/s00357-021-09396-3

Accepted: 10 August 2021
Published: 02 October 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00357-021-09396-3
