Factor and hybrid components for model-based clustering

Hou-Liu, Jason; Browne, Ryan P.

doi:10.1007/s11634-021-00483-2

Regular Article
Published: 17 January 2022

Volume 16, pages 373–398, (2022)

Advances in Data Analysis and Classification

Jason Hou-Liu¹ &
Ryan P. Browne¹

Abstract

A major challenge when performing model-based clustering is the large increase in the number of free parameters as the data dimensionality increases. To combat this issue, parsimonious methods allow component covariance matrices to share parameters by exploiting geometric redundancies. The present work considers an additional level of intracluster structure that also captures hybridisation of mean and covariance parameters between components for the multivariate normal distribution. We posit components with heterogeneous parameterisation: a subset are considered factor components and have explicit mean and covariance parameters, and the remainder are considered hybrid components, whose means and covariances are implied by a set of factor loadings that weight the factor component parameters. An estimation procedure is provided using the Expectation-Maximization algorithm, and a comparison to Gaussian mixture models with parsimonious covariances is made by evaluation on a collection of datasets.
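As a rough illustration of how a hybrid component's parameters could be implied by factor loadings, the sketch below assumes a hybrid component is a weighted sum of independent normal variables, one per factor component. That reading is an assumption made here for illustration; the abstract does not state the exact parameterisation, and the function name `hybrid_parameters` and all numerical values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def hybrid_parameters(
    weights: Vector, means: List[Vector], covs: List[Matrix]
) -> Tuple[Vector, Matrix]:
    """Mean and covariance of sum_j w_j * X_j, where the X_j are independent
    and X_j ~ N(means[j], covs[j]).

    ASSUMPTION: this weighted-sum form is one possible reading of the
    abstract, not the paper's confirmed parameterisation.
    """
    dim = len(means[0])
    # The mean is linear in the weights.
    mean = [sum(w * m[i] for w, m in zip(weights, means)) for i in range(dim)]
    # Under independence, the covariance picks up squared weights.
    cov = [
        [sum(w * w * c[i][k] for w, c in zip(weights, covs)) for k in range(dim)]
        for i in range(dim)
    ]
    return mean, cov


# Two hypothetical factor components in two dimensions.
factor_means = [[0.0, 0.0], [4.0, 2.0]]
factor_covs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.5], [0.5, 1.0]],
]
loadings = [0.5, 0.5]  # hypothetical factor loadings for one hybrid component

hybrid_mean, hybrid_cov = hybrid_parameters(loadings, factor_means, factor_covs)
print(hybrid_mean)
print(hybrid_cov)
```

Under this assumed form, the hybrid component adds only its loadings as free parameters, rather than a full mean vector and covariance matrix, which is the kind of saving the abstract describes.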

Author information

Authors and Affiliations

Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
Jason Hou-Liu & Ryan P. Browne

Corresponding author

Correspondence to Jason Hou-Liu.

About this article

Cite this article

Hou-Liu, J., Browne, R.P. Factor and hybrid components for model-based clustering. Adv Data Anal Classif 16, 373–398 (2022). https://doi.org/10.1007/s11634-021-00483-2

Received: 30 January 2021
Revised: 10 September 2021
Accepted: 11 November 2021
Published: 17 January 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11634-021-00483-2

Mathematics Subject Classification

62-08: Computational methods for problems pertaining to statistics
