Abstract
The use of mixture models for clustering and classification has received renewed attention within the literature since the mid-1990s. The multivariate Gaussian distribution has been at the heart of this body of work, but approaches that utilize the multivariate t-distribution have burgeoned into viable and effective alternatives. In this paper, recent work on classification and clustering using mixtures of multivariate t-distributions is reviewed and discussed, along with related issues. The paper concludes with a summary and suggestions for future work.
© 2013 Springer International Publishing Switzerland
Cite this paper
McNicholas, P.D. (2013). On Clustering and Classification Via Mixtures of Multivariate t-Distributions. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_27
DOI: https://doi.org/10.1007/978-3-319-00032-9_27
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)