On Clustering and Classification Via Mixtures of Multivariate t-Distributions

McNicholas, Paul D.

doi:10.1007/978-3-319-00032-9_27

Conference paper
First Online: 01 January 2013

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization

Abstract

The use of mixture models for clustering and classification has received renewed attention within the literature since the mid-1990s. The multivariate Gaussian distribution has been at the heart of this body of work, but approaches that utilize the multivariate t-distribution have burgeoned into viable and effective alternatives. In this paper, recent work on classification and clustering using mixtures of multivariate t-distributions is reviewed and discussed, along with related issues. The paper concludes with a summary and suggestions for future work.
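The abstract contrasts Gaussian mixtures with mixtures of multivariate t-distributions. As a minimal sketch, and not the paper's own code or any package's API, the block below shows the two computations that clustering and classification with a fitted t mixture rest on. The first is the p-variate t log-density with location mu, scale matrix Sigma and nu degrees of freedom, f(x) = Gamma((nu+p)/2) / [Gamma(nu/2) (pi nu)^(p/2) |Sigma|^(1/2)] (1 + delta/nu)^(-(nu+p)/2), where delta = (x - mu)' Sigma^(-1) (x - mu). The second is the posterior membership probabilities z_g, proportional to pi_g f(x | mu_g, Sigma_g, nu_g), together with the maximum a posteriori (MAP) label. It also includes the weight (nu+p)/(nu+delta). EM fitting of t mixtures uses this weight to down-weight outlying points, which is one reason the t is considered more robust than the Gaussian. All helper names (`mvt_logpdf`, `memberships`, `map_label`, `t_weight`) and parameter values are hypothetical. Parameter estimation itself (typically via an EM algorithm) is not shown.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("scale matrix must be positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def mahalanobis_and_logdet(x, mu, sigma):
    """Return (delta, log|sigma|) where delta = (x - mu)^T sigma^{-1} (x - mu)."""
    L = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = []  # forward substitution: solve L y = (x - mu), so delta = y^T y
    for i in range(len(diff)):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((diff[i] - s) / L[i][i])
    delta = sum(v * v for v in y)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
    return delta, logdet

def mvt_logpdf(x, mu, sigma, nu):
    """Log-density of the p-variate t-distribution (location mu, scale sigma, nu d.f.)."""
    p = len(x)
    delta, logdet = mahalanobis_and_logdet(x, mu, sigma)
    return (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * p * math.log(math.pi * nu) - 0.5 * logdet
            - 0.5 * (nu + p) * math.log1p(delta / nu))

def memberships(x, pis, mus, sigmas, nus):
    """Posterior component probabilities for x, computed on the log scale to avoid underflow."""
    logs = [math.log(pi_g) + mvt_logpdf(x, m, s, nu)
            for pi_g, m, s, nu in zip(pis, mus, sigmas, nus)]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    total = sum(w)
    return [v / total for v in w]

def map_label(z):
    """Index of the component with the largest posterior probability."""
    return max(range(len(z)), key=lambda g: z[g])

def t_weight(x, mu, sigma, nu):
    """(nu + p) / (nu + delta): shrinks toward 0 as x moves far from mu."""
    delta, _ = mahalanobis_and_logdet(x, mu, sigma)
    return (nu + len(x)) / (nu + delta)

# Hypothetical two-component bivariate mixture; parameters are illustrative only.
pis = [0.5, 0.5]
mus = [[-3.0, 0.0], [3.0, 0.0]]
sigmas = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.3], [0.3, 1.0]]]
nus = [4.0, 4.0]
z = memberships([2.5, 0.5], pis, mus, sigmas, nus)
print(z, map_label(z))
```

Working on the log scale and normalising by the largest term is a common design choice. It keeps the membership computation stable when densities are tiny, as happens in higher dimensions or far in the tails.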

Author information

Authors and Affiliations

University of Guelph, Ontario, Canada
Paul D. McNicholas

Corresponding author

Correspondence to Paul D. McNicholas.

Editor information

Editors and Affiliations

Department of Economics and Management, University of Pavia, Via San Felice 7, Pavia, 27100, Italy
Paolo Giudici
Department of Economics and Business, University of Catania, Corso Italia 55, Catania, 95129, Italy
Salvatore Ingrassia
Department of Statistics, University of Rome "La Sapienza", Piazzale Aldo Moro 5, Rome, 00185, Italy
Maurizio Vichi

About this paper

Cite this paper

McNicholas, P.D. (2013). On Clustering and Classification Via Mixtures of Multivariate t-Distributions. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_27

DOI: https://doi.org/10.1007/978-3-319-00032-9_27
Published: 22 May 2013
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
