On Clustering and Classification Via Mixtures of Multivariate t-Distributions

  • Conference paper

Abstract

The use of mixture models for clustering and classification has received renewed attention within the literature since the mid-1990s. The multivariate Gaussian distribution has been at the heart of this body of work, but approaches that utilize the multivariate t-distribution have burgeoned into viable and effective alternatives. In this paper, recent work on classification and clustering using mixtures of multivariate t-distributions is reviewed and discussed, along with related issues. The paper concludes with a summary and suggestions for future work.

Author information

Corresponding author

Correspondence to Paul D. McNicholas.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

McNicholas, P.D. (2013). On Clustering and Classification Via Mixtures of Multivariate t-Distributions. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_27
