Abstract
This paper presents a robust mixture modeling framework based on multivariate skew t distributions, an extension of the multivariate Student’s t family with additional shape parameters that regulate skewness. The likelihood of the proposed model is analytically complicated, so two variants of the Monte Carlo EM algorithm are developed to carry out maximum likelihood estimation of the mixture parameters. In addition, we offer a general information-based method for obtaining the asymptotic covariance matrix of the maximum likelihood estimates. Practical issues, including the selection of starting values and the stopping criterion, are also discussed. The proposed methodology is illustrated on a subset of the Australian Institute of Sport data.
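As a rough orientation only, and not the paper's exact formulation, a generic g-component mixture density and the Monte Carlo approximation to the E-step of an EM algorithm can be written as below. The symbols $w_i$, $\theta_i$, $g$, $M$, $z^{(m)}$, and $L_c$ are notation assumed here for illustration; the specific skew t component density and its latent-variable hierarchy are those defined in the body of the paper.

```latex
% Generic finite mixture density with g components (notation assumed here,
% f denotes the component density, e.g. a multivariate skew t density)
f(y_j \mid \Theta) = \sum_{i=1}^{g} w_i \, f(y_j \mid \theta_i),
\qquad w_i \ge 0, \quad \sum_{i=1}^{g} w_i = 1 .

% Monte Carlo E-step: the intractable conditional expectation of the
% complete-data log-likelihood \log L_c is replaced by an average over
% M draws z^{(1)}, ..., z^{(M)} of the latent variables, simulated given
% the data y and the current estimate \hat\Theta^{(k)}
Q(\Theta \mid \hat\Theta^{(k)}) \approx \frac{1}{M} \sum_{m=1}^{M}
  \log L_c(\Theta \mid y, z^{(m)}),
\qquad z^{(m)} \sim p(z \mid y, \hat\Theta^{(k)}) .
```

The M-step then maximizes this approximate $Q$ over $\Theta$. Larger $M$ reduces the simulation noise in each iteration at the cost of more computation.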
Cite this article
Lin, TI. Robust mixture modeling using multivariate skew t distributions. Stat Comput 20, 343–356 (2010). https://doi.org/10.1007/s11222-009-9128-9