Computationally efficient learning of multivariate t mixture models with missing information

  • Original Paper
Computational Statistics

Abstract

Finite mixture models based on the multivariate t distribution are well recognized as a robust extension of Gaussian mixtures. This paper presents an efficient PX-EM algorithm for supervised learning of multivariate t mixture models in the presence of missing values. To simplify the development of new theoretical results and to facilitate the implementation of the PX-EM algorithm, two auxiliary indicator matrices are incorporated into the model and shown to be effective. The proposed methodology is a flexible mixture analyzer that allows practitioners to handle real-world multivariate data sets with complex missing patterns more efficiently. Computational performance is investigated through a simulation study, and the procedure is also applied to real data with varying proportions of synthetic missing values.
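The abstract does not define the two auxiliary indicator matrices. A common construction in the missing-data literature, assumed here purely for illustration, uses one binary matrix O that selects the observed coordinates of an observation and another, M, that selects the missing ones. The sketch below builds both for a single observation and uses O to extract the observed block of a component mean and scale matrix. The function names, the NaN encoding of missing values, and the numbers are hypothetical and not taken from the paper.

```python
import math

def indicator_matrices(x):
    """Return (O, M) for one observation x, where NaN marks a missing entry.

    Each row of O is a unit vector picking one observed coordinate;
    each row of M is a unit vector picking one missing coordinate.
    (Assumed construction, not quoted from the paper.)
    """
    p = len(x)
    unit_row = lambda i: [1.0 if k == i else 0.0 for k in range(p)]
    O = [unit_row(i) for i, v in enumerate(x) if not math.isnan(v)]
    M = [unit_row(i) for i, v in enumerate(x) if math.isnan(v)]
    return O, M

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical 3-dimensional observation with the second entry missing.
x = [1.2, float("nan"), 0.7]
mu = [[0.0], [1.0], [2.0]]            # component location as a column vector
Sigma = [[2.0, 0.5, 0.1],
         [0.5, 1.0, 0.3],
         [0.1, 0.3, 3.0]]             # component scale matrix

O, M = indicator_matrices(x)
mu_o = matmul(O, mu)                              # observed part of mu
Sigma_oo = matmul(matmul(O, Sigma), transpose(O)) # observed block of Sigma

# Together the two matrices account for every coordinate exactly once:
# O'O + M'M equals the identity.
OtO = matmul(transpose(O), O)
MtM = matmul(transpose(M), M)
identity_check = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(OtO, MtM)]
```

Under this assumed construction, every observation can carry its own missing pattern while the per-component quantities needed in an E-step are obtained by matrix products rather than by pattern-specific indexing code.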

Author information

Corresponding author

Correspondence to Tsung-I Lin.

Cite this article

Lin, TI., Ho, H.J. & Shen, P.S. Computationally efficient learning of multivariate t mixture models with missing information. Comput Stat 24, 375–392 (2009). https://doi.org/10.1007/s00180-008-0129-5

  • DOI: https://doi.org/10.1007/s00180-008-0129-5