Computationally efficient learning of multivariate t mixture models with missing information

Lin, Tsung-I; Ho, Hsiu J.; Shen, Pao S.

doi:10.1007/s00180-008-0129-5

Original Paper
Published: 17 July 2008

Volume 24, pages 375–392, (2009)
Computational Statistics

Tsung-I Lin¹,
Hsiu J. Ho¹ &
Pao S. Shen²

Abstract

A finite mixture model using the multivariate t distribution has been well recognized as a robust extension of Gaussian mixtures. This paper presents an efficient PX-EM algorithm for supervised learning of multivariate t mixture models in the presence of missing values. To simplify the development of new theoretic results and facilitate the implementation of the PX-EM algorithm, two auxiliary indicator matrices are incorporated into the model and shown to be effective. The proposed methodology is a flexible mixture analyzer that allows practitioners to handle real-world multivariate data sets with complex missing patterns in a more efficient manner. The performance of computational aspects is investigated through a simulation study and the procedure is also applied to the analysis of real data with varying proportions of synthetic missing values.
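The abstract does not define the two auxiliary indicator matrices, so the following is only a minimal sketch of one common construction, assumed here for illustration: for a single observation with some missing coordinates, a 0/1 matrix O picks out the observed entries and a 0/1 matrix M picks out the missing ones. The names `O`, `M`, `selection_matrices`, `matvec`, and `gram` are hypothetical and are not taken from the paper.

```python
def selection_matrices(observed):
    """Build 0/1 row-selection matrices from per-coordinate observed flags.

    Assumed construction: O keeps the identity rows of observed coordinates,
    M keeps the identity rows of missing coordinates.
    """
    p = len(observed)
    identity = [[1 if r == c else 0 for c in range(p)] for r in range(p)]
    O = [identity[i] for i in range(p) if observed[i]]
    M = [identity[i] for i in range(p) if not observed[i]]
    return O, M


def matvec(A, x):
    """Multiply a list-of-lists matrix A by a vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def gram(A, p):
    """Return A^T A as a p x p matrix (all zeros when A has no rows)."""
    return [[sum(row[r] * row[c] for row in A) for c in range(p)]
            for r in range(p)]


# Toy observation with p = 4 coordinates; None marks a missing value.
y = [5.1, None, 1.4, None]
observed = [v is not None for v in y]
O, M = selection_matrices(observed)

# Placeholder zeros stand in for the missing entries; O never selects them.
y_filled = [0.0 if v is None else v for v in y]
y_obs = matvec(O, y_filled)  # observed sub-vector: [5.1, 1.4]

# Under this construction, O^T O + M^T M equals the p x p identity matrix.
p = len(y)
g_obs, g_mis = gram(O, p), gram(M, p)
partition = [[g_obs[r][c] + g_mis[r][c] for c in range(p)] for r in range(p)]
```

With matrices of this kind, each observation's own missing pattern can be handled through matrix products rather than case-by-case indexing, which is one plausible reading of how such matrices would simplify the implementation.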

Author information

Authors and Affiliations

Department of Applied Mathematics, National Chung Hsing University, Taichung, 402, Taiwan
Tsung-I Lin & Hsiu J. Ho
Department of Statistics, Tunghai University, PO Box 823, Taichung, 407, Taiwan
Pao S. Shen

Corresponding author

Correspondence to Tsung-I Lin.

About this article

Cite this article

Lin, TI., Ho, H.J. & Shen, P.S. Computationally efficient learning of multivariate t mixture models with missing information. Comput Stat 24, 375–392 (2009). https://doi.org/10.1007/s00180-008-0129-5

Received: 15 April 2007
Accepted: 25 June 2008
Published: 17 July 2008
Issue Date: August 2009
DOI: https://doi.org/10.1007/s00180-008-0129-5
