Robust skew-t factor analysis models for handling missing data

Original Paper
Statistical Methods & Applications

Abstract

This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors, which accommodates non-normal features such as asymmetry, heavy tails, and outliers. An EM-type algorithm is developed to carry out ML estimation and to impute missing values under a missing at random (MAR) mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples.
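The abstract does not spell out the model equations, so the following is only a minimal sketch under an assumed parameterisation, not the authors' exact specification and not their EM-type fitting algorithm. It simulates synthetic data from a factor model y = mu + B u + e in which a gamma mixing weight w is shared by the factors and the errors (giving heavy tails) and a single half-normal variable per observation injects skewness into the factors only (the "restricted" skew-t construction). It then deletes entries of one variable under a MAR rule whose missingness probability depends only on a fully observed variable. The function names `simulate_stfa` and `impose_mar`, the skewness vector `lam`, and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_stfa(n, mu, B, lam, D, nu, rng):
    """Draw n rows from a skew-t factor model via a hierarchical representation.

    Assumed (illustrative) parameterisation, not necessarily the paper's:
      w ~ Gamma(nu/2, rate nu/2), tau | w ~ |N(0, 1/w)|,
      u | tau, w ~ N(lam * tau, I_q / w), e | w ~ N(0, diag(D) / w),
      y = mu + B u + e.
    """
    p, q = B.shape
    # Shared mixing weight: rate nu/2 corresponds to numpy scale 2/nu
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # One half-normal scalar per observation drives all factor skewness,
    # which is what makes the skew-t "restricted"
    tau = np.abs(rng.standard_normal(n)) / np.sqrt(w)
    # Latent factors: skewed along lam, heavy-tailed through w
    U = tau[:, None] * lam[None, :] + rng.standard_normal((n, q)) / np.sqrt(w)[:, None]
    # Errors are symmetric but share the same weight w
    e = rng.standard_normal((n, p)) * np.sqrt(D)[None, :] / np.sqrt(w)[:, None]
    return mu[None, :] + U @ B.T + e


def impose_mar(Y, target, driver, rate, rng):
    """Set Y[:, target] to NaN with a probability that depends only on the
    fully observed column Y[:, driver], i.e. a missing at random mechanism."""
    if target == driver:
        raise ValueError("the driver column must stay fully observed")
    Y = Y.copy()
    x = Y[:, driver]
    z = (x - x.mean()) / x.std()
    # Logistic model: baseline log-odds logit(rate), shifted by the observed z
    log_odds = np.log(rate / (1.0 - rate)) + z
    prob = 1.0 / (1.0 + np.exp(-log_odds))
    miss = rng.random(Y.shape[0]) < prob
    Y[miss, target] = np.nan
    return Y, miss


# Usage with hypothetical settings: p = 6 variables, q = 2 factors
rng = np.random.default_rng(0)
p, q = 6, 2
mu = np.zeros(p)
B = rng.normal(size=(p, q))
lam = np.array([3.0, -2.0])
D = np.full(p, 0.5)
Y = simulate_stfa(500, mu, B, lam, D, nu=5.0, rng=rng)
Y_obs, miss = impose_mar(Y, target=3, driver=0, rate=0.2, rng=rng)
print(Y_obs.shape, int(miss.sum()), "entries of variable 3 set to missing")
```

An incomplete data set like `Y_obs` is the kind of input an EM-type routine for this model would take. The fitting step itself, which involves conditional expectations of the missing entries, the latent factors, and the mixing weights given the observed components, is not reproduced here.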

Acknowledgements

The authors would like to express their deepest gratitude to the editors and anonymous reviewers for their insightful comments and suggestions that greatly improved this paper. This work was partially supported by MOST 105-2118-M-035-004-MY2 and MOST 105-2118-M-005-003-MY2 awarded by the Ministry of Science and Technology of Taiwan.

Author information

Correspondence to Tsung-I Lin.

About this article

Cite this article

Wang, WL., Liu, M. & Lin, TI. Robust skew-t factor analysis models for handling missing data. Stat Methods Appl 26, 649–672 (2017). https://doi.org/10.1007/s10260-017-0388-9

  • DOI: https://doi.org/10.1007/s10260-017-0388-9
