Abstract
This paper presents a novel framework for maximum likelihood (ML) estimation in skew-t factor analysis (STFA) models in the presence of missing values or nonresponses. As a robust extension of the ordinary factor analysis model, the STFA model assumes a restricted version of the multivariate skew-t distribution for the latent factors and the unobservable errors, in order to accommodate non-normal features such as asymmetry and heavy tails (or outliers). An EM-type algorithm is developed to carry out ML estimation and imputation of missing values under a missing at random (MAR) mechanism. The practical utility of the proposed methodology is illustrated through real and synthetic data examples.
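The imputation idea can be illustrated in a simplified setting. With zero skewness and ν → ∞, the skew-t factor model reduces to ordinary (Gaussian) factor analysis with covariance Σ = BBᵀ + Ψ, and under MAR an EM-type E-step fills each row's missing block with its conditional mean given the observed block. The sketch below covers only that Gaussian special case. It is not the authors' skew-t algorithm, whose E-step involves additional skewness and weighting terms. The function names `factor_covariance` and `impute_conditional_mean` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def factor_covariance(B, psi):
    """Covariance implied by a q-factor model: Sigma = B B^T + diag(psi)."""
    B = np.asarray(B, dtype=float)
    return B @ B.T + np.diag(np.asarray(psi, dtype=float))

def impute_conditional_mean(y, mu, Sigma):
    """Replace NaN entries of y with E[y_m | y_o] under N(mu, Sigma).

    Gaussian special case only (an assumption for illustration); the
    skew-t E-step in the STFA setting has extra terms not shown here.
    """
    y = np.asarray(y, dtype=float).copy()
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    m = np.isnan(y)          # mask of missing components
    if not m.any():
        return y             # nothing to impute
    o = ~m                   # mask of observed components
    if not o.any():
        y[m] = mu[m]         # nothing observed: use the marginal mean
        return y
    S_mo = Sigma[np.ix_(m, o)]
    S_oo = Sigma[np.ix_(o, o)]
    # Solve S_oo x = (y_o - mu_o) instead of forming an explicit inverse.
    y[m] = mu[m] + S_mo @ np.linalg.solve(S_oo, y[o] - mu[o])
    return y
```

For example, with one factor, loadings B = [[1.0], [1.0], [0.5]], unit uniquenesses, zero mean, and only the first component observed as 2.0, `impute_conditional_mean([2.0, np.nan, np.nan], [0.0, 0.0, 0.0], factor_covariance([[1.0], [1.0], [0.5]], [1.0, 1.0, 1.0]))` returns the values [2.0, 1.0, 0.5]: the missing entries are pulled toward the observed one in proportion to their shared factor loading.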

Acknowledgements
The authors would like to express their deepest gratitude to the editors and anonymous reviewers for their insightful comments and suggestions that greatly improved this paper. This work was partially supported by MOST 105-2118-M-035-004-MY2 and MOST 105-2118-M-005-003-MY2 awarded by the Ministry of Science and Technology of Taiwan.
Cite this article
Wang, WL., Liu, M. & Lin, TI. Robust skew-t factor analysis models for handling missing data. Stat Methods Appl 26, 649–672 (2017). https://doi.org/10.1007/s10260-017-0388-9