Abstract
Mixtures of factor analyzers (MFA) provide a powerful tool for modelling high-dimensional datasets. In recent years, several generalizations of MFA have been developed in which the normality assumption on the factors and/or the errors is relaxed to allow for skewness in the data. However, owing to the form of the adopted component densities, the distribution of the factors/errors in most of these models is typically limited to modelling skewness concentrated in a single direction. Here, we introduce a more flexible finite mixture of factor analyzers based on the class of scale mixtures of canonical fundamental skew normal (SMCFUSN) distributions. This very general class of skew distributions can capture various types of skewness and asymmetry in the data. In particular, the proposed mixtures of SMCFUSN factor analyzers (SMCFUSNFA) can simultaneously accommodate multiple directions of skewness. As such, the model encapsulates many commonly used models as special and/or limiting cases, such as some versions of skew normal and skew t-factor analyzers, and skew hyperbolic factor analyzers. For illustration, we focus on the t-distribution member of the class of SMCFUSN distributions, leading to mixtures of canonical fundamental skew t-factor analyzers (CFUSTFA). Parameter estimation can be carried out by maximum likelihood via an EM-type algorithm. The usefulness and potential of the proposed model are demonstrated using four real datasets.
References
Arellano-Valle RB, Azzalini A (2006) On the unification of families of skew-normal distributions. Scand J Stat 33:561–574
Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. J Multivar Anal 96:93–116
Azzalini A, Capitanio A (2014) The Skew-Normal and Related Families. Cambridge University Press, Cambridge
Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715–726
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22:719–725
Browne RP, McNicholas PD (2015) A mixture of generalized hyperbolic distributions. Can J Stat 43:176–198
Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56:126–142
Codella N, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza S, Kalloo A, Liopyris K, Mishra N, Kittler H, Halpern A (2017) Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). arXiv:1710.05006
Cook RD, Weisberg S (1994) An Introduction to Regression Graphics. Wiley, New York
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Ferris LK, Harkes JA, Gilbert B, Winger DG, Golubets K, Akilov O, Satyanarayanan M (2015) Computer-aided classification of melanocytic lesions using dermoscopic images. J Am Acad Dermatol 73:769–776
Forina M, Tiscornia E (1982) Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content. Annali di Chimica 72:143–155
Genton MG (ed) (2004) Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, Boca Raton, Florida
Ghahramani Z, Hinton G (1997) The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Toronto
Ho HJ, Lin TI, Chen HY, Wang WL (2012) Some results on the truncated multivariate \(t\) distribution. J Stat Plan Inference 142:25–40
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Karlis D, Santourian A (2009) Model-based clustering with non-elliptically contoured distributions. Stat Comput 19:73–83
Kim HM, Maadooliat M, Arellano-Valle RB, Genton MG (2016) Skewed factor models using selection mechanisms. J Multivar Anal 145:162–177
Kim SG (2016) An approximate fitting for mixture of multivariate skew normal distribution via EM algorithm. Korean J Appl Stat 29:513–523
Lee SX, McLachlan GJ (2014) Finite mixtures of multivariate skew \(t\)-distributions: Some recent and new results. Stat Comput 24:181–202
Lee SX, McLachlan GJ (2013) On mixtures of skew-normal and skew \(t\)-distributions. Adv Data Anal Classif 7:241–266
Lee SX, McLachlan GJ (2016) Finite mixtures of canonical fundamental skew \(t\)-distributions: The unification of the restricted and unrestricted skew \(t\)-mixture models. Stat Comput 26:573–589
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265
Lin TI (2010) Robust mixture modeling using multivariate skew-\(t\) distribution. Stat Comput 20:343–356
Lin TI, Wu PH, McLachlan GJ, Lee SX (2015) A robust factor analysis model using the restricted skew \(t\)-distribution. TEST 24:510–531
Lin TI, McLachlan GJ, Lee SX (2016) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J Multivar Anal 143:398–413
Lin TI, Wang WL, McLachlan GJ, Lee SX (2018) Robust mixtures of factor analysis models using the restricted multivariate skew-\(t\) distribution. Stat Modell 18:50–72
Maleki M, Wraith D, Arellano-Valle RB (2019) Robust finite mixture modeling of multivariate unrestricted skew-normal generalized hyperbolic distributions. Stat Comput 29:425–428
Maruotti A, Bulla J, Lagona F, Picone M, Martella F (2017) Dynamic mixtures of factor analyzers to characterize multivariate air pollutant exposures. Ann Appl Stat 11:1617–1648
McLachlan GJ, Krishnan T (2008) The EM Algorithm and Extensions, 2nd edn. Wiley, Hoboken, New Jersey
McLachlan GJ, Lee SX (2016) Comment on “On nomenclature for, and the relative merits of, two formulations of skew distributions” by A. Azzalini, R. Browne, M. Genton, and P. McNicholas. Stat Probab Lett 116:1–5
McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York
McLachlan GJ, Peel D, Bean RW (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41:379–388
McLachlan GJ, Bean RW, Jones BT (2007) Extension of the mixture of factor analyzers model to incorporate the multivariate \(t\)-distribution. Comput Stat Data Anal 51:5327–5338
Meng X, Rubin D (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
Montanari A, Viroli C (2010) A skew-normal factor model for the analysis of student satisfaction towards university courses. J Appl Stat 37:463–487
Murray P, Browne R, McNicholas P (2014a) Mixtures of skew-\(t\) factor analyzers. Comput Stat Data Anal 77:326–335
Murray P, McNicholas P, Browne R (2014b) Mixtures of common skew-\(t\) factor analyzers. Stat 3:68–82
Murray PM (2016) Detecting non-elliptical clusters. PhD thesis, Department of Mathematics & Statistics, McMaster University, Canada
Murray PM, Browne RP, McNicholas PD (2017a) Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. J Multivar Anal 161:141–156
Murray PM, Browne RP, McNicholas PD (2017b) A mixture of SDB skew-\(t\) factor analyzers. Econom Stat 3:160–168
Murray PM, Browne RP, McNicholas PD (2017c) Mixtures of hidden truncation hyperbolic factor analyzers. arXiv:1711.01504
O’Hagan A (1976) Moments of the truncated multivariate-\(t\) distribution. http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF
Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirow JP (2009) Automated high-dimensional flow cytometric data analysis. Proc National Acad Sci USA 106:8519–8524
R Core Team (2016) R: A Language and Environment for Statistical Computing. http://www.R-project.org/, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66:846–850
Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. Can J Stat 31:129–150
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Seshadri V (1997) Halphen’s laws. In: Kotz S, Read CB, Banks DL (eds) Encyclopedia of Statistical Sciences. Wiley, New York, pp 302–306
Tortora C, Browne RP, Franczak BC, McNicholas PD (2015) MixGHD: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions. http://cran.r-project.org/web/packages/MixGHD, R package version 1.7
Tortora C, McNicholas P, Browne R (2016) A mixture of generalized hyperbolic factor analyzers. Adv Data Anal Classif 10:423–440
Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J Mach Learn Res 11:2227–2240
Wall MM, Guo J, Amemiya Y (2012) Mixture factor analysis for approximating a non-normally distributed continuous latent factor with continuous and dichotomous observed variables. Multivar Behav Res 47:276–313
Yamamoto H, Nankaku Y, Miyajima C, Tokuda K, Kitamura T (2005) Parameter sharing in mixture of factor analyzers for speaker identification. IEICE Trans Inf Syst 88:418–424
Zhou YK, Mobasher B (2006) Web user segmentation based on a mixture of factor analyzers. Lect Notes Comput Sci 4082:11–20
Appendices
Appendix A: The class of CFUSS distributions
The class of canonical fundamental skew symmetric (CFUSS) distributions (Arellano-Valle and Genton 2005) is one of the more general formulations of skew distributions. We begin with the fundamental skew symmetric (FUSS) distribution, whose density is, up to a normalizing constant, the product of a symmetric density and a skewing function. Formally, the density of a p-dimensional random vector \({\varvec{Y}}\) following a FUSS distribution is given by

\[ f({\varvec{y}}; \varvec{\theta }) = K_r^{-1} f_p({\varvec{y}}; \varvec{\theta })\, Q_r({\varvec{y}}; \varvec{\theta }), \tag{33} \]

where \(f_p({\varvec{y}}; \varvec{\theta })\) is a symmetric density on \(\mathbb {R}^p\), \(Q_r({\varvec{y}}; \varvec{\theta })\) is a skewing function that maps \({\varvec{y}}\) into the unit interval, \(\varvec{\theta }\) is the vector containing the parameters of \({\varvec{Y}}\), and \(K_r = P({\varvec{U}}> \mathbf{0})\) is the normalizing constant. Here \({\varvec{U}}\) is an \(r\times 1\) latent random vector such that \({\varvec{Y}}\) and \({\varvec{U}}\) follow a joint distribution in which \({\varvec{Y}}\) has marginal density \(f_p({\varvec{y}}; \varvec{\theta })\) and \(Q_r({\varvec{y}}; \varvec{\theta }) = P({\varvec{U}}> \mathbf{0}\mid {\varvec{Y}}= {\varvec{y}})\); the density (33) is then that of \({\varvec{Y}}\) conditional on \({\varvec{U}}> \mathbf{0}\). If \({\varvec{U}}\) has its canonical distribution (that is, with location vector \(\mathbf{0}\) and scale matrix \({\varvec{I}}_r\)), we obtain the canonical form of (33), namely the CFUSS distribution. The class of CFUSS distributions encompasses many existing distributions, including most of those mentioned earlier in this paper. We consider some particular cases of this class here.
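The selection construction can be checked by simulation. The sketch below is a minimal illustration under stated assumptions, not code from this paper: it takes the normal member of the class and builds a symmetric parent vector \(\varvec{\mu }+ \varvec{\Delta }{\varvec{U}}+ {\varvec{e}}\) with canonical \({\varvec{U}}\sim N_r(\mathbf{0}, {\varvec{I}}_r)\) and \({\varvec{e}}\sim N_p(\mathbf{0}, \varvec{\Sigma })\) independent, so that the parent and \({\varvec{U}}\) have cross-covariance \(\varvec{\Delta }\). Keeping only the draws with \({\varvec{U}}> \mathbf{0}\) gives the skew vector, and the acceptance rate estimates the normalizing probability \(P({\varvec{U}}> \mathbf{0})\), which equals \(2^{-r}\) for this canonical \({\varvec{U}}\) because all \(2^r\) orthants are equally likely. The function name `fuss_by_selection` and all numerical values are hypothetical.

```python
import numpy as np

def fuss_by_selection(mu, sigma, delta, n, rng):
    """Hypothetical helper: simulate (parent, U) and keep parent draws with U > 0.

    Assumed normal construction: parent = mu + delta @ U + e, with
    U ~ N_r(0, I_r) (canonical) and e ~ N_p(0, sigma) independent,
    so that Cov(parent, U) = delta. Returns the retained draws and the
    acceptance rate, which estimates P(U > 0).
    """
    mu = np.asarray(mu, float)
    delta = np.asarray(delta, float)              # p x r skewness matrix
    p, r = delta.shape
    u = rng.standard_normal((n, r))               # canonical latent vector U
    e = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    parent = mu + u @ delta.T + e                 # symmetric (normal) parent vector
    keep = np.all(u > 0, axis=1)                  # selection event {U > 0}
    return parent[keep], keep.mean()

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
delta = np.array([[2.0, 0.0], [-1.0, 1.5]])       # r = 2 skewing directions
y_sel, rate = fuss_by_selection(mu, sigma, delta, 400_000, rng)

# Acceptance rate should be near 2**-r = 0.25.
print(rate)
# Given U > 0, delta @ U has the law of delta @ |U|, so the mean should be
# near mu + sqrt(2/pi) * delta @ 1_r.
print(y_sel.mean(axis=0), mu + np.sqrt(2 / np.pi) * delta.sum(axis=1))
```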
A.1 The CFUSN distribution
The skew normal member of the class of CFUSS distributions is the canonical fundamental skew normal (CFUSN) distribution. It is obtained by taking \(f_p\) to be a normal density, which leads to \(Q_r\) being a normal cdf. It follows that the density of the CFUSN distribution is given by

\[ f({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }) = 2^r \phi _p({\varvec{y}}; \varvec{\mu }, \varvec{\Omega })\, \Phi _r\big (\varvec{\Delta }^T \varvec{\Omega }^{-1}({\varvec{y}}- \varvec{\mu }); \mathbf{0}, \varvec{\Lambda }\big ), \tag{34} \]

where \(\varvec{\Omega }= \varvec{\Sigma }+ \varvec{\Delta }\varvec{\Delta }^T\) and \(\varvec{\Lambda }= {\varvec{I}}_r - \varvec{\Delta }^T \varvec{\Omega }^{-1} \varvec{\Delta }\), \(\phi _p(\cdot ; \varvec{\mu }, \varvec{\Omega })\) denotes the p-variate normal density with mean \(\varvec{\mu }\) and covariance matrix \(\varvec{\Omega }\), and \(\Phi _r(\cdot ; \mathbf{0}, \varvec{\Lambda })\) denotes the r-variate normal cdf with mean \(\mathbf{0}\) and covariance matrix \(\varvec{\Lambda }\). In the above, \(\varvec{\mu }\) is a \(p\times 1\) vector of location parameters, \(\varvec{\Sigma }\) is a \(p\times p\) positive definite scale matrix, and \(\varvec{\Delta }\) is a \(p\times r\) matrix of skewness parameters. We shall adopt the notation \({\varvec{Y}}\sim \hbox {CFUSN}_{p,r}(\varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta })\) if \({\varvec{Y}}\) has the density given by (34). Note that when \(\varvec{\Delta }= \mathbf{0}\), we obtain the (multivariate) normal distribution. In addition, a number of skew normal distributions are nested within the CFUSN distribution, including the version proposed by Azzalini and Dalla Valle (1996) and the version proposed by Sahu et al. (2003). Following the terminology of Lee and McLachlan (2013), we refer to these as the restricted and unrestricted skew normal distributions, respectively.
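A direct transcription of the CFUSN density (34) into Python can help when checking parameterizations. This is a sketch assuming SciPy's `multivariate_normal` and `norm`; the helper name `cfusn_pdf` is hypothetical and it evaluates one point at a time. For \(p = r = 1\) the density should coincide with Azzalini's skew normal as implemented in `scipy.stats.skewnorm`, with shape \(\delta /\sigma \) and scale \(\sqrt{\sigma ^2 + \delta ^2}\); this mapping is worked out from the formula rather than taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm, skewnorm

def cfusn_pdf(y, mu, sigma, delta):
    """Hypothetical helper: CFUSN_{p,r}(mu, sigma, delta) density at one point y.

    f(y) = 2^r phi_p(y; mu, Omega) Phi_r(Delta^T Omega^{-1} (y - mu); 0, Lambda),
    Omega = Sigma + Delta Delta^T, Lambda = I_r - Delta^T Omega^{-1} Delta.
    """
    y = np.asarray(y, float)
    mu = np.asarray(mu, float)
    p = mu.size
    sigma = np.asarray(sigma, float).reshape(p, p)
    delta = np.asarray(delta, float).reshape(p, -1)   # p x r
    r = delta.shape[1]
    omega = sigma + delta @ delta.T
    omega_inv_delta = np.linalg.solve(omega, delta)   # Omega^{-1} Delta
    lam = np.eye(r) - delta.T @ omega_inv_delta
    q = omega_inv_delta.T @ (y - mu)                  # Delta^T Omega^{-1} (y - mu)
    base = multivariate_normal.pdf(y, mean=mu, cov=omega)
    if r == 1:
        skew = norm.cdf(q[0] / np.sqrt(lam[0, 0]))
    else:
        skew = multivariate_normal.cdf(q, mean=np.zeros(r), cov=lam)
    return 2.0**r * base * skew

# p = r = 1 check against Azzalini's skew normal: shape d/s, scale sqrt(s^2 + d^2).
s, d, y0 = 1.0, 0.8, 0.3
print(cfusn_pdf([y0], [0.0], [[s**2]], [[d]]),
      skewnorm.pdf(y0, d / s, scale=np.sqrt(s**2 + d**2)))
```

With \(\varvec{\Delta }= \mathbf{0}\) the skewing factor equals \(2^{-r}\), so the function also returns the plain normal density, which is a convenient second check.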
It is of interest to note that \({\varvec{Y}}\) admits a convolution-type stochastic representation that facilitates the derivation of its properties and parameter estimation via the EM algorithm. This is given by

\[ {\varvec{Y}}= \varvec{\mu }+ \varvec{\Delta }|{\varvec{U}}| + {\varvec{e}}, \tag{35} \]

where \({\varvec{U}}\) follows a standard r-dimensional normal distribution, independently of \({\varvec{e}}\sim N_p(\mathbf{0}, \varvec{\Sigma })\), and \(|{\varvec{U}}|\) denotes the vector of componentwise absolute values. Hence, \(|{\varvec{U}}|\) has an r-dimensional standard half-normal distribution.
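The convolution representation also gives a direct sampler. Assuming the independent half-normal components described above, each component of \(|{\varvec{U}}|\) has mean \(\sqrt{2/\pi }\) and variance \(1 - 2/\pi \), which implies \(E({\varvec{Y}}) = \varvec{\mu }+ \sqrt{2/\pi }\,\varvec{\Delta }\mathbf{1}_r\) and \(\hbox {cov}({\varvec{Y}}) = \varvec{\Sigma }+ (1 - 2/\pi )\varvec{\Delta }\varvec{\Delta }^T\). The sketch below (hypothetical function name `rcfusn`, arbitrary parameter values) compares these moments with Monte Carlo estimates.

```python
import numpy as np

def rcfusn(n, mu, sigma, delta, rng):
    """Hypothetical helper: sample CFUSN_{p,r}(mu, sigma, delta) via Y = mu + Delta |U| + e."""
    mu = np.asarray(mu, float)
    delta = np.asarray(delta, float)                   # p x r
    p, r = delta.shape
    abs_u = np.abs(rng.standard_normal((n, r)))        # r-variate standard half-normal
    e = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    return mu + abs_u @ delta.T + e

rng = np.random.default_rng(7)
mu = np.array([1.0, -1.0])
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
delta = np.array([[3.0, 0.0], [0.0, -2.0]])
y = rcfusn(500_000, mu, sigma, delta, rng)

# Moments implied by the representation: each |U_j| has mean sqrt(2/pi), variance 1 - 2/pi.
mean_theory = mu + np.sqrt(2 / np.pi) * delta.sum(axis=1)
cov_theory = sigma + (1 - 2 / np.pi) * delta @ delta.T
print(y.mean(axis=0), mean_theory)
print(np.cov(y, rowvar=False), cov_theory)
```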
A.2 Scale mixture of CFUSN distributions
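In the next two subsections, we consider two skew distributions that were recently employed by Lee and McLachlan (2016) and Murray et al. (2017a) for their mixture models, namely the CFUST and HTH distributions, respectively. They are special cases of the class of CFUSS distributions that can be obtained as scale mixtures of the CFUSN (SMCFUSN) distribution. By a scale mixture of the CFUSN distribution, we mean a distribution that can be defined by the stochastic representation

\[ {\varvec{Y}}= \varvec{\mu }+ W^{\frac{1}{2}} {\varvec{Y}}_0, \tag{36} \]

where \({\varvec{Y}}_0 \sim \hbox {CFUSN}_{p,r}(\mathbf{0}, \varvec{\Sigma }, \varvec{\Delta })\) follows a central CFUSN distribution and W is a positive (univariate) random variable independent of \({\varvec{Y}}_0\). Thus, conditional on \(W = w\), \({\varvec{Y}}\) follows a CFUSN distribution with scale matrix \(w\varvec{\Sigma }\) and skewness matrix \(w^{\frac{1}{2}}\varvec{\Delta }\). It follows that the marginal density of \({\varvec{Y}}\) is given by (1), or equivalently,

\[ f({\varvec{y}}) = 2^r \int _0^\infty \phi _p({\varvec{y}}; \varvec{\mu }, w\varvec{\Omega })\, \Phi _r\big (w^{-\frac{1}{2}} \varvec{\Delta }^T \varvec{\Omega }^{-1}({\varvec{y}}- \varvec{\mu }); \mathbf{0}, \varvec{\Lambda }\big )\, dF_{{\varvec{\zeta }}}(w), \tag{37} \]

where \(\phi _p\) and \(\Phi _r\) denote the normal density and cdf, \(\varvec{\Omega }\) and \(\varvec{\Lambda }\) are as given in Sect. A.1, and \(F_{{\varvec{\zeta }}}\) is defined in Sect. 2.2.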
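The scale-mixture construction is easy to simulate for any mixing law. In the sketch below, `draw_w` is a hypothetical user-supplied sampler for \(W\), and `rsmcfusn` is likewise a hypothetical name. Two illustrative choices are shown: \(W \equiv 1\), which returns the CFUSN distribution itself, and \(1/W \sim \hbox {gamma}(\nu /2, \nu /2)\), which corresponds to the CFUST member discussed in A.4. A random scale inflates the spread, which the final print makes visible.

```python
import numpy as np

def rsmcfusn(n, mu, sigma, delta, draw_w, rng):
    """Hypothetical helper: Y = mu + W^{1/2} Y0, with Y0 ~ CFUSN(0, sigma, delta) independent of W.

    draw_w(n, rng) must return n positive draws of the mixing variable W.
    """
    mu = np.asarray(mu, float)
    delta = np.asarray(delta, float)                   # p x r
    p, r = delta.shape
    y0 = (np.abs(rng.standard_normal((n, r))) @ delta.T
          + rng.multivariate_normal(np.zeros(p), sigma, size=n))   # central CFUSN draws
    w = draw_w(n, rng)
    return mu + np.sqrt(w)[:, None] * y0

nu = 6.0

def unit_w(n, rng):
    # W = 1: the mixture collapses to the CFUSN distribution itself.
    return np.ones(n)

def inv_gamma_w(n, rng):
    # 1/W ~ gamma(shape nu/2, rate nu/2); NumPy's gamma takes a scale, so scale = 2/nu.
    return 1.0 / rng.gamma(nu / 2, 2 / nu, size=n)

rng = np.random.default_rng(3)
mu, sigma = np.zeros(2), np.eye(2)
delta = np.array([[1.0, 0.5], [0.0, 2.0]])
y_n = rsmcfusn(200_000, mu, sigma, delta, unit_w, rng)
y_t = rsmcfusn(200_000, mu, sigma, delta, inv_gamma_w, rng)
print(y_n.var(axis=0), y_t.var(axis=0))   # the gamma-mixed draws are more dispersed
```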
The class of SMCFUSN distributions generalizes the scale mixture of skew normal (SMSN) distributions considered by Cabral et al. (2012), which adopt a restricted skew normal distribution in place of the CFUSN distribution used here. The SMSN class can be obtained from the SMCFUSN class by taking \(r=1\) (after reparameterization). Some special cases of the SMCFUSN distribution are listed in Table 10.
A.3 The CFUSH distribution
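If the latent variable W in (36) follows a generalized inverse Gaussian (GIG) distribution (Seshadri 1997), we obtain the canonical fundamental skew hyperbolic (CFUSH) distribution. In this case, the symmetric density \(f_p\) in (33) is a symmetric generalized hyperbolic (GH) density \(h_p(\cdot )\), and the skewing function becomes the cdf \(H_r(\cdot )\) of a symmetric GH distribution. The GIG density can be expressed as

\[ f_{\mathrm{GIG}}(w; \psi , \chi , \lambda ) = \frac{(\psi /\chi )^{\lambda /2}\, w^{\lambda -1}}{2K_\lambda (\sqrt{\psi \chi })} \exp \left\{ -\frac{1}{2}\left( \psi w + \frac{\chi }{w}\right) \right\} , \tag{38} \]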
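where \(w > 0\), the parameters \(\psi \) and \(\chi \) are positive, and \(\lambda \) is a real parameter. In the above, \(K_\lambda (\cdot )\) denotes the modified Bessel function of the third kind of order \(\lambda \). The density of a p-dimensional symmetric GH distribution is given by

\[ h_p({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \psi , \chi , \lambda ) = \frac{(\psi /\chi )^{\lambda /2}\, K_{\lambda - p/2}\big (\sqrt{\psi \{\chi + \delta ({\varvec{y}})\}}\big )}{(2\pi )^{p/2} |\varvec{\Sigma }|^{1/2} K_\lambda (\sqrt{\psi \chi })} \left\{ \frac{\chi + \delta ({\varvec{y}})}{\psi }\right\} ^{(\lambda - p/2)/2}, \tag{39} \]

where \(\delta ({\varvec{y}}) = ({\varvec{y}}- \varvec{\mu })^T \varvec{\Sigma }^{-1}({\varvec{y}}- \varvec{\mu })\).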
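The GIG density can be cross-checked against SciPy's `scipy.stats.geninvgauss`, which uses the two-parameter form \(x^{p-1}\exp \{-b(x + 1/x)/2\}/\{2K_p(b)\}\). Assuming the standard \((\psi , \chi , \lambda )\) parameterization, proportional to \(w^{\lambda -1}\exp \{-(\psi w + \chi /w)/2\}\) (which is consistent with the scaling argument in the identifiability discussion below), taking \(p = \lambda \), \(b = \sqrt{\psi \chi }\) and scale \(\sqrt{\chi /\psi }\) should reproduce it. The helper `gig_pdf` and the parameter values are hypothetical.

```python
import numpy as np
from scipy.special import kv
from scipy.stats import geninvgauss

def gig_pdf(w, psi, chi, lam):
    """Hypothetical helper: GIG(psi, chi, lam) density for w > 0 (assumed parameterization).

    (psi/chi)^(lam/2) / (2 K_lam(sqrt(psi*chi))) * w^(lam-1) * exp(-(psi*w + chi/w)/2)
    """
    w = np.asarray(w, float)
    const = (psi / chi) ** (lam / 2) / (2 * kv(lam, np.sqrt(psi * chi)))
    return const * w ** (lam - 1) * np.exp(-(psi * w + chi / w) / 2)

psi, chi, lam = 2.0, 0.5, -0.7
w = np.linspace(0.05, 5.0, 50)
# SciPy's geninvgauss(p, b), rescaled by sqrt(chi/psi), matches the form above.
ref = geninvgauss.pdf(w, lam, np.sqrt(psi * chi), scale=np.sqrt(chi / psi))
print(np.max(np.abs(gig_pdf(w, psi, chi, lam) - ref)))
```

Here `scipy.special.kv` is the modified Bessel function usually called "of the second kind", which is the same function as the "third kind" named in the text.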
It is well known that the GH distribution has an identifiability issue: the parameter vectors \(\varvec{\theta }=(\varvec{\mu }, c\varvec{\Sigma }, c\psi , \chi /c, \lambda )\) and \(\varvec{\theta }^*=(\varvec{\mu }, \varvec{\Sigma }, \psi , \chi , \lambda )\) yield the same symmetric GH distribution (39) for any \(c>0\). It is therefore not surprising that the CFUSH distribution suffers from the same issue. To handle this, restrictions are imposed on some of the parameters of the CFUSH distribution. An example is the HTH distribution considered by Murray et al. (2017a), where the constraint \(\psi =\chi =\omega \) is used, leading to the density
where \(\gamma = \sqrt{\psi (\omega +\eta )}\). Note that, in their terminology, ‘hidden truncation’ refers to the latent skewing variable that follows a truncated distribution in the convolution-type characterization of the CFUSH distribution. Alternatively, one can restrict the parameters of W so that, for example, \(E(W)=1\). A commonly used constraint on the GH distribution is to set \(|\varvec{\Sigma }|=1\), which can also be applied to the CFUSH distribution to achieve identifiability; see also the unrestricted skew normal generalized hyperbolic (SUNGH) distribution considered by Maleki et al. (2019).
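The symmetric GH density and its identifiability issue can both be checked numerically. The sketch below assumes the GIG parameterization proportional to \(w^{\lambda -1}\exp \{-(\psi w + \chi /w)/2\}\) and evaluates the closed form obtained by integrating \(N_p(\varvec{\mu }, w\varvec{\Sigma })\) against that mixing density, which is what (39) represents. It then compares the closed form with a direct numerical integration over \(w\) for \(p = 1\), and confirms that replacing \((\varvec{\Sigma }, \psi , \chi )\) by \((c\varvec{\Sigma }, c\psi , \chi /c)\) leaves the density unchanged. The function name `gh_pdf` and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv
from scipy.stats import geninvgauss, norm

def gh_pdf(y, mu, sigma, psi, chi, lam):
    """Hypothetical helper: symmetric GH density, i.e. N_p(mu, w*Sigma) mixed over W ~ GIG(psi, chi, lam).

    (psi/chi)^(lam/2) K_{lam-p/2}(sqrt(psi (chi + m))) ((chi + m)/psi)^((lam - p/2)/2)
      / ((2 pi)^(p/2) |Sigma|^(1/2) K_lam(sqrt(psi chi))),  m = (y-mu)^T Sigma^{-1} (y-mu).
    """
    y = np.asarray(y, float)
    mu = np.asarray(mu, float)
    p = mu.size
    sigma = np.asarray(sigma, float).reshape(p, p)
    diff = y - mu
    m = diff @ np.linalg.solve(sigma, diff)            # squared Mahalanobis distance
    order = lam - p / 2
    const = (psi / chi) ** (lam / 2) / (
        (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(sigma)) * kv(lam, np.sqrt(psi * chi)))
    return const * ((chi + m) / psi) ** (order / 2) * kv(order, np.sqrt(psi * (chi + m)))

psi, chi, lam, s2, y0 = 1.5, 0.8, 0.4, 2.0, 0.7

# 1) Closed form versus numerical mixing over w (p = 1); SciPy's geninvgauss is rescaled
#    so that its density is proportional to w^(lam-1) exp(-(psi*w + chi/w)/2).
def integrand(w):
    return (norm.pdf(y0, scale=np.sqrt(w * s2))
            * geninvgauss.pdf(w, lam, np.sqrt(psi * chi), scale=np.sqrt(chi / psi)))

mixed = quad(integrand, 0, np.inf)[0]
print(gh_pdf([y0], [0.0], [[s2]], psi, chi, lam), mixed)

# 2) Identifiability: (c*Sigma, c*psi, chi/c) gives the same density for any c > 0.
c = 3.0
print(gh_pdf([y0], [0.0], [[c * s2]], c * psi, chi / c, lam))
```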
A.4 The CFUST distribution
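The CFUST distribution is the skew t-distribution member of the class of CFUSS distributions, where the symmetric distribution is taken to be a (multivariate) t-distribution. It can be obtained by letting \(\frac{1}{W}\) follow a \(\hbox {gamma}(\frac{\nu }{2}, \frac{\nu }{2})\) distribution. Its density is given by

\[ f({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }, \nu ) = 2^r t_p({\varvec{y}}; \varvec{\mu }, \varvec{\Omega }, \nu )\, T_r\left( \varvec{\Delta }^T \varvec{\Omega }^{-1}({\varvec{y}}- \varvec{\mu }) \sqrt{\frac{\nu + p}{\nu + d({\varvec{y}})}}; \mathbf{0}, \varvec{\Lambda }, \nu + p\right) , \tag{41} \]

where \(t_p(\cdot ; \varvec{\mu }, \varvec{\Omega }, \nu )\) denotes the p-variate t-density with location vector \(\varvec{\mu }\), scale matrix \(\varvec{\Omega }\) and \(\nu \) degrees of freedom, \(T_r(\cdot ; \mathbf{0}, \varvec{\Lambda }, \nu + p)\) denotes the r-variate t-distribution function with location vector \(\mathbf{0}\), scale matrix \(\varvec{\Lambda }\) and \(\nu + p\) degrees of freedom, \(d({\varvec{y}}) = ({\varvec{y}}- \varvec{\mu })^T\varvec{\Omega }^{-1}({\varvec{y}}- \varvec{\mu })\), and \(\varvec{\Omega }\) and \(\varvec{\Lambda }\) are as given in Sect. A.1.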
We shall adopt the notation \({\varvec{Y}}\sim CFUST_{p,r}(\varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }, \nu )\) if \({\varvec{Y}}\) has the density given by (41).
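A point-wise evaluation of the CFUST density (41) is sketched below, assuming the standard t parameterization (location, scale matrix, degrees of freedom). To stay within long-established SciPy functions, it evaluates the r-variate t cdf only for \(r = 1\), via `scipy.stats.t.cdf`; for \(r > 1\) an r-variate t cdf is needed, and the sketch raises an error instead. The helper names `mvt_logpdf` and `cfust_pdf` are hypothetical. Two sanity checks follow: the density integrates to one for \(p = r = 1\), and \(\varvec{\Delta }= \mathbf{0}\) recovers the ordinary t-density.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import t as student_t

def mvt_logpdf(y, loc, scale, df):
    """Log density of the p-variate t-distribution with location loc, scale matrix scale, df degrees of freedom."""
    p = loc.size
    diff = y - loc
    m = diff @ np.linalg.solve(scale, diff)
    _, logdet = np.linalg.slogdet(scale)
    return (gammaln((df + p) / 2) - gammaln(df / 2) - 0.5 * p * np.log(df * np.pi)
            - 0.5 * logdet - 0.5 * (df + p) * np.log1p(m / df))

def cfust_pdf(y, mu, sigma, delta, nu):
    """Hypothetical helper: CFUST_{p,r} density at one point y, implemented for r = 1 only.

    2^r t_p(y; mu, Omega, nu) T_r(Delta^T Omega^{-1}(y - mu) sqrt((nu + p)/(nu + d)); 0, Lambda, nu + p),
    with d = (y - mu)^T Omega^{-1} (y - mu).
    """
    y = np.asarray(y, float)
    mu = np.asarray(mu, float)
    p = mu.size
    sigma = np.asarray(sigma, float).reshape(p, p)
    delta = np.asarray(delta, float).reshape(p, -1)     # p x r
    r = delta.shape[1]
    if r != 1:
        raise NotImplementedError("r > 1 needs an r-variate t cdf")
    omega = sigma + delta @ delta.T
    omega_inv_delta = np.linalg.solve(omega, delta)
    lam = np.eye(r) - delta.T @ omega_inv_delta
    diff = y - mu
    d = diff @ np.linalg.solve(omega, diff)
    q = (omega_inv_delta.T @ diff) * np.sqrt((nu + p) / (nu + d))
    skew = student_t.cdf(q[0] / np.sqrt(lam[0, 0]), df=nu + p)
    return 2.0**r * np.exp(mvt_logpdf(y, mu, omega, nu)) * skew

# Sanity checks for p = r = 1: total mass, and Delta = 0 giving the ordinary t-density.
total = quad(lambda v: cfust_pdf([v], [0.0], [[1.0]], [[2.0]], 5.0), -np.inf, np.inf)[0]
print(total)
print(cfust_pdf([0.4], [0.0], [[1.0]], [[0.0]], 5.0), student_t.pdf(0.4, df=5.0))
```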
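The CFUST distribution admits several stochastic representations, including the convolution of a half t-random vector \(|{\varvec{U}}|\) and a t-random vector \({\varvec{e}}\), given by

\[ {\varvec{Y}}= \varvec{\mu }+ \varvec{\Delta }|{\varvec{U}}| + {\varvec{e}}, \tag{42} \]

where \({\varvec{U}}\) and \({\varvec{e}}\) have a joint t-distribution given by

\[ \begin{pmatrix} {\varvec{U}}\\ {\varvec{e}}\end{pmatrix} \sim t_{r+p}\left( \begin{pmatrix} \mathbf{0}\\ \mathbf{0}\end{pmatrix}, \begin{pmatrix} {\varvec{I}}_r & \mathbf{0}\\ \mathbf{0}& \varvec{\Sigma }\end{pmatrix}, \nu \right) . \tag{43} \]

From (42), we can obtain the mean and covariance matrix of \({\varvec{Y}}\), which are given by

\[ E({\varvec{Y}}) = \varvec{\mu }+ \sqrt{\frac{2}{\pi }}\, a(\nu )\, \varvec{\Delta }\mathbf{1}_r \qquad (\nu > 1) \]

and

\[ \hbox {cov}({\varvec{Y}}) = \frac{\nu }{\nu -2}\left\{ \varvec{\Sigma }+ \left( 1 - \frac{2}{\pi }\right) \varvec{\Delta }\varvec{\Delta }^T\right\} + \frac{2}{\pi }\left\{ \frac{\nu }{\nu -2} - a^2(\nu )\right\} \varvec{\Delta }\mathbf{1}_r \mathbf{1}_r^T \varvec{\Delta }^T \qquad (\nu > 2), \]

where \(\mathbf{1}_r\) is an r-vector of ones and \(a(\nu ) = \sqrt{\frac{\nu }{2}} \Gamma (\frac{\nu -1}{2}) \left[ \Gamma (\frac{\nu }{2})\right] ^{-1}\).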
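The first two moments can be checked by simulating the convolution representation directly. Writing \({\varvec{U}}= W^{1/2}{\varvec{Z}}\) and \({\varvec{e}}= W^{1/2}\varvec{\varepsilon }\) with a shared \(W\), \(1/W \sim \hbox {gamma}(\nu /2, \nu /2)\), gives \(E(W^{1/2}) = a(\nu )\) and \(E(W) = \nu /(\nu -2)\). The mean and covariance coded below are derived from these two facts together with the half-normal moments of \(|{\varvec{Z}}|\); they are offered as a sketch of that derivation, and the function names and parameter values are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def a_nu(nu):
    """a(nu) = sqrt(nu/2) Gamma((nu-1)/2) / Gamma(nu/2), which equals E(W^{1/2}) when 1/W ~ gamma(nu/2, nu/2)."""
    return np.sqrt(nu / 2) * np.exp(gammaln((nu - 1) / 2) - gammaln(nu / 2))

def cfust_moments(mu, sigma, delta, nu):
    """Mean (nu > 1) and covariance (nu > 2) of Y = mu + Delta|U| + e with (U, e) jointly t."""
    a = a_nu(nu)
    k = nu / (nu - 2)                                   # E(W)
    d1 = delta.sum(axis=1)                              # Delta 1_r
    mean = mu + np.sqrt(2 / np.pi) * a * d1
    cov = (k * (sigma + (1 - 2 / np.pi) * delta @ delta.T)
           + (2 / np.pi) * (k - a**2) * np.outer(d1, d1))
    return mean, cov

def rcfust(n, mu, sigma, delta, nu, rng):
    """Simulate U = W^{1/2} Z and e = W^{1/2} eps sharing one W, with 1/W ~ gamma(nu/2, rate nu/2)."""
    p, r = delta.shape
    w = 1.0 / rng.gamma(nu / 2, 2 / nu, size=n)
    abs_z = np.abs(rng.standard_normal((n, r)))
    eps = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    return mu + np.sqrt(w)[:, None] * (abs_z @ delta.T + eps)

rng = np.random.default_rng(2021)
mu = np.array([0.5, -1.0])
sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
delta = np.array([[1.5, -0.5], [0.0, 1.0]])
nu = 10.0
m, c = cfust_moments(mu, sigma, delta, nu)
y = rcfust(800_000, mu, sigma, delta, nu, rng)
print(y.mean(axis=0), m)
print(np.cov(y, rowvar=False), c)
```

As \(\nu \rightarrow \infty \), \(a(\nu )\rightarrow 1\) and both moments reduce to the CFUSN values \(\varvec{\mu }+ \sqrt{2/\pi }\,\varvec{\Delta }\mathbf{1}_r\) and \(\varvec{\Sigma }+ (1 - 2/\pi )\varvec{\Delta }\varvec{\Delta }^T\).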
In addition to the CFUSN distribution (and its nested special/limiting cases), the CFUST distribution embeds a number of commonly used distributions as special or limiting cases. These include the unrestricted skew t-distribution of Sahu et al. (2003) (obtained by taking \(\varvec{\Delta }\) to be a diagonal \(p\times p\) matrix; letting \(\nu \rightarrow \infty \) then yields the unrestricted skew normal distribution), the restricted skew t-distribution (obtained by setting \(r=1\)), and the t-distribution (obtained by setting \(\varvec{\Delta }=\mathbf{0}\)). Concerning the identifiability of the CFUST model, it can be observed from (42) that it bears a resemblance to the FA model (2). Indeed, it can be viewed as an FA model with latent factors following a half t-distribution and the skewness matrix acting as the factor loading matrix. However, unlike the FA model, the term \(\varvec{\Delta }|{\varvec{U}}|\) in the CFUST distribution is not rotationally invariant. It is invariant to permutations of the columns of \(\varvec{\Delta }\), but this does not affect the number of free parameters in the CFUST model.
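The two invariance statements can be illustrated numerically. For brevity the sketch uses the normal member, the CFUSN density (34), which shares the \(\varvec{\Delta }|{\varvec{U}}|\) structure of the CFUST model; using it in place of the CFUST density is an assumption of this illustration. Right-multiplying \(\varvec{\Delta }\) by an orthogonal matrix leaves \(\varvec{\Omega }= \varvec{\Sigma }+ \varvec{\Delta }\varvec{\Delta }^T\) unchanged yet alters the density, whereas swapping the columns of \(\varvec{\Delta }\) does not. The helper name, parameter values and rotation angle are arbitrary choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def cfusn_pdf(y, mu, sigma, delta):
    """Hypothetical helper: 2^r phi_p(y; mu, Omega) Phi_r(Delta^T Omega^{-1}(y - mu); 0, Lambda)."""
    r = delta.shape[1]
    omega = sigma + delta @ delta.T
    omega_inv_delta = np.linalg.solve(omega, delta)      # Omega^{-1} Delta
    lam = np.eye(r) - delta.T @ omega_inv_delta
    q = omega_inv_delta.T @ (y - mu)
    return (2.0**r * multivariate_normal.pdf(y, mean=mu, cov=omega)
            * multivariate_normal.cdf(q, mean=np.zeros(r), cov=lam))

mu, sigma, y = np.zeros(2), np.eye(2), np.array([1.0, 0.5])
delta = np.array([[2.0, 0.5], [-1.0, 1.5]])
perm = delta[:, ::-1]                                    # columns swapped
theta = np.pi / 5
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
rot = delta @ rotation                                   # Delta R with R orthogonal

# Omega is identical in all three cases,
print(np.allclose(delta @ delta.T, perm @ perm.T), np.allclose(delta @ delta.T, rot @ rot.T))
# but only the column permutation leaves the density unchanged.
f0, f_perm, f_rot = (cfusn_pdf(y, mu, sigma, d) for d in (delta, perm, rot))
print(f0, f_perm, f_rot)
```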
Appendix B: Expressions for the E-step of the ECM algorithm for the CFUSTFA model
For the CFUSTFA model, the E-step of the ECM algorithm involves four conditional expectations that are analogous to those for mixtures of CFUST distributions; technical details can be found in Lee and McLachlan (2016). The expressions for (19) to (22) are similar to those for (12), (13), (15), and (16), respectively, in Lee and McLachlan (2016). However, the scale matrices and skewness matrices in our case are given by \(\varvec{\Sigma }_i^{*^{(k)}} = {\varvec{B}}_i^{(k)}{\varvec{B}}_i^{(k)^T}+{\varvec{D}}_i^{(k)}\) and \(\varvec{\Delta }_i^{*^{(k)}} = {\varvec{B}}_i^{(k)} \varvec{\Delta }_i^{(k)}\) \((i=1, \ldots , g)\), respectively. Thus, the expressions for the conditional expectations (19) to (22) are given by
where
and where \({\varvec{a}}_{ij}^{(k)}\) is an r-variate truncated t-random variable given by
The last terms in expressions (45) and (46) correspond to the first and second moments of \({\varvec{a}}_{ij}^{(k)}\) and can be evaluated using formulae described in, for example, O’Hagan (1976), Ho et al. (2012), and the appendix of Lee and McLachlan (2014).
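As a rough cross-check of such closed-form formulae, the first and second moments of a t-random vector truncated to the positive orthant can be approximated by plain rejection sampling. This is a Monte Carlo substitute, not the method of the cited references. The function name and its location, scale and degrees-of-freedom arguments are hypothetical, since the exact parameters of \({\varvec{a}}_{ij}^{(k)}\) are given by expressions not reproduced here. For \(r = 1\) with zero location and unit scale, the exact values \(E(X\mid X>0) = \sqrt{\nu /\pi }\,\Gamma (\frac{\nu -1}{2})/\Gamma (\frac{\nu }{2})\) and \(E(X^2\mid X>0) = \nu /(\nu -2)\) follow from the symmetry of the t-distribution and serve as a check.

```python
import numpy as np
from scipy.special import gamma

def truncated_t_moments(loc, scale, df, n, rng):
    """Hypothetical helper: Monte Carlo E(X) and E(X X^T) for X ~ t_r(loc, scale, df) restricted to X > 0.

    Plain rejection sampling: draw from the untruncated t-distribution and keep
    the draws lying in the positive orthant. Accuracy is O(1/sqrt(accepted draws)).
    """
    loc = np.asarray(loc, float)
    r = loc.size
    chol = np.linalg.cholesky(np.asarray(scale, float).reshape(r, r))
    z = rng.standard_normal((n, r)) @ chol.T           # N_r(0, scale)
    w = 1.0 / rng.gamma(df / 2, 2 / df, size=n)         # 1/W ~ gamma(df/2, rate df/2)
    x = loc + np.sqrt(w)[:, None] * z                   # t_r(loc, scale, df) draws
    x = x[np.all(x > 0, axis=1)]                        # truncation to the positive orthant
    return x.mean(axis=0), x.T @ x / len(x), len(x)

rng = np.random.default_rng(5)
nu = 8.0
m1, m2, kept = truncated_t_moments([0.0], [[1.0]], nu, 1_000_000, rng)
exact_m1 = np.sqrt(nu / np.pi) * gamma((nu - 1) / 2) / gamma(nu / 2)
print(m1, exact_m1, m2, nu / (nu - 2), kept)
```

Rejection sampling becomes inefficient when the positive orthant has small probability (large r or strongly negative locations), which is one reason closed-form moment formulae are preferred inside an EM loop.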
Cite this article
Lee, S.X., Lin, TI. & McLachlan, G.J. Mixtures of factor analyzers with scale mixtures of fundamental skew normal distributions. Adv Data Anal Classif 15, 481–512 (2021). https://doi.org/10.1007/s11634-020-00420-9