Mixtures of factor analyzers with scale mixtures of fundamental skew normal distributions

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Mixtures of factor analyzers (MFA) provide a powerful tool for modelling high-dimensional datasets. In recent years, several generalizations of MFA have been developed in which the normality assumption of the factors and/or the errors was relaxed to allow for skewness in the data. However, due to the form of the adopted component densities, the distribution of the factors/errors in most of these models is limited to modelling skewness concentrated in a single direction. Here, we introduce a more flexible finite mixture of factor analyzers based on the class of scale mixtures of canonical fundamental skew normal (SMCFUSN) distributions. This very general class of skew distributions can capture various types of skewness and asymmetry in the data. In particular, the proposed mixtures of SMCFUSN factor analyzers (SMCFUSNFA) can simultaneously accommodate multiple directions of skewness. As such, the model encapsulates many commonly used models as special and/or limiting cases, such as some versions of skew normal and skew t-factor analyzers, and skew hyperbolic factor analyzers. For illustration, we focus on the t-distribution member of the class of SMCFUSN distributions, leading to mixtures of canonical fundamental skew t-factor analyzers (CFUSTFA). Parameter estimation can be carried out by maximum likelihood via an EM-type algorithm. The usefulness and potential of the proposed model are demonstrated using four real datasets.

References

  • Arellano-Valle RB, Azzalini A (2006) On the unification of families of skew-normal distributions. Scand J Stat 33:561–574

  • Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. J Multivar Anal 96:93–116

  • Azzalini A, Capitanio A (2014) The Skew-Normal and Related Families. Cambridge University Press, Cambridge

  • Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715–726

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22:719–725

  • Browne RP, McNicholas PD (2015) A mixture of generalized hyperbolic distributions. Can J Stat 43:176–198

  • Cabral CRB, Lachos VH, Prates MO (2012) Multivariate mixture modeling using skew-normal independent distributions. Comput Stat Data Anal 56:126–142

  • Codella N, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza S, Kalloo A, Liopyris K, Mishra N, Kittler H, Halpern A (2017) Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). arXiv:1710.05006

  • Cook RD, Weisberg S (1994) An Introduction to Regression Graphics. Wiley, New York

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38

  • Ferris LK, Harkes JA, Gilbert B, Winger DG, Golubets K, Akilov O, Satyanarayanan M (2015) Computer-aided classification of melanocytic lesions using dermoscopic images. J Am Acad Dermatol 73:769–776

  • Forina M, Tiscornia E (1982) Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content. Annali di Chimica 72:143–155

  • Genton MG (ed) (2004) Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, Boca Raton, Florida

  • Ghahramani Z, Hinton G (1997) The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Toronto

  • Ho HJ, Lin TI, Chen HY, Wang WL (2012) Some results on the truncated multivariate \(t\) distribution. J Stat Plan Inference 142:25–40

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218

  • Karlis D, Santourian A (2009) Model-based clustering with non-elliptically contoured distributions. Stat Comput 19:73–83

  • Kim HM, Maadooliat M, Arellano-Valle RB, Genton MG (2016) Skewed factor models using selection mechanisms. J Multivar Anal 145:162–177

  • Kim SG (2016) An approximate fitting for mixture of multivariate skew normal distribution via EM algorithm. Korean J Appl Stat 29:513–523

  • Lee S, McLachlan GJ (2014) Finite mixtures of multivariate skew \(t\)-distributions: Some recent and new results. Stat Comput 24:181–202

  • Lee SX, McLachlan GJ (2013) On mixtures of skew-normal and skew \(t\)-distributions. Adv Data Anal Classif 7:241–266

  • Lee SX, McLachlan GJ (2016) Finite mixtures of canonical fundamental skew \(t\)-distributions: The unification of the restricted and unrestricted skew \(t\)-mixture models. Stat Comput 26:573–589

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265

  • Lin TI (2010) Robust mixture modeling using multivariate skew-\(t\) distribution. Stat Comput 20:343–356

  • Lin TI, Wu PH, McLachlan GJ, Lee SX (2015) A robust factor analysis model using the restricted skew \(t\)-distribution. TEST 24:510–531

  • Lin TI, McLachlan GJ, Lee SX (2016) Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J Multivar Anal 143:398–413

  • Lin TI, Wang WL, McLachlan GJ, Lee SX (2018) Robust mixtures of factor analysis models using the restricted multivariate skew-\(t\) distribution. Stat Modell 18:50–72

  • Maleki M, Wraith D, Arellano-Valle RB (2019) Robust finite mixture modeling of multivariate unrestricted skew-normal generalized hyperbolic distributions. Stat Comput 29:425–428

  • Maruotti A, Bulla J, Lagona F, Picone M, Martella F (2017) Dynamic mixtures of factor analyzers to characterize multivariate air pollutant exposures. Ann Appl Stat 3:1617–1648

  • McLachlan GJ, Krishnan T (2008) The EM Algorithm and Extensions, 2nd edn. Wiley, Hoboken, New Jersey

  • McLachlan GJ, Lee SX (2016) Comment on “On nomenclature for, and the relative merits of, two formulations of skew distributions” by A. Azzalini, R. Browne, M. Genton, and P. McNicholas. Stat Probab Lett 116:1–5

  • McLachlan GJ, Peel D (2000) Finite Mixture Models. Wiley, New York

  • McLachlan GJ, Peel D, Bean RW (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41:379–388

  • McLachlan GJ, Bean RW, Jones BT (2007) Extension of the mixture of factor analyzers model to incorporate the multivariate \(t\)-distribution. Comput Stat Data Anal 51:5327–5338

  • Meng X, Rubin D (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278

  • Montanari A, Viroli C (2010) A skew-normal factor model for the analysis of student satisfaction towards university courses. J Appl Stat 37:463–487

  • Murray P, Browne R, McNicholas P (2014a) Mixtures of skew-\(t\) factor analyzers. Comput Stat Data Anal 77:326–335

  • Murray P, McNicholas P, Browne R (2014b) Mixtures of common skew-\(t\) factor analyzers. Stat 3:68–82

  • Murray PM (2016) Detecting non-elliptical clusters. PhD thesis, Department of Mathematics & Statistics, McMaster University, Canada

  • Murray PM, Browne RP, McNicholas PD (2017a) Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. J Multivar Anal 161:141–156

  • Murray PM, Browne RP, McNicholas PD (2017b) A mixture of SDB skew-\(t\) factor analyzers. Econom Stat 3:160–168

  • Murray PM, Browne RP, McNicholas PD (2017c) Mixtures of hidden truncation hyperbolic factor analyzers. arXiv:1711.01504

  • O’Hagan A (1976) Moments of the truncated multivariate-\(t\) distribution. http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF

  • Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, Baecher-Allan C, McLachlan GJ, Tamayo P, Hafler DA, De Jager PL, Mesirow JP (2009) Automated high-dimensional flow cytometric data analysis. Proc National Acad Sci USA 106:8519–8524

  • R Core Team (2016) R: A Language and Environment for Statistical Computing. http://www.R-project.org/, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0

  • Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66:846–850

  • Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions with applications to Bayesian regression models. Can J Stat 31:129–150

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Seshadri V (1997) Halphen’s laws. In: Kotz S, Read CB, Banks DL (eds) Encyclopedia of Statistical Sciences. Wiley, New York, pp 302–306

  • Tortora C, Browne RP, Franczak BC, McNicholas PD (2015) MixGHD: Model Based Clustering, Classification and Discriminant Analysis Using the Mixture of Generalized Hyperbolic Distributions. http://cran.r-project.org/web/packages/MixGHD, R package version 1.7

  • Tortora C, McNicholas P, Browne R (2016) A mixture of generalized hyperbolic factor analyzers. Adv Data Anal Classif 10:423–440

  • Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J Mach Learn Res 11:2227–2240

  • Wall MM, Guo J, Amemiya Y (2012) Mixture factor analysis for approximating a non-normally distributed continuous latent factor with continuous and dichotomous observed variables. Multivar Behav Res 47:276–313

  • Yamamoto H, Nankaku Y, Miyajima C, Tokuda K, Kitamura T (2005) Parameter sharing in mixture of factor analyzers for speaker identification. IEICE Trans Inf Syst 88:418–424

  • Zhou YK, Mobasher B (2006) Web user segmentation based on a mixture of factor analyzers. Lect Notes Comput Sci 4082:11–20

Author information

Correspondence to Geoffrey J. McLachlan.

Appendices

Appendix A: The class of CFUSS distributions

The class of canonical fundamental skew symmetric (CFUSS) distributions (Arellano-Valle and Genton 2005) is one of the more general formulations of skew distributions. We begin by examining the fundamental skew distribution. Its density can be expressed as the product of a symmetric density and a skewing function. Formally, the density of \({\varvec{Y}}\), a p-dimensional random vector following a CFUSS distribution, is given by

$$\begin{aligned} f({\varvec{y}}; \varvec{\theta })= & {} 2^{r} f_p({\varvec{y}}; \varvec{\theta }) \, Q_r({\varvec{y}}; \varvec{\theta }), \end{aligned}$$
(33)

where \(f_p({\varvec{y}}; \varvec{\theta })\) is a symmetric density on \(\mathbb {R}^p\), \(Q_r({\varvec{y}}; \varvec{\theta })\) is a skewing function that maps \({\varvec{y}}\) into the unit interval, and \(\varvec{\theta }\) is the vector containing the parameters of \({\varvec{Y}}\). Let \({\varvec{U}}\) be an \(r\times 1\) random vector that is jointly distributed with a \(p\times 1\) random vector \({\varvec{X}}\), where \({\varvec{X}}\) has marginal density \(f_p({\varvec{y}}; \varvec{\theta })\) and \(Q_r({\varvec{y}}; \varvec{\theta }) = P({\varvec{U}}> \mathbf{0}\mid {\varvec{X}}= {\varvec{y}})\); then \({\varvec{Y}}\) has the distribution of \({\varvec{X}}\) conditional on \({\varvec{U}}> \mathbf{0}\). If the latent random vector \({\varvec{U}}\) has its canonical distribution (that is, with location vector \(\mathbf{0}\) and scale matrix \({\varvec{I}}_r\)), we obtain the canonical form of (33), namely the CFUSS distribution. The class of CFUSS distributions encapsulates many existing distributions, including most of those mentioned earlier in this paper. We shall consider some particular cases of the class of CFUSS distributions here.
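The selection mechanism behind (33) can be illustrated in the normal case (the CFUSN distribution of Sect. A.1). The sketch below is only an illustration under stated assumptions: it uses NumPy, the function name `sample_cfusn_by_selection` and all parameter values are hypothetical, and \({\varvec{X}}\) denotes the unselected vector, drawn from \(N_p(\varvec{\mu }, \varvec{\Omega })\) with \(\varvec{\Omega }= \varvec{\Sigma }+ \varvec{\Delta }\varvec{\Delta }^T\), jointly with \({\varvec{U}}\sim N_r(\mathbf{0}, {\varvec{I}}_r)\) and \(\hbox {cov}({\varvec{X}}, {\varvec{U}}) = \varvec{\Delta }\). A draw is kept only when every component of \({\varvec{U}}\) is positive. The acceptance rate estimates \(P({\varvec{U}}> \mathbf{0}) = 2^{-r}\), the reciprocal of the constant \(2^r\) in (33).

```python
import numpy as np

def sample_cfusn_by_selection(mu, Sigma, Delta, n, rng):
    """Hypothetical helper: draw (X, U) jointly normal and keep X whenever U > 0.

    X ~ N_p(mu, Omega) with Omega = Sigma + Delta Delta^T, U ~ N_r(0, I_r) and
    cov(X, U) = Delta, so P(U > 0 | X = x) = Phi_r(Delta^T Omega^{-1}(x - mu); 0, Lambda).
    """
    p, r = Delta.shape
    Omega = Sigma + Delta @ Delta.T
    joint_cov = np.block([[Omega, Delta], [Delta.T, np.eye(r)]])
    draws = rng.multivariate_normal(np.zeros(p + r), joint_cov, size=n)
    x, u = draws[:, :p] + mu, draws[:, p:]
    keep = np.all(u > 0, axis=1)            # the selection event {U > 0}
    return x[keep], keep.mean()             # accepted draws and acceptance rate

# Illustrative parameter values (p = 2, r = 2)
rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.2], [0.2, 0.5]])
Delta = np.array([[2.0, 0.0], [0.5, -1.0]])
y, rate = sample_cfusn_by_selection(mu, Sigma, Delta, 200_000, rng)

# rate should be near 2^{-r} = 0.25; the sample mean should be near mu + sqrt(2/pi) Delta 1_r
print(rate, y.mean(axis=0), mu + np.sqrt(2 / np.pi) * Delta.sum(axis=1))
```

Any positive-definite \(\varvec{\Sigma }\) works here because the Schur complement of the identity block in `joint_cov` is exactly \(\varvec{\Sigma }\).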

A.1 The CFUSN distribution

The skew normal member of the class of CFUSS distributions is the canonical fundamental skew normal (CFUSN) distribution. This can be obtained by taking \(f_p\) to be a normal density, leading to \(Q_r\) being a normal cdf. It follows that the density of the CFUSN distribution is given by

$$\begin{aligned} f_{{\mathrm{CFUSN}}}({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta })= & {} 2^{r} \phi _p({\varvec{y}}; \varvec{\mu }, \varvec{\Omega }) \; \Phi _r\left( \varvec{\Delta }^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu }); \mathbf{0}, \varvec{\Lambda }\right) , \end{aligned}$$
(34)

where \(\varvec{\Omega }= \varvec{\Sigma }+ \varvec{\Delta }\varvec{\Delta }^T\) and \(\varvec{\Lambda }= {\varvec{I}}_r - \varvec{\Delta }^T \varvec{\Omega }^{-1} \varvec{\Delta }\). In the above, \(\varvec{\mu }\) is a \(p\times 1\) vector of location parameters, \(\varvec{\Sigma }\) is a \(p\times p\) positive definite scale matrix, and \(\varvec{\Delta }\) is a \(p\times r\) matrix of skewness parameters. We shall adopt the notation \({\varvec{Y}}\sim \hbox {CFUSN}_{p,r}(\varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta })\) if \({\varvec{Y}}\) has the density given by (34). Note that when \(\varvec{\Delta }= \mathbf{0}\), we obtain the (multivariate) normal distribution. In addition, a number of skew normal distributions are nested within the CFUSN distribution, including the version proposed by Azzalini and Dalla Valle (1996) and the version proposed by Sahu et al. (2003). We shall follow the terminology of Lee and McLachlan (2013) and refer to them as the restricted and unrestricted skew normal distributions, respectively.
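As a numerical companion to (34), the following sketch evaluates the CFUSN density at a single point. It assumes NumPy and SciPy, and `cfusn_pdf` is a hypothetical helper name. The r-dimensional normal cdf \(\Phi_r\) is computed with `scipy.stats.multivariate_normal(...).cdf`, which integrates numerically for \(r > 1\) (with a default absolute tolerance of \(10^{-5}\)), so the result is approximate rather than exact.

```python
import numpy as np
from scipy import stats

def cfusn_pdf(y, mu, Sigma, Delta):
    """Density (34) of CFUSN_{p,r}(mu, Sigma, Delta) at one point y; Delta must be p x r."""
    y, mu, Sigma, Delta = (np.asarray(a, dtype=float) for a in (y, mu, Sigma, Delta))
    r = Delta.shape[1]
    Omega = Sigma + Delta @ Delta.T
    Omega_inv = np.linalg.inv(Omega)
    Lam = np.eye(r) - Delta.T @ Omega_inv @ Delta
    c = Delta.T @ Omega_inv @ (y - mu)               # argument of the skewing function
    normal_part = stats.multivariate_normal(mean=mu, cov=Omega).pdf(y)
    # Phi_r(c; 0, Lambda): evaluated numerically by SciPy when r > 1
    skew_part = stats.multivariate_normal(mean=np.zeros(r), cov=Lam).cdf(c)
    return 2.0**r * normal_part * skew_part

# Illustrative values with p = 2 and r = 2
y0 = np.array([0.4, -1.0])
mu0 = np.zeros(2)
S = np.array([[1.0, 0.3], [0.3, 2.0]])
print(cfusn_pdf(y0, mu0, S, np.array([[1.5, 0.0], [-0.5, 1.0]])))
```

Two quick consistency checks follow from the text: with \(\varvec{\Delta }= \mathbf{0}\) the value reduces to the \(N_p(\varvec{\mu }, \varvec{\Sigma })\) density, and with \(r = 1\) it reduces to \(2\phi_p\) times a univariate normal cdf.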

It is of interest to note that \({\varvec{Y}}\) admits a convolution-type stochastic representation that facilitates the derivation of properties and parameter estimation via the EM algorithm. This is given by

$$\begin{aligned} {\varvec{Y}}= & {} \varvec{\mu }+ \varvec{\Delta }|{\varvec{U}}| + {\varvec{e}}, \end{aligned}$$
(35)

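where \({\varvec{U}}\) follows a standard r-dimensional normal distribution, independently of \({\varvec{e}}\sim N_p(\mathbf{0}, \varvec{\Sigma })\), and \(|{\varvec{U}}|\) denotes the vector of componentwise absolute values. Hence, \(|{\varvec{U}}|\) has an r-dimensional standard half-normal distribution with independent components.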
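The convolution representation (35) gives a direct sampler. The sketch below (NumPy only; the function name and parameter values are illustrative assumptions) checks simulated draws against the moments implied by (35), namely \(E({\varvec{Y}}) = \varvec{\mu }+ \sqrt{2/\pi }\,\varvec{\Delta }{\varvec{1}}_r\) and \(\hbox {cov}({\varvec{Y}}) = \varvec{\Sigma }+ (1-2/\pi )\varvec{\Delta }\varvec{\Delta }^T\). These follow because each component of \(|{\varvec{U}}|\) is an independent half-normal variable with mean \(\sqrt{2/\pi }\) and variance \(1-2/\pi \).

```python
import numpy as np

def sample_cfusn(mu, Sigma, Delta, n, rng):
    """Draw n vectors from CFUSN_{p,r}(mu, Sigma, Delta) via Y = mu + Delta|U| + e."""
    p, r = Delta.shape
    u = np.abs(rng.standard_normal((n, r)))                  # |U|: r independent half-normals
    e = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # e ~ N_p(0, Sigma), independent of U
    return mu + u @ Delta.T + e

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.4], [0.4, 1.0]])
Delta = np.array([[3.0, 0.0, 1.0], [0.0, 2.0, -1.0]])       # p = 2, r = 3
y = sample_cfusn(mu, Sigma, Delta, 200_000, rng)

# Moments implied by (35)
mean_theory = mu + np.sqrt(2 / np.pi) * Delta.sum(axis=1)
cov_theory = Sigma + (1 - 2 / np.pi) * Delta @ Delta.T
# Both differences should be close to zero for large n
print(y.mean(axis=0) - mean_theory)
print(np.cov(y, rowvar=False) - cov_theory)
```

Note that \(r\) may exceed \(p\) here; the representation places no restriction on the number of skewing directions.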

A.2 Scale mixture of CFUSN distributions

In the next two subsections, we shall consider two skew distributions that were recently employed by Lee and McLachlan (2016) and Murray et al. (2017a) for their mixture models, namely the CFUST and HTH distributions, respectively. They are special cases of the class of CFUSS distributions that can be obtained as scale mixtures of the CFUSN distribution (SMCFUSN). By a scale mixture of the CFUSN distribution, we mean a distribution that can be defined by the stochastic representation

$$\begin{aligned} {\varvec{Y}}= & {} \varvec{\mu }+ W^{\frac{1}{2}} {\varvec{Y}}_0, \end{aligned}$$
(36)

where \({\varvec{Y}}_0 \sim \hbox {CFUSN}_{p,r}(\mathbf{0}, \varvec{\Sigma }, \varvec{\Delta })\) follows a central CFUSN distribution and W is a positive (univariate) random variable independent of \({\varvec{Y}}_0\). Thus, conditional on \(W = w\), \({\varvec{Y}}\) follows a CFUSN distribution with location vector \(\varvec{\mu }\), scale matrix \({w}\varvec{\Sigma }\), and skewness matrix \(\sqrt{w}\,\varvec{\Delta }\). It follows that the marginal density of \({\varvec{Y}}\) is given by (1), or equivalently,

$$\begin{aligned}&f_{{\mathrm{SMCFUSN}}} ({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }; F_{{\varvec{\zeta }}}) \nonumber \\& = 2^{r} \int _0^\infty \phi _p \left( {\varvec{y}}; \varvec{\mu }, {w}\varvec{\Omega }\right) \, \Phi _r\left( \frac{1}{\sqrt{w}}\varvec{\Delta }^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu }); \mathbf{0}, \varvec{\Lambda }\right) dF_{{\varvec{\zeta }}}(w), \end{aligned}$$
(37)

where \(F_{{\varvec{\zeta }}}\) denotes the distribution function of W, as defined in Sect. 2.2.

The class of SMCFUSN distributions is a generalization of the class of scale mixtures of skew normal (SMSN) distributions considered by Cabral et al. (2012), which adopts a restricted skew normal distribution in place of the CFUSN distribution used here. The SMSN class can be obtained from the SMCFUSN distribution by taking \(r=1\) (after reparameterization). Some special cases of the SMCFUSN distribution are listed in Table 10.
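Representation (36) also gives a generic sampler for the SMCFUSN class: draw a central CFUSN vector via (35), draw W independently, and scale. The sketch below is a hedged illustration, not the authors' code. It assumes NumPy and SciPy, the names `sample_smcfusn` and `gig_sampler` are hypothetical, and W is taken, for illustration, to be GIG (the CFUSH case of Sect. A.3) using SciPy's `geninvgauss`, whose \((p, b, \hbox{scale})\) parametrization matches the GIG\((\psi, \chi, \lambda)\) density (38) with \(p=\lambda \), \(b=\sqrt{\psi \chi }\), scale \(=\sqrt{\chi /\psi }\). The check uses \(E({\varvec{Y}}) = \varvec{\mu }+ E(W^{1/2})\sqrt{2/\pi }\,\varvec{\Delta }{\varvec{1}}_r\) together with the standard GIG moment formula \(E(W^a) = (\chi /\psi )^{a/2} K_{\lambda +a}(\sqrt{\psi \chi })/K_\lambda (\sqrt{\psi \chi })\).

```python
import numpy as np
from scipy import special, stats

def sample_smcfusn(mu, Sigma, Delta, n, w_sampler, rng):
    """Draw from (36): Y = mu + W^{1/2} Y0, with Y0 ~ CFUSN_{p,r}(0, Sigma, Delta) drawn via (35)."""
    p, r = Delta.shape
    y0 = np.abs(rng.standard_normal((n, r))) @ Delta.T \
        + rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    w = w_sampler(n, rng)                      # positive mixing variable, independent of Y0
    return mu + np.sqrt(w)[:, None] * y0

# Illustrative GIG(psi, chi, lambda) mixing distribution (CFUSH case)
psi, chi, lam = 2.0, 1.0, 0.5
gig = stats.geninvgauss(p=lam, b=np.sqrt(psi * chi), scale=np.sqrt(chi / psi))

def gig_sampler(n, rng):
    return gig.rvs(size=n, random_state=rng)

rng = np.random.default_rng(3)
mu = np.zeros(2)
Sigma = np.eye(2)
Delta = np.array([[1.0, 0.5], [0.0, 1.5]])
y = sample_smcfusn(mu, Sigma, Delta, 200_000, gig_sampler, rng)

# E(W^{1/2}) from the GIG moment formula, then the implied mean of Y
b = np.sqrt(psi * chi)
e_sqrt_w = (chi / psi) ** 0.25 * special.kv(lam + 0.5, b) / special.kv(lam, b)
mean_theory = mu + e_sqrt_w * np.sqrt(2 / np.pi) * Delta.sum(axis=1)
print(y.mean(axis=0), mean_theory)
```

Swapping `gig_sampler` for a sampler of \(1/\hbox{gamma}(\nu/2, \nu/2)\) variables would give CFUST draws instead; only the mixing distribution changes.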

Table 10 Some special cases of the scale mixture of CFUSN distributions

A.3 The CFUSH distribution

If the latent variable W in (36) follows a generalized inverse Gaussian (GIG) distribution (Seshadri 1997), we obtain the canonical fundamental skew hyperbolic (CFUSH) distribution. In this case, the symmetric density \(f_p\) in (33) is a symmetric generalized hyperbolic (GH) density \(h_p(\cdot )\), and the skewing function becomes the cdf \(H_r(\cdot )\) of a symmetric GH distribution. The GIG density can be expressed as

$$\begin{aligned} f_{{\mathrm{GIG}}} (w; \psi , \chi , \lambda )= & {} \frac{\left( \frac{\psi }{\chi }\right) ^{\frac{\lambda }{2}} w^{\lambda -1}}{2 K_\lambda (\sqrt{\chi \psi })} e^{-\frac{\psi w + \frac{\chi }{w}}{2}}, \end{aligned}$$
(38)

where \(w > 0\), the parameters \(\psi \) and \(\chi \) are positive, and \(\lambda \) is a real parameter. In the above, \(K_\lambda (\cdot )\) denotes the modified Bessel function of the third kind of order \(\lambda \). The density of a p-dimensional symmetric GH distribution is given by

$$\begin{aligned} h_p({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma },\psi ,\chi ,\lambda )= & {} \left( \frac{\chi +\eta }{\psi }\right) ^{\frac{\lambda }{2}-\frac{p}{4}} \frac{\left( \frac{\psi }{\chi }\right) ^{\frac{\lambda }{2}} K_{\lambda -\frac{p}{2}}(\sqrt{(\chi +\eta )\psi })}{(2\pi )^{\frac{p}{2}} |\varvec{\Sigma }|^{\frac{1}{2}} K_\lambda (\sqrt{\chi \psi })}. \end{aligned}$$
(39)
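Density (39) is the normal variance mixture \(\int _0^\infty \phi _p({\varvec{y}}; \varvec{\mu }, w\varvec{\Sigma }) f_{{\mathrm{GIG}}}(w; \psi , \chi , \lambda )\, dw\), with \(\eta = ({\varvec{y}}-\varvec{\mu })^T\varvec{\Sigma }^{-1}({\varvec{y}}-\varvec{\mu })\). The sketch below (assuming NumPy and SciPy; helper names and parameter values are hypothetical) evaluates (39) directly and by quadrature, using SciPy's `geninvgauss` with \(p=\lambda \), \(b=\sqrt{\psi \chi }\), scale \(=\sqrt{\chi /\psi }\) for the GIG density (38). It also illustrates the identifiability issue discussed next: \((c\varvec{\Sigma }, c\psi , \chi /c)\) gives the same density as \((\varvec{\Sigma }, \psi , \chi )\).

```python
import numpy as np
from scipy import integrate, special, stats

def _eta(y, mu, Sigma):
    diff = y - mu
    return diff @ np.linalg.solve(Sigma, diff)      # squared Mahalanobis distance

def sym_gh_pdf(y, mu, Sigma, psi, chi, lam):
    """Symmetric GH density (39)."""
    y, mu, Sigma = (np.asarray(a, dtype=float) for a in (y, mu, Sigma))
    p, eta = y.size, _eta(y, mu, Sigma)
    num = ((chi + eta) / psi) ** (lam / 2 - p / 4) * (psi / chi) ** (lam / 2) \
        * special.kv(lam - p / 2, np.sqrt((chi + eta) * psi))
    den = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)) * special.kv(lam, np.sqrt(chi * psi))
    return num / den

def sym_gh_pdf_by_mixing(y, mu, Sigma, psi, chi, lam):
    """Same density by integrating phi_p(y; mu, w Sigma) against the GIG(psi, chi, lam) density."""
    y, mu, Sigma = (np.asarray(a, dtype=float) for a in (y, mu, Sigma))
    p, eta = y.size, _eta(y, mu, Sigma)
    logdet = np.linalg.slogdet(Sigma)[1]
    gig = stats.geninvgauss(p=lam, b=np.sqrt(psi * chi), scale=np.sqrt(chi / psi))
    def integrand(w):
        # log phi_p(y; mu, w Sigma), computed on the log scale to avoid 0/0 for tiny w
        log_phi = -eta / (2 * w) - 0.5 * (p * np.log(2 * np.pi * w) + logdet)
        return np.exp(log_phi) * gig.pdf(w)
    return integrate.quad(integrand, 0, np.inf)[0]

y0, mu0 = np.array([0.5, -0.2]), np.zeros(2)
S = np.array([[1.0, 0.3], [0.3, 0.8]])
psi, chi, lam = 1.5, 0.7, -0.5
print(sym_gh_pdf(y0, mu0, S, psi, chi, lam), sym_gh_pdf_by_mixing(y0, mu0, S, psi, chi, lam))
```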

Here, \(\eta = ({\varvec{y}}-\varvec{\mu })^T\varvec{\Sigma }^{-1}({\varvec{y}}-\varvec{\mu })\) denotes the squared Mahalanobis distance of \({\varvec{y}}\). It is well known that the GH distribution has an identifiability issue: the parameter vectors \(\varvec{\theta }=(\varvec{\mu }, c\varvec{\Sigma }, c\psi , \chi /c, \lambda )\) and \(\varvec{\theta }^*=(\varvec{\mu }, \varvec{\Sigma }, \psi , \chi , \lambda )\) both yield the same symmetric GH density (39) for any \(c>0\). It is therefore not surprising that the CFUSH distribution suffers from the same issue. To handle this, restrictions are imposed on some of its parameters. An example is the HTH distribution considered by Murray et al. (2017a), where the constraint \(\psi =\chi =\omega \) is used, leading to the density

$$\begin{aligned}&f_{{\mathrm{HTH}}} ({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }, \omega , \lambda ) \nonumber \\&\quad = 2^r h_p\left( {\varvec{y}}; \varvec{\mu }, \varvec{\Omega }, \omega , \omega , \lambda \right) H_r\left( \varvec{\Delta }^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu }) \left( \frac{\omega }{\omega +\eta }\right) ^{\frac{1}{4}}; \mathbf{0}, \varvec{\Lambda }, \lambda -\textstyle \frac{p}{2}, \gamma , \gamma \right) ,\nonumber \\ \end{aligned}$$
(40)

where \(\gamma = \sqrt{\omega (\omega +\eta )}\) and, since the scale matrix in (40) is \(\varvec{\Omega }\), \(\eta = ({\varvec{y}}-\varvec{\mu })^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu })\). Note that Murray et al. (2017a) use the term ‘hidden truncation’ to describe the latent skewing variable that follows a truncated distribution in the convolution-type characterization of the CFUSH distribution. Another alternative is to restrict the parameters of W so that, for example, \(E(W)=1\). A commonly used constraint on the GH distribution is to set \(|\varvec{\Sigma }|=1\); this can also be applied to the CFUSH distribution to achieve identifiability (see also the unrestricted skew normal generalized hyperbolic (SUNGH) distribution considered by Maleki et al. (2019)).

A.4 The CFUST distribution

The CFUST distribution is the skew t-distribution member of the class of CFUSS distributions, where the symmetric distribution is taken to be a (multivariate) t-distribution. This can be obtained by letting \(\frac{1}{W}\) be a random variable that has a \(\hbox {gamma}(\frac{\nu }{2}, \frac{\nu }{2})\) distribution. Thus, its density is given by

$$\begin{aligned}&f_{{\mathrm{CFUST}}}({\varvec{y}}; \varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }, \nu ) \nonumber \\&\quad = 2^r t_p({\varvec{y}}; \varvec{\mu }, \varvec{\Omega }, \nu ) T_r\left( \varvec{\Delta }^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu }); \mathbf{0}, \left( \frac{\nu +\eta }{\nu +p}\right) \varvec{\Lambda }, \nu +p\right) . \end{aligned}$$
(41)

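In (41), \(\eta = ({\varvec{y}}-\varvec{\mu })^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu })\), \(t_p(\cdot )\) denotes the p-dimensional t density, and \(T_r(\cdot )\) denotes the r-dimensional t cdf. We shall adopt the notation \({\varvec{Y}}\sim \hbox {CFUST}_{p,r}(\varvec{\mu }, \varvec{\Sigma }, \varvec{\Delta }, \nu )\) if \({\varvec{Y}}\) has the density given by (41).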
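To see that the scale-mixture form (37) and the closed form (41) agree, the following sketch evaluates both for \(r = 1\), a choice made here so that \(\Phi_r\) and \(T_r\) reduce to univariate cdfs. It assumes NumPy and SciPy (`scipy.stats.multivariate_t` for \(t_p\)); the helper names and parameter values are hypothetical. The mixing integral in (37) is taken over \(G = 1/W \sim \hbox {gamma}(\nu /2, \nu /2)\), so that, conditional on \(G = g\), the density is \(2\phi _p({\varvec{y}}; \varvec{\mu }, \varvec{\Omega }/g)\,\Phi (\sqrt{g}\, c/\sqrt{\lambda })\) with \(c = \varvec{\Delta }^T\varvec{\Omega }^{-1}({\varvec{y}}-\varvec{\mu })\) and \(\lambda = \varvec{\Lambda }\) a scalar.

```python
import numpy as np
from scipy import integrate, stats

def _pieces(y, mu, Sigma, delta):
    Omega = Sigma + np.outer(delta, delta)          # Omega = Sigma + Delta Delta^T
    Omega_inv = np.linalg.inv(Omega)
    diff = y - mu
    eta = diff @ Omega_inv @ diff                   # (y - mu)^T Omega^{-1} (y - mu)
    c = delta @ Omega_inv @ diff                    # Delta^T Omega^{-1} (y - mu)
    lam = 1.0 - delta @ Omega_inv @ delta           # Lambda, a scalar when r = 1
    return Omega, eta, c, lam

def cfust_pdf_r1(y, mu, Sigma, delta, nu):
    """Closed form (41) with r = 1 (delta is the single column of Delta)."""
    p = y.size
    Omega, eta, c, lam = _pieces(y, mu, Sigma, delta)
    t_p = stats.multivariate_t(loc=mu, shape=Omega, df=nu).pdf(y)
    scale = np.sqrt((nu + eta) / (nu + p) * lam)
    return 2.0 * t_p * stats.t.cdf(c / scale, df=nu + p)

def cfust_pdf_by_mixing(y, mu, Sigma, delta, nu):
    """Scale mixture (37) with G = 1/W ~ gamma(nu/2, rate nu/2), integrated over g."""
    p = y.size
    Omega, eta, c, lam = _pieces(y, mu, Sigma, delta)
    log_const = -0.5 * (p * np.log(2 * np.pi) + np.linalg.slogdet(Omega)[1])
    prior = stats.gamma(a=nu / 2, scale=2 / nu)     # rate nu/2 corresponds to scale 2/nu
    def integrand(g):
        phi = np.exp(log_const + 0.5 * p * np.log(g) - 0.5 * g * eta)   # phi_p(y; mu, Omega/g)
        return 2.0 * phi * stats.norm.cdf(np.sqrt(g) * c / np.sqrt(lam)) * prior.pdf(g)
    return integrate.quad(integrand, 0, np.inf)[0]

y = np.array([0.3, -0.5])
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
delta = np.array([1.0, -0.5])
nu = 5.0
print(cfust_pdf_r1(y, mu, Sigma, delta, nu), cfust_pdf_by_mixing(y, mu, Sigma, delta, nu))
```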

The CFUST distribution can be represented by a number of stochastic representations, including the convolution of a half t-random vector \(|{\varvec{U}}|\) and a t-random vector \({\varvec{e}}\), given by

$$\begin{aligned} {\varvec{Y}}= & {} \varvec{\mu }+ \varvec{\Delta }|{\varvec{U}}| + {\varvec{e}}, \end{aligned}$$
(42)

where \({\varvec{U}}\) and \({\varvec{e}}\) have a joint t-distribution given by

$$\begin{aligned} \left[ \begin{array}{c} {\varvec{U}}\\ {\varvec{e}}\end{array}\right]\sim & {} t_{r+p} \left( \left[ \begin{array}{c} \mathbf{0}\\ \mathbf{0}\end{array}\right] , \left[ \begin{array}{cc} {\varvec{I}}_r &{} \quad \mathbf{0}\\ \mathbf{0}&{} \quad \varvec{\Sigma }\end{array}\right] , \nu \right) . \end{aligned}$$

From (42), we can obtain the mean and covariance matrix of \({\varvec{Y}}\), which are given by

$$\begin{aligned} E({\varvec{Y}})= & {} \varvec{\mu }+ a(\nu ) \varvec{\Delta }{\varvec{1}}_r \end{aligned}$$

and

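$$\begin{aligned} \hbox {cov}({\varvec{Y}})= & {} \left( \frac{\nu }{\nu -2}\right) \left[ \varvec{\Sigma }+ \left( 1-\frac{2}{\pi }\right) \varvec{\Delta }\varvec{\Delta }^T\right] + \left[ \frac{2\nu }{\pi (\nu -2)} - a(\nu )^2\right] \varvec{\Delta }{\varvec{J}}_r \varvec{\Delta }^T, \end{aligned}$$

where \(a(\nu ) = \sqrt{\frac{\nu }{\pi }} \Gamma (\frac{\nu -1}{2}) \left[ \Gamma (\frac{\nu }{2})\right] ^{-1}\) is the mean of the absolute value of a univariate t-random variable with \(\nu \) degrees of freedom, and \({\varvec{J}}_r = {\varvec{1}}_r{\varvec{1}}_r^T\) is the \(r\times r\) matrix of ones. The mean exists for \(\nu > 1\) and the covariance matrix for \(\nu > 2\).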
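The moment formulas can be checked by simulation from (42). In the sketch below (assuming NumPy and SciPy; function names and parameter values are illustrative), \({\varvec{U}}\) and \({\varvec{e}}\) are made jointly t by dividing independent normal draws by the square root of one shared \(\hbox {gamma}(\nu /2, \nu /2)\) variable, which yields the block-diagonal joint t-distribution stated above. The closed-form moments use \(E|U_k| = \sqrt{\nu /\pi }\,\Gamma ((\nu -1)/2)/\Gamma (\nu /2)\) and \(\varvec{\Delta }{\varvec{J}}_r\varvec{\Delta }^T = (\varvec{\Delta }{\varvec{1}}_r)(\varvec{\Delta }{\varvec{1}}_r)^T\); \(\nu = 10\) is chosen so that fourth moments exist and the sample covariance settles quickly.

```python
import numpy as np
from scipy.special import gammaln

def sample_cfust(mu, Sigma, Delta, nu, n, rng):
    """Draw from CFUST_{p,r}(mu, Sigma, Delta, nu) via (42): Y = mu + Delta|U| + e."""
    p, r = Delta.shape
    g = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)           # shared gamma(nu/2, rate nu/2)
    s = 1 / np.sqrt(g)[:, None]
    u = rng.standard_normal((n, r)) * s                         # U ~ t_r(0, I_r, nu)
    e = rng.multivariate_normal(np.zeros(p), Sigma, size=n) * s  # e ~ t_p(0, Sigma, nu)
    return mu + np.abs(u) @ Delta.T + e                         # (U, e) jointly t via shared g

def cfust_moments(mu, Sigma, Delta, nu):
    """Mean (nu > 1) and covariance (nu > 2) of the CFUST distribution."""
    a = np.sqrt(nu / np.pi) * np.exp(gammaln((nu - 1) / 2) - gammaln(nu / 2))  # E|U_k|
    d1 = Delta.sum(axis=1)                                      # Delta 1_r
    mean = mu + a * d1
    cov = nu / (nu - 2) * (Sigma + (1 - 2 / np.pi) * Delta @ Delta.T) \
        + (2 * nu / (np.pi * (nu - 2)) - a**2) * np.outer(d1, d1)
    return mean, cov

rng = np.random.default_rng(4)
mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, -0.3], [-0.3, 0.5]])
Delta = np.array([[1.0, 0.5], [-0.5, 1.0]])
nu = 10.0
y = sample_cfust(mu, Sigma, Delta, nu, 400_000, rng)
mean, cov = cfust_moments(mu, Sigma, Delta, nu)
print(y.mean(axis=0) - mean)
print(np.cov(y, rowvar=False) - cov)
```

As \(\nu \rightarrow \infty \), the bracketed coefficient of \(\varvec{\Delta }{\varvec{J}}_r\varvec{\Delta }^T\) tends to zero and the covariance tends to the CFUSN value \(\varvec{\Sigma }+ (1-2/\pi )\varvec{\Delta }\varvec{\Delta }^T\).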

In addition to the CFUSN distribution (and its nested special/limiting cases), the CFUST distribution embeds a number of commonly used distributions as special or limiting cases. These include the unrestricted skew t-distribution of Sahu et al. (2003) (obtained by taking \(\varvec{\Delta }\) to be a diagonal \(p\times p\) matrix, and letting \(\nu \rightarrow \infty \) for the skew normal case), the restricted skew t-distribution (obtained by setting \(r=1\)), and the t-distribution (obtained by setting \(\varvec{\Delta }=\mathbf{0}\)). Concerning the identifiability of the CFUST model, it can be observed from (42) that it bears a resemblance to the FA model (2). Indeed, it can be viewed as an FA model with latent factors following a half t-distribution and the skewness matrix acting as the factor loading matrix. However, unlike the FA model, the term \(\varvec{\Delta }|{\varvec{U}}|\) in the CFUST distribution is not rotationally invariant. It is invariant to permutations of the columns of \(\varvec{\Delta }\), but this does not affect the number of free parameters in the CFUST model.

Appendix B: Expressions for the E-step of the ECM algorithm for the CFUSTFA model

For the CFUSTFA model, the E-step of the ECM algorithm involves four conditional expectations that are analogous to those for mixtures of CFUST distributions; technical details can be found in Lee and McLachlan (2016). The expressions for (19) to (22) are similar to those for (12), (13), (15), and (16), respectively, in Lee and McLachlan (2016). However, the scale and skewness matrices in our case are given by \(\varvec{\Sigma }_i^{*^{(k)}} = {\varvec{B}}_i^{(k)}{\varvec{B}}_i^{(k)^T}+{\varvec{D}}_i^{(k)}\) and \(\varvec{\Delta }_i^{*^{(k)}} = {\varvec{B}}_i^{(k)} \varvec{\Delta }_i^{(k)}\) \((i=1, \ldots , g)\), respectively. Thus, the conditional expectations (19) to (22) are given by

$$\begin{aligned} z_{ij}^{(k)}= & {} \frac{\pi _i^{(k)} f_{{\mathrm{CFUST}}_{p,r}} ({\varvec{y}}_j; \varvec{\mu }_i^{(k)}, \varvec{\Sigma }_i^{*^{(k)}}, \varvec{\Delta }_i^{*^{(k)}}, \nu _i^{(k)})}{\sum _{h=1}^g \pi _h^{(k)} f_{{\mathrm{CFUST}}_{p,r}} ({\varvec{y}}_j; \varvec{\mu }_h^{(k)}, \varvec{\Sigma }_h^{*^{(k)}}, \varvec{\Delta }_h^{*^{(k)}}, \nu _h^{(k)})}, \end{aligned}$$
(43)
$$\begin{aligned} w_{ij}^{(k)}= & {} \left( \frac{\nu _i^{(k)} + p}{\nu _i^{(k)} + d_{ij}^{(k)}}\right) \frac{T_r\left( {\varvec{c}}_{ij}^{(k)} \sqrt{\frac{\nu _i^{(k)}+p+2}{\nu _i^{(k)}+d_{ij}^{(k)}}}; \mathbf{0}, \varvec{\Lambda }_i^{(k)}, \nu _i^{(k)}+p+2\right) }{T_r\left( {\varvec{c}}_{ij}^{(k)} \sqrt{\frac{\nu _i^{(k)}+p}{\nu _i^{(k)}+d_{ij}^{(k)}}}; \mathbf{0}, \varvec{\Lambda }_i^{(k)}, \nu _i^{(k)}+p\right) }, \end{aligned}$$
(44)
$$\begin{aligned} {\varvec{e}}_{1ij}^{(k)}= & {} w_{ij}^{(k)} E\left[ {\varvec{a}}_{ij}^{(k)}\right] , \end{aligned}$$
(45)
$$\begin{aligned} {\varvec{e}}_{2ij}^{{(k)}}= & {} w_{ij}^{(k)} E\left[ {\varvec{a}}_{ij}^{(k)} {\varvec{a}}_{ij}^{(k)^T} \right] , \end{aligned}$$
(46)

where

$$\begin{aligned} d_{ij}^{(k)}= & {} ({\varvec{y}}_j - \varvec{\mu }_i^{(k)})^T \varvec{\Omega }_i^{(k)^{-1}} ({\varvec{y}}_j - \varvec{\mu }_i^{(k)}),\\ {\varvec{c}}_{ij}^{(k)}= & {} \varvec{\Delta }_i^{*^{(k)^T}} \varvec{\Omega }_i^{(k)^{-1}} ({\varvec{y}}_j-\varvec{\mu }_i^{(k)}),\\ \varvec{\Lambda }_i^{(k)}= & {} {\varvec{I}}_r - \varvec{\Delta }_i^{*^{(k)^T}} \varvec{\Omega }_i^{(k)^{-1}} \varvec{\Delta }_i^{*^{(k)}},\\ \varvec{\Omega }_i^{(k)}= & {} \varvec{\Sigma }_i^{*^{(k)}} + \varvec{\Delta }_i^{*^{(k)}} \varvec{\Delta }_i^{*^{(k)^T}}, \end{aligned}$$

and where \({\varvec{a}}_{ij}^{(k)}\) is an r-variate truncated t-random vector given by

$$\begin{aligned} {\varvec{a}}_{ij}^{(k)}\sim & {} Tt_r\left( {\varvec{c}}_{ij}^{(k)}, \left( \frac{\nu _i^{(k)} + d_{ij}^{(k)}}{\nu _i^{(k)}+p+2}\right) \varvec{\Lambda }_i^{(k)}, \nu _i^{(k)}+p+2; \mathbb {R}^+\right) . \end{aligned}$$

The expectations in (45) and (46) are the first and second moments of \({\varvec{a}}_{ij}^{(k)}\), respectively, and can be evaluated using the formulae described in, for example, O’Hagan (1976), Ho et al. (2012), and the appendix of Lee and McLachlan (2014).
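Two of the E-step quantities are easy to sketch numerically. The code below is a hedged illustration, not the authors' implementation; it assumes NumPy and SciPy, and all function names and inputs are hypothetical. First, the posterior probabilities (43) are computed on the log scale with `scipy.special.logsumexp`, a common design choice to avoid underflow when component densities are tiny. Second, (44) is evaluated for \(r = 1\) (so \(\varvec{\Lambda }_i^{(k)}\) is a scalar) and compared with a direct quadrature of the conditional mean of \(1/W\) given \({\varvec{y}}_j\): under the representation with \(1/W \sim \hbox {gamma}(\nu /2, \nu /2)\), the conditional density of \(1/W\) is proportional to a \(\hbox {gamma}((\nu +p)/2, (\nu +d)/2)\) density times \(\Phi (\sqrt{g}\, c/\sqrt{\lambda })\).

```python
import numpy as np
from scipy import integrate, special, stats

def responsibilities(log_pi, log_dens):
    """Posterior probabilities (43): z[j, i] = pi_i f_i(y_j) / sum_h pi_h f_h(y_j).

    log_pi has shape (g,); log_dens has shape (n, g) and holds log f_i(y_j).
    """
    log_num = log_dens + log_pi
    return np.exp(log_num - special.logsumexp(log_num, axis=1, keepdims=True))

def w_weight_r1(c, lam, d, nu, p):
    """Expression (44) when r = 1, so that Lambda is the scalar lam."""
    num = stats.t.cdf(c * np.sqrt((nu + p + 2) / (nu + d)) / np.sqrt(lam), df=nu + p + 2)
    den = stats.t.cdf(c * np.sqrt((nu + p) / (nu + d)) / np.sqrt(lam), df=nu + p)
    return (nu + p) / (nu + d) * num / den

def w_weight_numeric(c, lam, d, nu, p):
    """Conditional mean of 1/W by quadrature, for checking w_weight_r1."""
    prior = stats.gamma(a=(nu + p) / 2, scale=2 / (nu + d))   # rate (nu + d)/2
    kernel = lambda g: prior.pdf(g) * stats.norm.cdf(np.sqrt(g) * c / np.sqrt(lam))
    num = integrate.quad(lambda g: g * kernel(g), 0, np.inf)[0]
    den = integrate.quad(kernel, 0, np.inf)[0]
    return num / den

# Hypothetical inputs: two observations, two components
z = responsibilities(np.log([0.6, 0.4]), np.log([[0.2, 0.05], [0.01, 0.3]]))
print(z)
print(w_weight_r1(0.7, 0.6, 1.3, 5.0, 2), w_weight_numeric(0.7, 0.6, 1.3, 5.0, 2))
```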

Cite this article

Lee, S.X., Lin, TI. & McLachlan, G.J. Mixtures of factor analyzers with scale mixtures of fundamental skew normal distributions. Adv Data Anal Classif 15, 481–512 (2021). https://doi.org/10.1007/s11634-020-00420-9
