Abstract
Finite mixtures of multivariate distributions play a fundamental role in model-based clustering. However, they pose several problems, especially in the presence of many irrelevant variables. Dimension reduction methods, such as projection pursuit, are commonly used to address these problems. In this paper, we use skewness-maximizing projections to recover the subspace which optimally separates the cluster means. Skewness might then be removed in order to search for other potentially interesting data structures or to perform skewness-sensitive statistical analyses, such as Hotelling’s \(T^{2}\) test. Our approach is algebraic in nature and deals with the symmetric tensor rank of the third multivariate cumulant. We also derive closed-form expressions for the symmetric tensor rank of the third cumulants of several multivariate mixture models, including mixtures of skew-normal distributions and mixtures of two symmetric components with proportional covariance matrices. The theoretical results in this paper shed some light on the connection between the estimated number of mixture components and their skewness.






References
Adcock C, Eling M, Loperfido N (2015) Skewed distributions in finance and actuarial science: a review. Eur J Finance 21:1253–1281
Ambagaspitiya RS (1999) On the distributions of two classes of correlated aggregate claims. Insur Math Econ 24:301–308
Arellano-Valle RB, Genton MG, Loschi RH (2009) Shape mixtures of multivariate skew-normal distributions. J Multivar Anal 100:91–101
Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew \(t\) distribution. J R Stat Soc B 65:367–389
Azzalini A, Genton MG (2008) Robust likelihood methods based on the skew-t and related distributions. Int Stat Rev 76:106–129
Bartoletti S, Loperfido N (2010) Modelling air pollution data by the skew-normal distribution. Stoch Environ Res Risk Assess 24:513–517
Blough DK (1989) Multivariate symmetry and asymmetry. Inst Stat Math 24:513–517
Bolton RJ, Krzanowski WJ (2003) Projection pursuit clustering for exploratory data analysis. J Comput Graph Stat 12:121–142
Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78
Branco MD, Dey DK (2001) A general class of skew-elliptical distributions. J Multivar Anal 79:99–113
Comon P (2014) Tensors: a brief introduction. IEEE Signal Process Mag 31:44–53
Comon P, Golub G, Lim L-H, Mourrain B (2008) Symmetric tensors and symmetric tensor rank. SIAM J Matrix Anal Appl 30:1254–1279
Fraley C, Raftery AE, Scrucca L (2017) mclust: Gaussian mixture modelling for model-based clustering, classification, and density estimation. https://CRAN.R-project.org/package=mclust. R package version 5.3
Franceschini C, Loperfido N (2017a) MaxSkew: skewness-based projection pursuit. https://CRAN.R-project.org/package=MaxSkew. R package version 1.1
Franceschini C, Loperfido N (2017b) MultiSkew: measures, tests and removes multivariate skewness. https://CRAN.R-project.org/package=MultiSkew. R package version 1.1.1
Friedman J (1987) Exploratory projection pursuit. J Am Stat Assoc 82:249–266
Friedman JH, Tukey JW (1974) A projection pursuit algorithm for exploratory data analysis. IEEE Trans Comput Ser C 23:881–890
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew\(-t\) distributions. Biostatistics 11:317–336
Grasman RPPP, Huizenga HM, Geurts HM (2010) Departure from normality in multivariate normative comparison: the Cramér alternative for Hotelling’s \(T^{2}\). Neuropsychologia 48:1510–1516
Hennig C (2004) Asymmetric linear dimension reduction for classification. J Comput Graph Stat 13:930–945
Hennig C (2005) A method for visual cluster validation. In: Weihs C, Gaul W (eds) Classification—the ubiquitous challenge. Springer, Heidelberg, pp 153–160
Hui G, Lindsay BG (2010) Projection pursuit via white noise matrices. Sankhya B 72:123–153
Jondeau E, Rockinger M (2006) Optimal portfolio allocation under higher moments. Eur Financ Manag 12:29–55
Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41:577–590
Kim H-M, Mallick BK (2003) Moments of random vectors with skew \(t\) distribution and their quadratic forms. Stat Probab Lett 63:417–423
Landsberg JM, Michalek M (2017) On the geometry of border rank decompositions for matrix multiplication and other tensors with symmetry. SIAM J Appl Algebra Geom 1:2–19
Lee S, McLachlan GJ (2013) Model-based clustering and classification with non-normal mixture distributions. Stat Methods Appl 22:427–454 (with discussion)
Lin XS (2004) Compound distributions. In: Encyclopedia of actuarial science, vol 1. Wiley, pp 314–317
Lindsay BG, Yao W (2012) Fisher information matrix: a tool for dimension reduction, projection pursuit, independent component analysis, and more. Can J Stat 40:712–730
Loperfido N (2004) Generalized skew-normal distributions. Skew-elliptical distributions and their applications: a journey beyond normality. CRC, Boca Raton, pp 65–80
Loperfido N (2013) Skewness and the linear discriminant function. Stat Probab Lett 83:93–99
Loperfido N (2014) Linear transformations to symmetry. J Multivar Anal 129:186–192
Loperfido N (2015a) Vector-valued skewness for model-based clustering. Stat Probab Lett 99:230–237
Loperfido N (2015b) Singular value decomposition of the third multivariate moment. Linear Algebra Appl 473:202–216
Loperfido N (2018) Skewness-based projection pursuit: a computational approach. Comput Stat Data Anal 120:42–57
Loperfido N, Mazur S, Podgorski K (2018) Third cumulant for multivariate aggregate claims models. Scand Actuar J 2018:109–128
Mardia K (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57:519–530
McNicholas PD (2016) Model-based clustering. J Class 33:331–373
Melnykov V, Maitra R (2010) Finite mixture models and model-based clustering. Stat Surv 4:80–116
Miettinen J, Taskinen S, Nordhausen K, Oja H (2015) Fourth moments and independent component analysis. Stat Sci 30:372–390
Móri T, Rohatgi V, Székely G (1993) On multivariate skewness and kurtosis. Theory Probab Appl 38:547–551
Morris K, McNicholas PD, Scrucca L (2013) Dimension reduction for model-based clustering via mixtures of multivariate t-distributions. Adv Data Anal Classif 7:321–338
Oeding L, Ottaviani G (2013) Eigenvectors of tensors and algorithms for Waring decomposition. J Symb Comput 54:9–35
Paajarvi P, Leblanc J (2004) Skewness maximization for impulsive sources in blind deconvolution. In: Proceedings of the 6th Nordic signal processing symposium—NORSIG, Espoo, Finland
Peña D, Prieto FJ (2001) Cluster identification using projections. J Am Stat Assoc 96:1433–1445
Rao CR, Rao MB (1998) Matrix algebra and its applications to statistics and econometrics. World Scientific Co. Pte. Ltd, Singapore
Sakata T, Sumi T, Miyazaki M (2016) Algebraic and computational aspects of real tensor ranks. Springer, Tokyo
Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20:471–484
Scrucca L (2014) Graphical tools for model-based mixture discriminant analysis. Adv Data Anal Classif 8:147–165
Tarpey T, Yun D, Petkova E (2009) Model misspecification: Finite mixture or homogeneous? Stat Model 8:199–218
Tyler DE, Critchley F, Dümbgen L, Oja H (2009) Invariant co-ordinate selection. J R Stat Soc B 71:1–27 (with discussion)
Acknowledgements
The author would like to thank an anonymous Associate Editor and two anonymous Reviewers for their careful handling of this paper and for their valuable comments, which greatly improved its quality.
Appendix
The Kronecker product and the vectorization operator
The Kronecker (or tensor) product acts on matrices \(A=\left\{ a_{ij}\right\} \in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\) and \(B=\left\{ b_{ij}\right\} \in {\mathbb {R}}^{m}\times {\mathbb {R}}^{n}\) by producing the block matrix \(A\otimes B\in {\mathbb {R}}^{pm}\times {\mathbb {R}}^{qn}\) whose \((i,j)\)-th block is the matrix \(a_{ij}B\) (see, for example, Rao and Rao 1998, p 193). As an example, consider the matrices
The Kronecker product \(A\otimes B\) is
We shall recall some fundamental properties of the Kronecker product (see, for example, Rao and Rao 1998, pp 194–201).
-
1.
The Kronecker product is associative: \(\left( A\otimes B\right) \otimes C=A\otimes \left( B\otimes C\right) =A\otimes B\otimes C\).
-
2.
If matrices A, B, C and D are of appropriate size, then \( \left( A\otimes B\right) \left( C\otimes D\right) =AC\otimes BD\).
-
3.
If the inverses \(A^{-1}\) and \(B^{-1}\) of matrices A and B exist then \(\left( A\otimes B\right) ^{-1}=A^{-1}\otimes B^{-1}\).
-
4.
If a and b are two vectors, then \(ab^{\top }\), \(a\otimes b^{\top }\) and \(b^{\top }\otimes a\) denote the same matrix.
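These properties are easy to verify numerically. The following sketch (assuming Python with NumPy; the matrix sizes and random entries are arbitrary illustrations, not part of the paper) checks each of the four properties listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))
a, b = rng.standard_normal(3), rng.standard_normal(4)

# 1. Associativity of the Kronecker product.
assert np.allclose(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C)))

# 2. Mixed-product property: (A x B)(C x D) = AC x BD.
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# 3. Inverse of a Kronecker product (A and B invertible almost surely here).
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# 4. a b^T, a x b^T and b^T x a denote the same matrix.
ab = np.outer(a, b)
assert np.allclose(ab, np.kron(a.reshape(-1, 1), b.reshape(1, -1)))
assert np.allclose(ab, np.kron(b.reshape(1, -1), a.reshape(-1, 1)))
```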
The matrix vectorization (also known as the vec operator, or simply vectorization) converts a matrix \(A=\left\{ a_{ij}\right\} \in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\) into a pq-dimensional vector \(A^{V}=\text{ vec }\left( A\right) =\left( \alpha _{1}, \ldots ,\alpha _{pq}\right) ^{\top }\), where \(a_{ij}=\alpha _{\left( j-1\right) p+i}\), by stacking its columns on top of each other, i.e.
We shall now recall some fundamental properties of the matrix vectorization (see, for example, Rao and Rao 1998, pp 194–201).
-
1.
For any two \(m\times n\) matrices A and B it holds true that \( \text{ tr }\left( A^{\top }B\right) =\text{ vec }^{\top }(B)\text{ vec }(A)\).
-
2.
If \(A\in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q},\;B\in {\mathbb {R}} ^{q}\times {\mathbb {R}}^{r}\) and \(C\in {\mathbb {R}}^{r}\times {\mathbb {R}}^{s}\) then \(\text{ vec }\left( ABC\right) =\left( C^{\top }\otimes A\right) \text{ vec }\left( B\right) \).
-
3.
For any \(m\times n\) matrix A it holds true that \(\text{ vec }\left( A^{\top }\right) =A^{\top V}\) and \(\left[ \text{ vec }\left( A\right) \right] ^{\top }=A^{V\top }\).
-
4.
If A is an invertible matrix, then \(\text{ vec }\left( A^{-1}\right) =A^{-V}\) and \(\left[ \text{ vec }\left( A^{-1}\right) \right] ^{\top }=A^{-V\top }\).
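As above, a short numerical check may be helpful. The sketch below (again assuming NumPy, with vec implemented by column-major reshaping) illustrates the first two properties; the matrix sizes are arbitrary.

```python
import numpy as np

def vec(M):
    # Stack the columns of M on top of each other (column-major order).
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # m x n
B = rng.standard_normal((4, 3))   # m x n
P = rng.standard_normal((2, 3))   # p x q
Q = rng.standard_normal((3, 5))   # q x r
R = rng.standard_normal((5, 4))   # r x s

# 1. tr(A^T B) = vec(B)^T vec(A).
assert np.isclose(np.trace(A.T @ B), vec(B) @ vec(A))

# 2. vec(PQR) = (R^T x P) vec(Q).
assert np.allclose(vec(P @ Q @ R), np.kron(R.T, P) @ vec(Q))
```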
Rank of \({\mathcal {A}}\)
We shall first prove by contradiction that the tensor rank of \({\mathcal {A}}\) is three. If the tensor rank of \({\mathcal {A}}\) were two, its unfolding might be represented as
for some 3-dimensional real vectors \(u_{1}\), \(v_{1}\), \(w_{1}\), \(u_{2}\), \( v_{2}\), \(w_{2}\). As a direct consequence, we would have
The rank of \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }\) would then be at most two, since the quadratic form \(a^{\top }{\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }a\) would be zero for any 3-dimensional vector a orthogonal to both \(v_{1}\) and \(v_{2}\). This would lead to a contradiction, since \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }=2I_{3}\), where \(I_{3}\) is the \(3\times 3\) identity matrix, which is of full rank. In a similar way we can prove that the tensor rank of \({\mathcal {A}}\) is not one. It is not zero either, since \({\mathcal {A}}\) is not a null tensor. Hence it must be greater than or equal to three. We shall prove that it is exactly three using the vectors \(e_{1}=\left( 1,0,0\right) ^{\top }\), \(e_{2}=\left( 0,1,0\right) ^{\top }\) and \(e_{3}=\left( 0,0,1\right) ^{\top }\): elementary matrix algebra shows that
We shall now prove by contradiction that the symmetric tensor rank of \( {\mathcal {A}}\) is four. If the symmetric tensor rank of \({\mathcal {A}}\) were three its unfolding might be represented as
for some 3-dimensional real vectors \(b_{1}\), \(b_{2}\) and \(b_{3}\). These vectors are linearly independent, otherwise there would exist a 3-dimensional real vector v orthogonal to all of them, making the quadratic form
equal to zero. This would lead to a contradiction, since \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }\) is proportional to the \(3\times 3\) identity matrix, and therefore positive definite. Let u and w be a 9-dimensional and a 3-dimensional real vector, respectively, whose first components are one while the others are zero, so that \({\mathcal {A}}_{\left( 1\right) }u=\left( 0,0,0\right) ^{\top }\) and \(u=w\otimes w\otimes 1\). Apply now standard properties of the Kronecker product to obtain
Since \(b_{1}\), \(b_{2}\) and \(b_{3}\) are linearly independent, at least one of the scalar products \(b_{1}^{\top }w\), \(b_{2}^{\top }w\) and \(b_{3}^{\top }w\) differs from zero. This leads to a contradiction, since the null vector \(\left( 0,0,0\right) ^{\top }\) may be obtained as a linear combination of the linearly independent vectors \(b_{1}\), \(b_{2}\) and \(b_{3}\) only if all the coefficients \(b_{1}^{\top }w\), \(b_{2}^{\top }w\) and \(b_{3}^{\top }w\) of the linear combination are zero. We conclude that the symmetric tensor rank of \({\mathcal {A}}\) cannot be three. In a similar way we can prove that the symmetric tensor rank of \({\mathcal {A}}\) is not smaller than three. We shall prove that it is four using the vectors \(a_{1}=\left( 1,1,1\right) ^{\top }\), \(a_{2}=\left( 1,-1,-1\right) ^{\top }\), \(a_{3}=\left( -1,1,-1\right) ^{\top }\) and \(a_{4}=\left( -1,-1,1\right) ^{\top }\): elementary matrix algebra shows that
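The computations above can also be replicated numerically. In the sketch below, \({\mathcal {A}}\) is taken to be the \(3\times 3\times 3\) symmetric tensor with unit entries at all permutations of the indices (1, 2, 3) and zeros elsewhere, which is consistent with \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }=2I_{3}\); the scaling factor 1/4 in the symmetric decomposition is an assumption suggested by the four vectors \(a_{1}\), ..., \(a_{4}\) above, not a statement taken from the paper.

```python
import numpy as np
from itertools import permutations

# Assumed form of the tensor: ones at all permutations of (1, 2, 3).
A = np.zeros((3, 3, 3))
for i, j, k in permutations(range(3)):
    A[i, j, k] = 1.0

# Mode-1 unfolding; by symmetry the exact unfolding convention does not matter here.
A1 = A.reshape(3, 9)
assert np.allclose(A1 @ A1.T, 2 * np.eye(3))   # full rank, as claimed

# Symmetric rank-4 decomposition with the vectors a_1, ..., a_4 (scaled by 1/4).
vs = [np.array(v, dtype=float) for v in
      [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
S = sum(np.einsum("i,j,k->ijk", v, v, v) for v in vs) / 4.0
assert np.allclose(S, A)
```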
Proof of Theorem 1
Let \(\mu _{i,j}\) (\(\overline{\mu }_{i,j}\)) be the j-th moment (centered moment) matrix of the i-th mixture component, with cdf \(F_{i}\) and weight \(\pi _{i}\), for \(j=1,2,3\) and \(i=1, \ldots ,g\). Also, let \(\mu \) be the expected value of M: \(\mu =\mu _{1,1}\pi _{1}+ \cdots +\mu _{g,1}\pi _{g}\). Finally, let \(\lambda _{i}=\mu _{i,1}-\mu \), for \(i=1, \ldots ,g\). By definition, the expected value of \(\varXi \) and the third cumulant matrix of M are
The tensor product \(\left( y+c\right) \otimes \left( y+c\right) ^{\top }\otimes \left( y+c\right) \) might be decomposed into
(Loperfido 2013). The third moment matrix about \(\mu \) of a random vector with cdf \(F_{i}\) is \(\mu _{i,3}\left( x-\mu \right) =\mu _{i,3}\left[ \left( x-\mu _{i,1}\right) +\lambda _{i}\right] \). By taking expectations with respect to the cdf \(F_{i}\), after letting \(y=x-\mu _{i,1}\) and \(c=\lambda _{i}\) we obtain another expression for \(\mu _{i,3}\left( x-\mu \right) \):
where \(A^{V}\) denotes the vectorization of the matrix A. By assumption, \( \overline{\mu }_{i,3}\) and \(\overline{\mu }_{i,2}\) equal \(K_{i,3}\) and \( \varOmega \), thus leading to the simplified expression
By definition, the cdf of X is the mixture of the distribution functions \( F_{1}\), ..., \(F_{g}\), with weights \(\pi _{1}\), ..., \(\pi _{g}\). Hence \( K_{3,X}\), i.e. the third cumulant matrix of X, is
It might be simplified into \(K_{3,X}=E\left( \varXi \right) +K_{3,M}\) by noticing that \(E\left( M-\mu \right) =\lambda _{1}\pi _{1}+ \cdots +\lambda _{g}\pi _{g}\) is a null vector and by recalling the definitions of \(E\left( \varXi \right) \) and \(K_{3,M}\).
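As an informal illustration of Theorem 1, the third cumulant matrix of a two-component Gaussian location mixture with common covariance matrix can be estimated by simulation and compared with the discrete part \(K_{3,M}\); for normal components the term \(E\left( \varXi \right) \) vanishes, since their third cumulants are null. The sketch below assumes NumPy, and the mixture weights, means and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 100_000
pi = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0, 0.0], [3.0, 1.0, -2.0]])   # component means
mu = pi @ mus                                          # mixture mean

# Sample from the two-component Gaussian location mixture (identity covariances).
labels = rng.choice(2, size=n, p=pi)
X = mus[labels] + rng.standard_normal((n, d))

# Sample third cumulant matrix: mean of (x-mu)(x-mu)^T x (x-mu), a d^2 x d matrix.
Xc = X - X.mean(axis=0)
K3_hat = np.mean([np.kron(np.outer(x, x), x.reshape(-1, 1)) for x in Xc], axis=0)

# Discrete part K_{3,M}: sum_i pi_i (mu_i - mu)(mu_i - mu)^T x (mu_i - mu).
K3_M = sum(p * np.kron(np.outer(lam, lam), lam.reshape(-1, 1))
           for p, lam in zip(pi, mus - mu))

print(np.max(np.abs(K3_hat - K3_M)))   # close to zero, up to Monte Carlo error
```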
Proof of Theorem 2
Without loss of generality we can assume that the location vector is a null vector: \(\xi =0_{d}\). Let \(X_{i}\sim SN_{d}\left( 0_{d},\varOmega ,\alpha _{i}\right) \) be a d-dimensional skew-normal random vector with null location vector, scale matrix \(\varOmega \) and shape parameter \(\alpha _{i}\), for \(i=1\), \(\ldots \), g. The first and second moments of \(X_{i}\) are \(E\left( X_{i}\right) =\sqrt{2/\pi }\delta _{i}\) and \(E\left( X_{i}X_{i}^{\top }\right) =\varOmega \), for \(i=1\), \(\ldots \), g. The third moment of \(X_{i}\) is
where \(\varOmega ^{V}\) denotes the vectorization of the matrix \(\varOmega \). It might be simplified into
by recalling that \(\left( A\otimes B\right) \left( C\otimes D\right) =AC\otimes BD\), if matrices A, B, C and D are of appropriate size. The third moment matrix of X is a weighted average of \(\mu _{3,1}\),..., \( \mu _{3,g}\) with weights \(\pi _{1}\),..., \(\pi _{g}\):
where \(\eta =E\left( X\right) =\sqrt{2/\pi }\left( \pi _{1}\delta _{1}+ \cdots +\pi _{g}\delta _{g}\right) \) is the mean of X. A similar argument shows that the second moment of X is just the common scale matrix \(\varOmega \) : \(E\left( XX^{\top }\right) =\pi _{1}E\left( X_{1}X_{1}^{\top }\right) + \cdots +\pi _{g}E\left( X_{g}X_{g}^{\top }\right) =\pi _{1}\varOmega + \cdots +\pi _{g}\varOmega =\varOmega \). The third cumulant of X is the difference of \( E\left( X\otimes X^{\top }\otimes X\right) +2E\left( X\right) \otimes E^{\top }\left( X\right) \otimes E\left( X\right) \) and \(E\left( XX^{\top }\right) \otimes E\left( X\right) +E\left( X\right) \otimes E\left( XX^{\top }\right) +E^{V}\left( XX^{\top }\right) E^{\top }\left( X\right) \). Hence the third cumulant matrix of X might be represented as
The proof is completed by recalling the definition of \(\eta \).
Proof of Theorem 3
We shall first recall two well-known properties of the Kronecker product (see, for example, Rao and Rao 1998, p 197). If A , B and C are three matrices with B and C being of the same size, then \(A\otimes \left( B+C\right) =A\otimes B+A\otimes C\). The Kronecker product is associative, too: \(\left( A\otimes B\right) \otimes C=A\otimes \left( B\otimes C\right) =A\otimes B\otimes C\). The two properties lead to
where x and y are two d-dimensional real vectors. The two properties also lead to
Both identities lead to the following one:
The above identity, together with the definitions of the vectors \(\alpha _{i}=\lambda +\gamma _{i}\) and \(\beta _{i}=\lambda -\gamma _{i}\), implies
We shall now recall another property of the Kronecker product: if a and b are two vectors, then \(ab^{\top }\), \(a\otimes b^{\top }\) and \(b^{\top }\otimes a\) denote the same matrix (see, for example, Rao and Rao 1998, p 199). This property, together with the definitions of \(\varGamma \) and \(\lambda \), leads to
As a direct consequence, the following matrix decomposition holds true:
Finally, we shall recall a third property of the Kronecker product: if A and B are any two matrices then \(\left( A\otimes B\right) ^{\top }=A^{\top }\otimes B^{\top }\) (see, for example, Rao and Rao 1998, page 194). Therefore we have
By assumption, the left-hand side of the above identity equals the matrix unfolding of \({\mathcal {T}}\), and this completes the proof.
Proof of Theorem 4
We shall first prove the theorem for a location mixture of \(g\le d\) weakly symmetric components. By Theorem 1, the third cumulant matrix of X is
where \(\mu _{i}\) and \(\pi _{i}>0\) are the mean and the weight of the i-th mixture component, for \(i=1, \ldots ,g\), while \(\mu =\pi _{1}\mu _{1}+ \cdots +\pi _{g}\mu _{g}\) is the mean of X. The best linear discriminant subspace, i.e. the span of \(\left\{ \mu _{1}-\mu , \ldots ,\mu _{g}-\mu \right\} \), is spanned by the columns of the matrix \(H=\left( \eta _{1}, \ldots ,\eta _{g}\right) \), where \(\eta _{i}=\pi _{i}^{1/3}\left( \mu _{i}-\mu \right) \), for \(i=1, \ldots ,g\). Also, let \(Z=\left( Z_{1}, \ldots ,Z_{d}\right) ^{\top }=\varSigma ^{-1/2}\left( X-\mu \right) \) be the standardized version of X, where \(\varSigma ^{-1/2}\) is the symmetric, positive definite square root of the concentration matrix \(\varSigma ^{-1}\), that is, the inverse of the covariance matrix \(\varSigma \) of X. The third cumulant of Z is
where \(\gamma _{i}=\varSigma ^{-1/2}\eta _{i}\), for \(i=1, \ldots ,g\). Since \(\varSigma ^{-1/2}\) is a full-rank matrix, the columns of the matrix \(\varGamma =\left( \gamma _{1}, \ldots ,\gamma _{g}\right) \) span the best linear discriminant subspace, too. We shall denote its rank by \(h\le g-1\). The skewness of the linear combination \(c^{\top }Z\) is \(\beta _{1}\left( c^{\top }Z\right) =c^{\top }K_{3,Z}^{\top }\left( c\otimes c\right) /\left\| c\right\| ^{3}\), where \(\left\| c\right\| \) is the Euclidean norm of the d-dimensional, real vector c. Hence the vector c which maximizes the skewness of \(c^{\top }Z\) is proportional to the dominant tensor eigenvector of \({\mathcal {K}}_{3,Z}\), that is the unit-norm vector \(v_{1}\) which satisfies \(K_{3,Z}^{\top }\left( v_{1}\otimes v_{1}\right) =\lambda _{1}v_{1}\) for the highest possible value of the scalar \(\lambda _{1}\). We shall now recall two fundamental properties of the Kronecker product. It is associative: \(\left( A\otimes B\right) \otimes C=A\otimes \left( B\otimes C\right) =A\otimes B\otimes C\); if matrices A, B, C and D are of appropriate size, then \(\left( A\otimes B\right) \left( C\otimes D\right) =AC\otimes BD\). These properties, together with the identities
lead to a convenient representation for \(v_{1}\):
It follows that \(v_{1}\) is a linear combination of \(\gamma _{1}, \ldots , \gamma _{g}\) and hence belongs to the best linear discriminant subspace. It also follows that \(Y_{1}=v_{1}^{\top }\varSigma ^{-1/2}X\) is the linear projection of X with maximal skewness, and therefore \(v_{1}^{\top }\varSigma ^{-1/2}\) might be taken as the first row of A. The linear projections \( Y_{2}\), ..., \(Y_{h}\) might be found using similar arguments.
We shall now consider shape mixtures with skew-normal components, and use the same notation as in Theorem 2, which allows us to represent the third cumulant matrix of X as
We have \(\mu _{i}=\xi +\sqrt{2/\pi }\delta _{i}\) and \(\mu _{i}-\mu =\sqrt{ 2/\pi }\left( \delta _{i}-\pi _{1}\delta _{1}- \cdots -\pi _{g}\delta _{g}\right) \), for \(i=1, \ldots ,g\). Hence the best linear discriminant subspace \(\left\{ \mu _{1}-\mu , \ldots ,\mu _{g}-\mu \right\} \) is spanned by the columns of the matrix \(H=\left( \eta _{1}, \ldots ,\eta _{g+1}\right) \), where
for \(i=2, \ldots ,d\). An argument similar to the one used for location mixtures completes the proof.
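In practice, the dominant tensor eigenvector \(v_{1}\) of the sample third cumulant matrix of the standardized data can be approximated by a higher-order power iteration. The sketch below is a minimal illustration of this idea, not the algorithm implemented in the MaxSkew package; the function names, the number of iterations and the random restarts are assumptions, and convergence to the global skewness maximizer is not guaranteed.

```python
import numpy as np

def third_cumulant_matrix(Z):
    """Sample version of K_{3,Z} = E[Z Z^T x Z] for already centred data Z (n x d)."""
    return np.mean([np.kron(np.outer(z, z), z.reshape(-1, 1)) for z in Z], axis=0)

def max_skew_direction(X, n_iter=200, n_starts=10, seed=0):
    """Approximate the direction maximizing the skewness of the projected data."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Symmetric inverse square root of the covariance matrix.
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    S_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ S_inv_half
    K3 = third_cumulant_matrix(Z)
    best_v, best_skew = None, -np.inf
    for _ in range(n_starts):
        v = rng.standard_normal(Z.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = K3.T @ np.kron(v, v)          # higher-order power step
            v /= np.linalg.norm(v)
        skew = v @ K3.T @ np.kron(v, v)        # skewness of v^T Z for unit-norm v
        if skew > best_skew:
            best_v, best_skew = v, skew
    # The skewness-maximizing projection of X is (Sigma^{-1/2} v_1)^T (X - mean).
    return S_inv_half @ best_v, best_skew
```

Applied to data simulated from a two-component location mixture, the returned direction should lie close to the best linear discriminant subspace, in line with Theorem 4.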
Proof of Theorem 5
Without loss of generality we shall assume that the random vector X is centred: \(E\left( X\right) =0_{d}\). Then its third cumulant matrix coincides with its third moment matrix \(M_{3,X}\), and might be represented as
where \(\eta _{1}, \ldots ,\eta _{d}\) are d-dimensional, real vectors. Clearly, if the tensor rank of \(M_{3,X}\) is \(k\le d-1\), the last vectors \(\eta _{k+1}, \ldots ,\eta _{d}\) may be taken to be null vectors. The third moment matrix of \(Y=\left( Y_{1},\ldots ,Y_{d}\right) ^{\top }=BX\), where B is a \( d\times d\) real matrix of full rank, is
The matrix B might be chosen to make the product BH a diagonal matrix: \( BH=diag\left( \psi _{1},\ldots ,\psi _{d}\right) \), where \(H=\left\{ \eta _{1},\ldots ,\eta _{d}\right\} \) is the \(d\times d\) matrix whose columns are \( \eta _{1}\),..., \(\eta _{d}\). The third moment of Y would then be a third-order, symmetric and diagonal tensor:
where \(e_{i}\) is the i-th column of the d-dimensional identity matrix and \(\psi _{i}=E\left( Y_{i}^{3}\right) \), for \(i=1,\ldots ,d\). Let \(W=\left( W_{1},\ldots ,W_{d}\right) ^{\top }=CY\), where C is again a \(d\times d\) real matrix of full rank. The third moment of the i-th component of W, denoted by \(\omega _{i}=E\left( W_{i}^{3}\right) \), is just the sum of all products \(c_{ij}c_{ih}c_{il}E\left( Y_{j}Y_{h}Y_{l}\right) \). The expectation \(E\left( Y_{j}Y_{h}Y_{l}\right) \) is zero whenever at least one index differs from the remaining ones, so that \(\omega _{i}=c_{i1}^{3}\psi _{1}+ \cdots +c_{id}^{3}\psi _{d}\), for \(i=1, \ldots ,d\). In matrix notation, we might write \(\omega =\left( C\circ C\circ C\right) \psi \), where \(\omega =\left( \omega _{1}, \ldots ,\omega _{d}\right) ^{\top }\), \(\psi =\left( \psi _{1}, \ldots ,\psi _{d}\right) ^{\top }\) and “\(\circ \)” denotes the Hadamard, or elementwise, product (see, for example, Rao and Rao 1998, p 203). The theorem is trivially true when \(\psi \) is a null vector, which happens only when the third moment of X is a null matrix. Otherwise, the entries of C might be chosen to make \(\omega \) the d-dimensional real vector whose first entry is one while all others are zero: \(\omega =\left( 1,0, \ldots ,0\right) ^{\top }\in {\mathbb {R}}^{d}\). The matrix A might then be obtained by removing the first row of the matrix product CB.
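The key step \(\omega =\left( C\circ C\circ C\right) \psi \) can be checked directly on a diagonal third-moment tensor. The sketch below (assuming NumPy; the dimension and the entries are arbitrary) contracts a diagonal three-way tensor with C and compares its diagonal with the Hadamard-cube formula.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
psi = rng.standard_normal(d)                   # third moments psi_i = E(Y_i^3)
C = rng.standard_normal((d, d))                # full-rank transformation W = C Y

# Diagonal third-moment tensor of Y: E(Y_i Y_j Y_h) = psi_i if i = j = h, else 0.
T = np.zeros((d, d, d))
T[np.arange(d), np.arange(d), np.arange(d)] = psi

# Third-moment tensor of W = C Y and its diagonal omega_i = E(W_i^3).
TW = np.einsum("ia,jb,kc,abc->ijk", C, C, C, T)
omega = TW[np.arange(d), np.arange(d), np.arange(d)]

# Hadamard-cube formula: omega = (C o C o C) psi.
assert np.allclose(omega, (C ** 3) @ psi)
```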