Abstract
Finite mixtures of multivariate distributions play a fundamental role in model-based clustering. However, they pose several problems, especially in the presence of many irrelevant variables. Dimension reduction methods, such as projection pursuit, are commonly used to address these problems. In this paper, we use skewness-maximizing projections to recover the subspace which optimally separates the cluster means. Skewness might then be removed in order to search for other potentially interesting data structures or to perform skewness-sensitive statistical analyses, such as Hotelling’s \(T^{2}\) test. Our approach is algebraic in nature and deals with the symmetric tensor rank of the third multivariate cumulant. We also derive closed-form expressions for the symmetric tensor rank of the third cumulants of several multivariate mixture models, including mixtures of skew-normal distributions and mixtures of two symmetric components with proportional covariance matrices. The theoretical results in this paper shed some light on the connection between the estimated number of mixture components and their skewness.






References
Adcock C, Eling M, Loperfido N (2015) Skewed distributions in finance and actuarial science: a review. Eur J Finance 21:1253–1281
Ambagaspitiya RS (1999) On the distributions of two classes of correlated aggregate claims. Insur Math Econ 24:301–308
Arellano-Valle RB, Genton MG, Loschi RH (2009) Shape mixtures of multivariate skew-normal distributions. J Multivar Anal 100:91–101
Azzalini A, Capitanio A (2003) Distributions generated by perturbation of symmetry with emphasis on a multivariate skew \(t\) distribution. J R Stat Soc B 65:367–389
Azzalini A, Genton MG (2008) Robust likelihood methods based on the skew-t and related distributions. Int Stat Rev 76:106–129
Bartoletti S, Loperfido N (2010) Modelling air pollution data by the skew-normal distribution. Stoch Environ Res Risk Assess 24:513–517
Blough DK (1989) Multivariate symmetry and asymmetry. Inst Stat Math 24:513–517
Bolton RJ, Krzanowski WJ (2003) Projection pursuit clustering for exploratory data analysis. J Comput Graph Stat 12:121–142
Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78
Branco MD, Dey DK (2001) A general class of skew-elliptical distributions. J Multivar Anal 79:99–113
Comon P (2014) Tensors: a brief introduction. IEEE Signal Process Mag 31:44–53
Comon P, Golub G, Lim L-H, Mourrain B (2008) Symmetric tensors and symmetric tensor rank. SIAM J Matrix Anal Appl 30:1254–1279
Fraley C, Raftery AE, Scrucca L (2017) mclust: Gaussian mixture modelling for model-based clustering, classification, and density estimation. https://CRAN.R-project.org/package=mclust. R package version 5.3
Franceschini C, Loperfido N (2017a) MaxSkew: skewness-based projection pursuit. https://CRAN.R-project.org/package=MaxSkew. R package version 1.1
Franceschini C, Loperfido N (2017b) MultiSkew: measures, tests and removes multivariate skewness. https://CRAN.R-project.org/package=MultiSkew. R package version 1.1.1
Friedman J (1987) Exploratory projection pursuit. J Am Stat Assoc 82:249–266
Friedman JH, Tukey JW (1974) A projection pursuit algorithm for exploratory data analysis. IEEE Trans Comput Ser C 23:881–890
Frühwirth-Schnatter S, Pyne S (2010) Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew\(-t\) distributions. Biostatistics 11:317–336
Grasman RPPP, Huizenga HM, Geurts HM (2010) Departure from normality in multivariate normative comparison: the Cramér alternative for Hotelling’s \(T^{2}\). Neuropsychologia 48:1510–1516
Hennig C (2004) Asymmetric linear dimension reduction for classification. J Comput Graph Stat 13:930–945
Hennig C (2005) A method for visual cluster validation. In: Weihs C, Gaul W (eds) Classification—the ubiquitous challenge. Springer, Heidelberg, pp 153–160
Hui G, Lindsay BG (2010) Projection pursuit via white noise matrices. Sankhya B 72:123–153
Jondeau E, Rockinger M (2006) Optimal portfolio allocation under higher moments. Eur Financ Manag 12:29–55
Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41:577–590
Kim H-M, Mallick BK (2003) Moments of random vectors with skew \(t\) distribution and their quadratic forms. Stat Probab Lett 63:417–423
Landsberg JM, Michalek M (2017) On the geometry of border rank decompositions for matrix multiplication and other tensors with symmetry. SIAM J Appl Algebra Geom 1:2–19
Lee S, McLachlan GJ (2013) Model-based clustering and classification with non-normal mixture distributions. Stat Methods Appl 22:427–454 (with discussion)
Lin XS (2004) Compound distributions. In: Encyclopedia of actuarial science, vol 1. Wiley, pp 314–317
Lindsay BG, Yao W (2012) Fisher information matrix: a tool for dimension reduction, projection pursuit, independent component analysis, and more. Can J Stat 40:712–730
Loperfido N (2004) Generalized skew-normal distributions. Skew-elliptical distributions and their applications: a journey beyond normality. CRC, Boca Raton, pp 65–80
Loperfido N (2013) Skewness and the linear discriminant function. Stat Probab Lett 83:93–99
Loperfido N (2014) Linear transformations to symmetry. J Multivar Anal 129:186–192
Loperfido N (2015a) Vector-valued skewness for model-based clustering. Stat Probab Lett 99:230–237
Loperfido N (2015b) Singular value decomposition of the third multivariate moment. Linear Algebra Appl 473:202–216
Loperfido N (2018) Skewness-based projection pursuit: a computational approach. Comput Stat Data Anal 120:42–57
Loperfido N, Mazur S, Podgorski K (2018) Third cumulant for multivariate aggregate claims models. Scand Actuar J 2018:109–128
Mardia K (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57:519–530
McNicholas PD (2016) Model-based clustering. J Class 33:331–373
Melnykov V, Maitra R (2010) Finite mixture models and model-based clustering. Stat Surv 4:80–116
Miettinen J, Taskinen S, Nordhausen K, Oja H (2015) Fourth moments and independent component analysis. Stat Sci 30:372–390
Móri T, Rohatgi V, Székely G (1993) On multivariate skewness and kurtosis. Theory Probab Appl 38:547–551
Morris K, McNicholas PD, Scrucca L (2013) Dimension reduction for model-based clustering via mixtures of multivariate t-distributions. Adv Data Anal Classif 7:321–338
Oeding L, Ottaviani G (2013) Eigenvectors of tensors and algorithms for Waring decomposition. J Symb Comput 54:9–35
Paajarvi P, Leblanc J (2004) Skewness maximization for impulsive sources in blind deconvolution. In: Proceedings of the 6th Nordic signal processing symposium—NORSIG, Espoo, Finland
Peña D, Prieto FJ (2001) Cluster identification using projections. J Am Stat Assoc 96:1433–1445
Rao CR, Rao MB (1998) Matrix algebra and its applications to statistics and econometrics. World Scientific Co. Pte. Ltd, Singapore
Sakata T, Sumi T, Miyazaki M (2016) Algebraic and computational aspects of real tensor ranks. Springer, Tokyo
Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20:471–484
Scrucca L (2014) Graphical tools for model-based mixture discriminant analysis. Adv Data Anal Classif 8:147–165
Tarpey T, Yun D, Petkova E (2009) Model misspecification: Finite mixture or homogeneous? Stat Model 8:199–218
Tyler DE, Critchley F, Dümbgen L, Oja H (2009) Invariant co-ordinate selection. J R Stat Soc B 71:1–27 (with discussion)
Acknowledgements
The author would like to thank an anonymous Associate Editor and two anonymous Reviewers for their careful handling of this paper and for their valuable comments, which greatly improved its quality.
Appendix
The Kronecker product and the vectorization operator
The Kronecker (or tensor) product acts on matrices \(A=\left\{ a_{ij}\right\} \in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\) and \(B=\left\{ b_{ij}\right\} \in {\mathbb {R}}^{m}\times {\mathbb {R}}^{n}\) by producing the block matrix \(A\otimes B\in {\mathbb {R}}^{pm}\times {\mathbb {R}}^{qn}\) whose \((i,j)\)-th block is the matrix \(a_{ij}B\) (see, for example, Rao and Rao 1998, p 193). As an example, consider the matrices
The Kronecker product \(A\otimes B\) is
We shall recall some fundamental properties of the Kronecker product (see, for example, Rao and Rao 1998, pp 194–201).
-
1.
The Kronecker product is associative: \(\left( A\otimes B\right) \otimes C=A\otimes \left( B\otimes C\right) =A\otimes B\otimes C\).
-
2.
If matrices A, B, C and D are of appropriate size, then \( \left( A\otimes B\right) \left( C\otimes D\right) =AC\otimes BD\).
-
3.
If the inverses \(A^{-1}\) and \(B^{-1}\) of matrices A and B exist then \(\left( A\otimes B\right) ^{-1}=A^{-1}\otimes B^{-1}\).
-
4.
If a and b are two vectors, then \(ab^{\top }\), \(a\otimes b^{\top }\) and \(b^{\top }\otimes a\) denote the same matrix.
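These properties are easy to verify numerically. The following sketch (assuming Python with NumPy; the matrix sizes and random entries are arbitrary illustrations, not part of the paper) checks each of the four properties listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))
a, b = rng.standard_normal(3), rng.standard_normal(4)

# 1. Associativity of the Kronecker product.
assert np.allclose(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C)))

# 2. Mixed-product property: (A x B)(C x D) = AC x BD.
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# 3. Inverse of a Kronecker product (A and B invertible almost surely here).
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# 4. a b^T, a x b^T and b^T x a denote the same matrix.
ab = np.outer(a, b)
assert np.allclose(ab, np.kron(a.reshape(-1, 1), b.reshape(1, -1)))
assert np.allclose(ab, np.kron(b.reshape(1, -1), a.reshape(-1, 1)))
```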
The matrix vectorization (also known as the vec operator, or simply vectorization) converts a matrix \(A=\left\{ a_{ij}\right\} \in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q}\) into a pq-dimensional vector \(A^{V}=\text{ vec }\left( A\right) =\left( \alpha _{1}, \ldots ,\alpha _{pq}\right) ^{\top }\), where \(a_{ij}=\alpha _{\left( j-1\right) p+i}\), by stacking its columns on top of each other, i.e.
We shall now recall some fundamental properties of the matrix vectorization (see, for example, Rao and Rao 1998, pp 194–201).
-
1.
For any two \(m\times n\) matrices A and B it holds true that \( \text{ tr }\left( A^{\top }B\right) =\text{ vec }^{\top }(B)\text{ vec }(A)\).
-
2.
If \(A\in {\mathbb {R}}^{p}\times {\mathbb {R}}^{q},\;B\in {\mathbb {R}} ^{q}\times {\mathbb {R}}^{r}\) and \(C\in {\mathbb {R}}^{r}\times {\mathbb {R}}^{s}\) then \(\text{ vec }\left( ABC\right) =\left( C^{\top }\otimes A\right) \text{ vec }\left( B\right) \).
-
3.
For any \(m\times n\) matrix A it holds true that \(\text{ vec }\left( A^{\top }\right) =A^{\top V}\) and \(\left[ \text{ vec }\left( A\right) \right] ^{\top }=A^{V\top }\).
-
4.
If A is an invertible matrix, then \(\text{ vec }\left( A^{-1}\right) =A^{-V}\) and \(\left[ \text{ vec }\left( A^{-1}\right) \right] ^{\top }=A^{-V\top }\).
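As above, a short numerical check may be helpful. The sketch below (again assuming NumPy, with vec implemented by column-major reshaping) illustrates the first two properties; the matrix sizes are arbitrary.

```python
import numpy as np

def vec(M):
    # Stack the columns of M on top of each other (column-major order).
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # m x n
B = rng.standard_normal((4, 3))   # m x n
P = rng.standard_normal((2, 3))   # p x q
Q = rng.standard_normal((3, 5))   # q x r
R = rng.standard_normal((5, 4))   # r x s

# 1. tr(A^T B) = vec(B)^T vec(A).
assert np.isclose(np.trace(A.T @ B), vec(B) @ vec(A))

# 2. vec(PQR) = (R^T x P) vec(Q).
assert np.allclose(vec(P @ Q @ R), np.kron(R.T, P) @ vec(Q))
```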
Rank of \({\mathcal {A}}\)
We shall first prove by contradiction that the tensor rank of \({\mathcal {A}}\) is three. If the tensor rank of \({\mathcal {A}}\) were two, its unfolding might be represented as
for some 3-dimensional real vectors \(u_{1}\), \(v_{1}\), \(w_{1}\), \(u_{2}\), \( v_{2}\), \(w_{2}\). As a direct consequence, we would have
The rank of \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }\) would then be at most two, since the quadratic form \(a^{\top }{\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }a\) would be zero for any 3-dimensional vector a orthogonal to both \(v_{1}\) and \(v_{2}\). This would lead to a contradiction, since \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }=2I_{3}\), where \(I_{3}\) is the \(3\times 3\) identity matrix, which is of full rank. In a similar way we can prove that the tensor rank of \({\mathcal {A}}\) is not one. It is not zero either, since \({\mathcal {A}}\) is not a null tensor. Hence it must be greater than or equal to three. We shall prove that it is exactly three using the vectors \(e_{1}=\left( 1,0,0\right) ^{\top }\), \(e_{2}=\left( 0,1,0\right) ^{\top }\) and \(e_{3}=\left( 0,0,1\right) ^{\top }\): elementary matrix algebra shows that
We shall now prove by contradiction that the symmetric tensor rank of \( {\mathcal {A}}\) is four. If the symmetric tensor rank of \({\mathcal {A}}\) were three its unfolding might be represented as
for some 3-dimensional real vectors \(b_{1}\), \(b_{2}\) and \(b_{3}\). These vectors are linearly independent, otherwise there would exist a 3-dimensional real vector v orthogonal to all of them, making the quadratic form
equal to zero. This would lead to a contradiction, since \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }\) is proportional to the \(3\times 3\) identity matrix, and therefore positive definite. Let u and w be a 9-dimensional and a 3-dimensional real vector, respectively, whose first components are one while the others are zero, so that \({\mathcal {A}}_{\left( 1\right) }u=\left( 0,0,0\right) ^{\top }\) and \(u=w\otimes w\otimes 1\). Apply now standard properties of the Kronecker product to obtain
Since \(b_{1}\), \(b_{2}\) and \(b_{3}\) are linearly independent, at least one of the scalar products \(b_{1}^{\top }w\), \(b_{2}^{\top }w\) and \(b_{3}^{\top }w\) differs from zero. This leads to a contradiction, since the null vector \(\left( 0,0,0\right) ^{\top }\) may be obtained as a linear combination of the linearly independent vectors \(b_{1}\), \(b_{2}\) and \(b_{3}\) only if all the coefficients \(b_{1}^{\top }w\), \(b_{2}^{\top }w\) and \(b_{3}^{\top }w\) of the linear combination are zero. We conclude that the symmetric tensor rank of \({\mathcal {A}}\) cannot be three. In a similar way we can prove that the symmetric tensor rank of \({\mathcal {A}}\) is not smaller than three. We shall prove that it is four using the vectors \(a_{1}=\left( 1,1,1\right) ^{\top }\), \(a_{2}=\left( 1,-1,-1\right) ^{\top }\), \(a_{3}=\left( -1,1,-1\right) ^{\top }\) and \(a_{4}=\left( -1,-1,1\right) ^{\top }\): elementary matrix algebra shows that
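The computations above can also be replicated numerically. In the sketch below, \({\mathcal {A}}\) is taken to be the \(3\times 3\times 3\) symmetric tensor with unit entries at all permutations of the indices (1, 2, 3) and zeros elsewhere, which is consistent with \({\mathcal {A}}_{\left( 1\right) }{\mathcal {A}}_{\left( 1\right) }^{\top }=2I_{3}\); the scaling factor 1/4 in the symmetric decomposition is an assumption suggested by the four vectors \(a_{1}\), ..., \(a_{4}\) above, not a statement taken from the paper.

```python
import numpy as np
from itertools import permutations

# Assumed form of the tensor: ones at all permutations of (1, 2, 3).
A = np.zeros((3, 3, 3))
for i, j, k in permutations(range(3)):
    A[i, j, k] = 1.0

# Mode-1 unfolding; by symmetry the exact unfolding convention does not matter here.
A1 = A.reshape(3, 9)
assert np.allclose(A1 @ A1.T, 2 * np.eye(3))   # full rank, as claimed

# Symmetric rank-4 decomposition with the vectors a_1, ..., a_4 (scaled by 1/4).
vs = [np.array(v, dtype=float) for v in
      [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]]
S = sum(np.einsum("i,j,k->ijk", v, v, v) for v in vs) / 4.0
assert np.allclose(S, A)
```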
Proof of Theorem 1
Let \(\mu _{i,j}\) (\(\overline{\mu }_{i,j}\)) be the j-th moment (centered moment) matrix of the i-th mixture component, with cdf \(F_{i}\) and weight \(\pi _{i}\), for \(j=1,2,3\) and \(i=1, \ldots ,g\). Also, let \(\mu \) be the expected value of M: \(\mu =\mu _{1,1}\pi _{1}+ \cdots +\mu _{g,1}\pi _{g}\). Finally, let \(\lambda _{i}=\mu _{i,1}-\mu \), for \(i=1, \ldots ,g\). By definition, the expected value of \(\varXi \) and the third cumulant matrix of M are
The tensor product \(\left( y+c\right) \otimes \left( y+c\right) ^{\top }\otimes \left( y+c\right) \) might be decomposed into
(Loperfido 2013). The third moment matrix about \(\mu \) of a random vector with cdf \(F_{i}\) is \(\mu _{i,3}\left( x-\mu \right) =\mu _{i,3}\left[ \left( x-\mu _{i,1}\right) +\lambda _{i}\right] \). By taking expectations with respect to the cdf \(F_{i}\), after letting \(y=x-\mu _{i,1}\) and \(c=\lambda _{i}\) we obtain another expression for \(\mu _{i,3}\left( x-\mu \right) \):
where \(A^{V}\) denotes the vectorization of the matrix A. By assumption, \( \overline{\mu }_{i,3}\) and \(\overline{\mu }_{i,2}\) equal \(K_{i,3}\) and \( \varOmega \), thus leading to the simplified expression
By definition, the cdf of X is the mixture of the distribution functions \( F_{1}\), ..., \(F_{g}\), with weights \(\pi _{1}\), ..., \(\pi _{g}\). Hence \( K_{3,X}\), i.e. the third cumulant matrix of X, is
It might be simplified into \(K_{3,X}=E\left( \varXi \right) +K_{3,M}\) by noticing that \(E\left( M-\mu \right) =\lambda _{1}\pi _{1}+ \cdots +\lambda _{g}\pi _{g}\) is a null vector and by recalling the definitions of \(E\left( \varXi \right) \) and \(K_{3,M}\).
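As an informal illustration of Theorem 1, the third cumulant matrix of a two-component Gaussian location mixture with common covariance matrix can be estimated by simulation and compared with the discrete part \(K_{3,M}\); for normal components the term \(E\left( \varXi \right) \) vanishes, since their third cumulants are null. The sketch below assumes NumPy, and the mixture weights, means and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 100_000
pi = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0, 0.0], [3.0, 1.0, -2.0]])   # component means
mu = pi @ mus                                          # mixture mean

# Sample from the two-component Gaussian location mixture (identity covariances).
labels = rng.choice(2, size=n, p=pi)
X = mus[labels] + rng.standard_normal((n, d))

# Sample third cumulant matrix: mean of (x-mu)(x-mu)^T x (x-mu), a d^2 x d matrix.
Xc = X - X.mean(axis=0)
K3_hat = np.mean([np.kron(np.outer(x, x), x.reshape(-1, 1)) for x in Xc], axis=0)

# Discrete part K_{3,M}: sum_i pi_i (mu_i - mu)(mu_i - mu)^T x (mu_i - mu).
K3_M = sum(p * np.kron(np.outer(lam, lam), lam.reshape(-1, 1))
           for p, lam in zip(pi, mus - mu))

print(np.max(np.abs(K3_hat - K3_M)))   # close to zero, up to Monte Carlo error
```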
Proof of Theorem 2
Without loss of generality we can assume that the location vector is a null vector: \(\xi =0_{d}\). Let \(X_{i}\sim SN_{d}\left( 0_{d},\varOmega ,\alpha _{i}\right) \) be a d-dimensional skew-normal random vector with null location vector, scale matrix \(\varOmega \) and shape parameter \(\alpha _{i}\), for \(i=1\), \(\ldots \), g. The first and second moments of \(X_{i}\) are \(E\left( X_{i}\right) =\sqrt{2/\pi }\delta _{i}\) and \(E\left( X_{i}X_{i}^{\top }\right) =\varOmega \), for \(i=1\), \(\ldots \), g. The third moment of \(X_{i}\) is
where \(\varOmega ^{V}\) denotes the vectorization of the matrix \(\varOmega \). It might be simplified into
by recalling that \(\left( A\otimes B\right) \left( C\otimes D\right) =AC\otimes BD\), if matrices A, B, C and D are of appropriate size. The third moment matrix of X is a weighted average of \(\mu _{3,1}\),..., \( \mu _{3,g}\) with weights \(\pi _{1}\),..., \(\pi _{g}\):
where \(\eta =E\left( X\right) =\sqrt{2/\pi }\left( \pi _{1}\delta _{1}+ \cdots +\pi _{g}\delta _{g}\right) \) is the mean of X. A similar argument shows that the second moment of X is just the common scale matrix \(\varOmega \) : \(E\left( XX^{\top }\right) =\pi _{1}E\left( X_{1}X_{1}^{\top }\right) + \cdots +\pi _{g}E\left( X_{g}X_{g}^{\top }\right) =\pi _{1}\varOmega + \cdots +\pi _{g}\varOmega =\varOmega \). The third cumulant of X is the difference of \( E\left( X\otimes X^{\top }\otimes X\right) +2E\left( X\right) \otimes E^{\top }\left( X\right) \otimes E\left( X\right) \) and \(E\left( XX^{\top }\right) \otimes E\left( X\right) +E\left( X\right) \otimes E\left( XX^{\top }\right) +E^{V}\left( XX^{\top }\right) E^{\top }\left( X\right) \). Hence the third cumulant matrix of X might be represented as
The proof is completed by recalling the definition of \(\eta \).
Proof of Theorem 3
We shall first recall two well-known properties of the Kronecker product (see, for example, Rao and Rao 1998, p 197). If A , B and C are three matrices with B and C being of the same size, then \(A\otimes \left( B+C\right) =A\otimes B+A\otimes C\). The Kronecker product is associative, too: \(\left( A\otimes B\right) \otimes C=A\otimes \left( B\otimes C\right) =A\otimes B\otimes C\). The two properties lead to
where x and y are two d-dimensional real vectors. The two properties also lead to
Both identities lead to the following one:
The above identity, together with the definitions of the vectors \(\alpha _{i}=\lambda +\gamma _{i}\) and \(\beta _{i}=\lambda -\gamma _{i}\), implies
We shall now recall another property of the Kronecker product: if a and b are two vectors, then \(ab^{\top }\), \(a\otimes b^{\top }\) and \(b^{\top }\otimes a\) denote the same matrix (see, for example, Rao and Rao 1998, p 199). This property, together with the definitions of \(\varGamma \) and \(\lambda \), leads to
As a direct consequence, the following matrix decomposition holds true:
Finally, we shall recall a third property of the Kronecker product: if A and B are any two matrices then \(\left( A\otimes B\right) ^{\top }=A^{\top }\otimes B^{\top }\) (see, for example, Rao and Rao 1998, page 194). Therefore we have
By assumption, the left-hand side of the above identity equals the matrix unfolding of \({\mathcal {T}}\), and this completes the proof.
Proof of Theorem 4
We shall first prove the theorem for a location mixture of \(g\le d\) weakly symmetric components. By Theorem 1, the third cumulant matrix of X is
where \(\mu _{i}\) and \(\pi _{i}>0\) are the mean and the weight of the i-th mixture component, for \(i=1, \ldots ,g\), while \(\mu =\pi _{1}\mu _{1}+ \cdots +\pi _{g}\mu _{g}\) is the mean of X. The best linear discriminant subspace, i.e. the span of \(\left\{ \mu _{1}-\mu , \ldots ,\mu _{g}-\mu \right\} \), is spanned by the columns of the matrix \(H=\left( \eta _{1}, \ldots ,\eta _{g}\right) \), where \(\eta _{i}=\pi _{i}^{1/3}\left( \mu _{i}-\mu \right) \), for \(i=1, \ldots ,g\). Also, let \(Z=\left( Z_{1}, \ldots ,Z_{d}\right) ^{\top }=\varSigma ^{-1/2}\left( X-\mu \right) \) be the standardized version of X, where \(\varSigma ^{-1/2}\) is the symmetric, positive definite square root of the concentration matrix \(\varSigma ^{-1}\), that is, the inverse of the covariance matrix \(\varSigma \) of X. The third cumulant of Z is
where \(\gamma _{i}=\varSigma ^{-1/2}\eta _{i}\), for \(i=1, \ldots ,g\). Since \(\varSigma ^{-1/2}\) is a full-rank matrix, the columns of the matrix \(\varGamma =\left( \gamma _{1}, \ldots ,\gamma _{g}\right) \) span the best linear discriminant subspace, too. We shall denote its rank by \(h\le g-1\). The skewness of the linear combination \(c^{\top }Z\) is \(\beta _{1}\left( c^{\top }Z\right) =c^{\top }K_{3,Z}^{\top }\left( c\otimes c\right) /\left\| c\right\| ^{3}\), where \(\left\| c\right\| \) is the Euclidean norm of the d-dimensional, real vector c. Hence the vector c which maximizes the skewness of \(c^{\top }Z\) is proportional to the dominant tensor eigenvector of \({\mathcal {K}}_{3,Z}\), that is the unit-norm vector \(v_{1}\) which satisfies \(K_{3,Z}^{\top }\left( v_{1}\otimes v_{1}\right) =\lambda _{1}v_{1}\) for the highest possible value of the scalar \(\lambda _{1}\). We shall now recall two fundamental properties of the Kronecker product. It is associative: \(\left( A\otimes B\right) \otimes C=A\otimes \left( B\otimes C\right) =A\otimes B\otimes C\); if matrices A, B, C and D are of appropriate size, then \(\left( A\otimes B\right) \left( C\otimes D\right) =AC\otimes BD\). These properties, together with the identities
lead to a convenient representation for \(v_{1}\):
It follows that \(v_{1}\) is a linear combination of \(\gamma _{1}, \ldots , \gamma _{g}\) and hence belongs to the best linear discriminant subspace. It also follows that \(Y_{1}=v_{1}^{\top }\varSigma ^{-1/2}X\) is the linear projection of X with maximal skewness, and therefore \(v_{1}^{\top }\varSigma ^{-1/2}\) might be taken as the first row of A. The linear projections \( Y_{2}\), ..., \(Y_{h}\) might be found using similar arguments.
We shall now consider shape mixtures with skew-normal components, and use the same notation as in Theorem 2, which allows us to represent the third cumulant matrix of X as
We have \(\mu _{i}=\xi +\sqrt{2/\pi }\delta _{i}\) and \(\mu _{i}-\mu =\sqrt{ 2/\pi }\left( \delta _{i}-\pi _{1}\delta _{1}- \cdots -\pi _{g}\delta _{g}\right) \), for \(i=1, \ldots ,g\). Hence the best linear discriminant subspace \(\left\{ \mu _{1}-\mu , \ldots ,\mu _{g}-\mu \right\} \) is spanned by the columns of the matrix \(H=\left( \eta _{1}, \ldots ,\eta _{g+1}\right) \), where
for \(i=2, \ldots ,d\). An argument similar to the one used for location mixtures completes the proof.
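In practice, the dominant tensor eigenvector \(v_{1}\) of the sample third cumulant matrix of the standardized data can be approximated by a higher-order power iteration. The sketch below is a minimal illustration of this idea, not the algorithm implemented in the MaxSkew package; the function names, the number of iterations and the random restarts are assumptions, and convergence to the global skewness maximizer is not guaranteed.

```python
import numpy as np

def third_cumulant_matrix(Z):
    """Sample version of K_{3,Z} = E[Z Z^T x Z] for already centred data Z (n x d)."""
    return np.mean([np.kron(np.outer(z, z), z.reshape(-1, 1)) for z in Z], axis=0)

def max_skew_direction(X, n_iter=200, n_starts=10, seed=0):
    """Approximate the direction maximizing the skewness of the projected data."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Symmetric inverse square root of the covariance matrix.
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    S_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ S_inv_half
    K3 = third_cumulant_matrix(Z)
    best_v, best_skew = None, -np.inf
    for _ in range(n_starts):
        v = rng.standard_normal(Z.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = K3.T @ np.kron(v, v)          # higher-order power step
            v /= np.linalg.norm(v)
        skew = v @ K3.T @ np.kron(v, v)        # skewness of v^T Z for unit-norm v
        if skew > best_skew:
            best_v, best_skew = v, skew
    # The skewness-maximizing projection of X is (Sigma^{-1/2} v_1)^T (X - mean).
    return S_inv_half @ best_v, best_skew
```

Applied to data simulated from a two-component location mixture, the returned direction should lie close to the best linear discriminant subspace, in line with Theorem 4.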
Proof of Theorem 5
Without loss of generality we shall assume that the random vector X is centred: \(E\left( X\right) =0_{d}\). Then its third cumulant matrix coincides with its third moment matrix \(M_{3,X}\), and might be represented as
where \(\eta _{1}, \ldots ,\eta _{d}\) are d-dimensional, real vectors. Clearly, if the tensor rank of \(M_{3,X}\) is \(k\le d-1\), the last vectors \(\eta _{k+1}, \ldots ,\eta _{d}\) may be taken to be null vectors. The third moment matrix of \(Y=\left( Y_{1},\ldots ,Y_{d}\right) ^{\top }=BX\), where B is a \( d\times d\) real matrix of full rank, is
The matrix B might be chosen to make the product BH a diagonal matrix: \( BH=diag\left( \psi _{1},\ldots ,\psi _{d}\right) \), where \(H=\left\{ \eta _{1},\ldots ,\eta _{d}\right\} \) is the \(d\times d\) matrix whose columns are \( \eta _{1}\),..., \(\eta _{d}\). The third moment of Y would then be a third-order, symmetric and diagonal tensor:
where \(e_{i}\) is the i-th column of the d-dimensional identity matrix and \(\psi _{i}=E\left( Y_{i}^{3}\right) \), for \(i=1,\ldots ,d\). Let \(W=\left( W_{1},\ldots ,W_{d}\right) ^{\top }=CY\), where C is again a \(d\times d\) real matrix of full rank. The third moment of the i-th component of W, denoted by \(\omega _{i}=E\left( W_{i}^{3}\right) \), is just the sum of all products \(c_{ij}c_{ih}c_{il}E\left( Y_{j}Y_{h}Y_{l}\right) \). The expectation \(E\left( Y_{j}Y_{h}Y_{l}\right) \) is zero whenever at least one index differs from the remaining ones, so that \(\omega _{i}=c_{i1}^{3}\psi _{1}+ \cdots +c_{id}^{3}\psi _{d}\), for \(i=1, \ldots ,d\). In matrix notation, we might write \(\omega =\left( C\circ C\circ C\right) \psi \), where \(\omega =\left( \omega _{1}, \ldots ,\omega _{d}\right) ^{\top }\), \(\psi =\left( \psi _{1}, \ldots ,\psi _{d}\right) ^{\top }\) and “\(\circ \)” denotes the Hadamard, or elementwise, product (see, for example, Rao and Rao 1998, p 203). The theorem is trivially true when \(\psi \) is a null vector, which happens only when the third moment of X is a null matrix. Otherwise, the entries of C might be chosen to make \(\omega \) the d-dimensional real vector whose first entry is one while all others are zero: \(\omega =\left( 1,0, \ldots ,0\right) ^{\top }\in {\mathbb {R}}^{d}\). The matrix A might then be obtained by removing the first row of the matrix product CB.
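The key step \(\omega =\left( C\circ C\circ C\right) \psi \) can be checked directly on a diagonal third-moment tensor. The sketch below (assuming NumPy; the dimension and the entries are arbitrary) contracts a diagonal three-way tensor with C and compares its diagonal with the Hadamard-cube formula.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
psi = rng.standard_normal(d)                   # third moments psi_i = E(Y_i^3)
C = rng.standard_normal((d, d))                # full-rank transformation W = C Y

# Diagonal third-moment tensor of Y: E(Y_i Y_j Y_h) = psi_i if i = j = h, else 0.
T = np.zeros((d, d, d))
T[np.arange(d), np.arange(d), np.arange(d)] = psi

# Third-moment tensor of W = C Y and its diagonal omega_i = E(W_i^3).
TW = np.einsum("ia,jb,kc,abc->ijk", C, C, C, T)
omega = TW[np.arange(d), np.arange(d), np.arange(d)]

# Hadamard-cube formula: omega = (C o C o C) psi.
assert np.allclose(omega, (C ** 3) @ psi)
```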