Abstract
In the framework of model-based cluster analysis, finite mixtures of Gaussian components are an important class of statistical models for quantitative variables. Within this class, we propose novel Gaussian parsimonious clustering models defined through constraints on the component-specific variance matrices. Specifically, the proposed models assume that the variables can be partitioned into groups that are conditionally independent within components, so that the component-specific variance matrices have a block diagonal structure. This approach extends the methods for model-based cluster analysis and makes them more flexible and versatile. In this paper, Gaussian mixture models are studied under this assumption. Identifiability conditions are proved, and the model parameters are estimated by maximum likelihood using the Expectation-Maximization (EM) algorithm. The Bayesian information criterion (BIC) is proposed for selecting the partition of the variables into conditionally independent groups, and the consistency of this criterion is proved under regularity conditions. A hierarchical algorithm is suggested for examining and comparing models with different partitions of the set of variables. A wide class of parsimonious Gaussian models is also presented by parameterizing the component-variance matrices according to their spectral decomposition. The effectiveness and usefulness of the proposed methodology are illustrated with two examples based on real datasets.
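As a concrete illustration of the block diagonal structure described in the abstract, the following sketch (in Python with NumPy; the variable partition and covariance values are hypothetical, not taken from the paper's examples) builds a component variance matrix as a direct sum of group-specific blocks and checks that the component density factorizes over the conditionally independent groups:

```python
import numpy as np

def block_diag(*blocks):
    """Direct sum: assemble a component covariance from group blocks."""
    P = sum(b.shape[0] for b in blocks)
    S = np.zeros((P, P))
    i = 0
    for b in blocks:
        d = b.shape[0]
        S[i:i + d, i:i + d] = b
        i += d
    return S

def log_gauss(x, mu, Sigma):
    """Log-density of a multivariate Gaussian."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + d @ np.linalg.solve(Sigma, d))

# Two groups of variables, conditionally independent within the component:
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])   # group S_1 (P_1 = 2)
S2 = np.array([[1.5]])                    # group S_2 (P_2 = 1)
Sigma = block_diag(S1, S2)                # Sigma_k = S1 (+) S2
mu = np.array([0.0, 1.0, -1.0])

x = np.array([0.3, 0.7, -0.5])
joint = log_gauss(x, mu, Sigma)
# Conditional independence: the log-density splits across the groups
factored = log_gauss(x[:2], mu[:2], S1) + log_gauss(x[2:], mu[2:], S2)
assert np.isclose(joint, factored)
```

This factorization is what makes the models parsimonious: each component needs only \(\sum_{g} P_{g}(P_{g}+1)/2\) covariance parameters instead of \(P(P+1)/2\).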
References
Baek, J., McLachlan, G.J.: Mixtures of factor analyzers with common factor loadings for the clustering and visualisation of high-dimensional data. Technical report NI08018-SCH, Preprint, Series of the Isaac Newton Institute for Mathematical Sciences, Cambridge (2008)
Baek, J., McLachlan, G.J., Flack, L.: Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1298–1309 (2010)
Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)
Bartholomew, D., Knott, M., Moustaki, I.: Latent Variable Models and Factor Analysis: A Unified Approach, 3rd edn. Wiley, Chichester (2011)
Basso, R.M., Lachos, V.H., Barbosa Cabral, C.R., Ghosh, P.: Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010)
Biernacki, C., Govaert, G.: Choosing models in model-based clustering and discriminant analysis. J. Stat. Comput. Simul. 64, 49–71 (1999)
Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41, 561–575 (2003)
Biernacki, C., Celeux, G., Govaert, G., Langrognet, F.: Model-based cluster and discriminant analysis with the MIXMOD software. Comput. Stat. Data Anal. 51, 587–600 (2006)
Böhning, D., Seidel, W.: Editorial: recent developments in mixture models. Comput. Stat. Data Anal. 41, 349–357 (2003)
Böhning, D., Seidel, W., Alfò, M., Garel, B., Patilea, V., Walther, G.: Advances in mixture models. Comput. Stat. Data Anal. 51, 5205–5210 (2007)
Bouveyron, C., Girard, S., Schmid, C.: High-dimensional data clustering. Comput. Stat. Data Anal. 52, 502–519 (2007)
Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)
Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognit. 28, 781–793 (1995)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994)
Coretto, P., Hennig, C.: Maximum likelihood estimation of heterogeneous mixtures of Gaussian and uniform distributions. J. Stat. Plan. Inference 141, 462–473 (2011)
Cutler, A., Windham, M.P.: Information-based validity functionals for mixture analysis. In: Bozdogan, H. (ed.) Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, pp. 149–170. Kluwer Academic, Dordrecht (1994)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood for incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–22 (1977)
Dias, J.G.: Latent class analysis and model selection. In: Spilopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 95–102. Springer, Berlin (2006)
Fraley, C., Raftery, A.E.: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1998)
Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002)
Fraley, C., Raftery, A.E.: Enhanced software for model-based clustering. J. Classif. 20, 263–286 (2003)
Fraley, C., Raftery, A.E.: MCLUST version 3 for R: normal mixture modeling and model-based clustering. Technical report No. 504, Department of Statistics, University of Washington (2006)
Frank, A., Asuncion, A.: UCI machine learning repository. School of Information and Computer Science, University of California, Irvine, CA (2010). http://archive.ics.uci.edu/ml
Galimberti, G., Soffritti, G.: Model-based methods to identify multiple cluster structures in a data set. Comput. Stat. Data Anal. 52, 520–536 (2007)
Galimberti, G., Montanari, A., Viroli, C.: Penalized factor mixture analysis for variable selection in clustered data. Comput. Stat. Data Anal. 53, 4301–4310 (2009)
Ghahramani, Z., Hinton, G.E.: The EM algorithm for factor analyzers. Technical report CRG-TR-96-1, University of Toronto (1997)
Gordon, A.D.: Classification, 2nd edn. Chapman & Hall, Boca Raton (1999)
Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
Keribin, C.: Consistent estimation of the order of mixture models. Sankhyā Ser. A 62, 49–66 (2000)
Lin, T.I.: Maximum likelihood estimation for multivariate skew normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)
Lin, T.I.: Robust mixture modeling using multivariate skew t distributions. Stat. Comput. 20, 343–356 (2010)
Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew t distribution. Stat. Comput. 17, 81–92 (2007a)
Lin, T.I., Lee, J.C., Yen, S.Y., Shu, Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b)
Lütkepohl, H.: Handbook of Matrices. Wiley, Chichester (1996)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection for clustering with Gaussian mixture models. Technical report RR-6211, Inria, France (2007)
Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection in model-based clustering: a general variable role modeling. Comput. Stat. Data Anal. 53, 3872–3882 (2009a)
Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection for clustering with Gaussian mixture models. Biometrics 65, 701–709 (2009b)
McColl, J.H.: Multivariate Probability. Arnold, London (2004)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Chichester (2008)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000a)
McLachlan, G.J., Peel, D.: Mixtures of factor analyzers. In: Langley, P. (ed.) Proceedings of the Seventeenth International Conference on Machine Learning, pp. 599–606. Morgan Kaufmann, San Francisco (2000b)
McLachlan, G.J., Peel, D., Basford, K.E., Adams, P.: The EMMIX software for the fitting of mixtures of normal and t-components. J. Stat. Softw. 4, 2 (1999)
McLachlan, G.J., Peel, D., Bean, R.W.: Modelling high-dimensional data by mixtures of factor analyzers. Comput. Stat. Data Anal. 41, 379–388 (2003)
McLachlan, G.J., Bean, R.W., Ben-Tovim Jones, L.: Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Comput. Stat. Data Anal. 51, 5327–5338 (2007)
McNicholas, P.D., Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18, 285–296 (2008)
McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D.: Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. 54, 711–723 (2010)
Melnykov, V., Maitra, R.: Finite mixture models and model-based clustering. Stat. Surv. 4, 80–116 (2010)
Melnykov, V., Melnykov, I.: Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput. Stat. Data Anal. (2011). doi:10.1016/j.csda.2011.11.002
Miloslavsky, M., van der Laan, M.J.: Fitting of mixtures with unspecified number of components using cross validation distance estimate. Comput. Stat. Data Anal. 41, 413–428 (2003)
Montanari, A., Viroli, C.: Heteroscedastic factor mixture analysis. Stat. Model. 10, 441–460 (2010a)
Montanari, A., Viroli, C.: The independent factor analysis approach to latent variable modelling. Statistics 44, 397–416 (2010b)
Peel, D., McLachlan, G.J.: Robust mixture modeling using the t-distribution. Stat. Comput. 10, 339–348 (2000)
R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2010). http://www.R-project.org
Raftery, A.E., Dean, N.: Variable selection for model-based cluster analysis. J. Am. Stat. Assoc. 101, 168–178 (2006)
Ray, S., Lindsay, B.G.: Model selection in high dimensions: a quadratic-risk-based approach. J. R. Stat. Soc. Ser. B 70, 95–118 (2008)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., Johannes, R.S.: Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In: Proceedings of the Symposium on Computer Applications and Medical Care, pp. 261–265. IEEE Computer Society, Los Alamitos (1988)
Teicher, H.: Identifiability of mixture models. Ann. Math. Stat. 34, 1265–1269 (1963)
Tipping, M.E., Bishop, C.M.: Mixture of probabilistic principal component analysers. Neural Comput. 11, 443–482 (1999)
Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, Chichester (1985)
Viroli, C.: Dimensionally reduced model-based clustering through mixtures of factor mixture analyzers. J. Classif. 27, 363–388 (2010)
Wang, K., Ng, S.-K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M.J., Lovell, B.C., Maeder, A.J. (eds.) Proceedings of the 2009 Conference of Digital Image Computing: Techniques and Applications, pp. 526–531. IEEE Computer Society, Los Alamitos (2009)
Yakowitz, S.J., Spragins, J.D.: On the identifiability of finite mixtures. Ann. Math. Stat. 39, 209–214 (1968)
Yang, C.C.: Evaluating latent class analysis models in qualitative phenotype identification. Comput. Stat. Data Anal. 50, 1090–1104 (2006)
Yoshida, R., Higuchi, T., Imoto, S.: A mixed factors model for dimension reduction and extraction of a group structure in gene expression data. In: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, pp. 161–172 (2004)
Appendix A
A.1 Proof of Theorem 1
In order to prove Theorem 1, one can exploit arguments similar to those used in Maugis et al. (2009b) to prove the identifiability of Gaussian mixture models with irrelevant variables.
Given Eq. (3), both f(.|θ M ) and \(f(.|\boldsymbol{\theta}^{*}_{M^{*}})\) can be written as Gaussian mixture models. Namely,
\[f(\mathbf{x}_{i}|\boldsymbol{\theta}_{M})=\sum_{k=1}^{K}\pi_{k}\,\phi_{P}(\mathbf{\tilde{x}}_{i};\boldsymbol{\mu}_{k},\boldsymbol{\varSigma}_{k}),\]
where \(\phi_{P}\) denotes the P-dimensional Gaussian density, \(\mathbf{\tilde{x}}_{i}=(\mathbf{x}_{i}^{S_{1}\top}, \ldots, \mathbf{x}_{i}^{S_{g}\top}, \ldots, \mathbf{x}_{i}^{S_{G}\top})^{\top}\) is obtained by permuting vector x i so that the P variables are listed according to the order established by (S 1,…,S G ), \(\boldsymbol{\mu}_{k}=(\boldsymbol{\mu}_{k1}^{\top}, \ldots, \boldsymbol{\mu}_{kg}^{\top}, \ldots, \boldsymbol{\mu}_{kG}^{\top})^{\top}\), and \(\boldsymbol{\varSigma}_{k}=\bigoplus_{g=1}^{G} \boldsymbol{\varSigma}_{kg}\). Analogously,
\[f(\mathbf{x}_{i}|\boldsymbol{\theta}^{*}_{M^{*}})=\sum_{k=1}^{K^{*}}\pi^{*}_{k}\,\phi_{P}(\mathbf{\tilde{x}}^{*}_{i};\boldsymbol{\mu}^{*}_{k},\boldsymbol{\varSigma}^{*}_{k}),\]
with \(\mathbf{\tilde{x}}^{*}_{i}\), \(\boldsymbol{\mu}^{*}_{k}\) and \(\boldsymbol{\varSigma}^{*}_{k}\) defined with respect to the partition \((S_{1}^{*}, \ldots, S_{G^{*}}^{*})\).
Then, since the pairs (μ k ,Σ k ), k=1,…,K, are distinct, as are the pairs \((\boldsymbol{\mu}_{k}^{*},\boldsymbol{\varSigma}_{k}^{*})\), k=1,…,K ∗, the identifiability of Gaussian mixture models implies that K=K ∗ and, up to a permutation of the mixture components and of the elements of x i , \(\pi_{k}=\pi_{k}^{*}\), \(\boldsymbol{\mu}_{k}=\boldsymbol{\mu}_{k}^{*}\) and \(\boldsymbol{\varSigma}_{k}=\boldsymbol{\varSigma}_{k}^{*}\) (see, for example, Yakowitz and Spragins 1968).
To complete the proof, it is now proven by contradiction that, under constraint (5), G=G ∗ and each element of (S 1,…,S G ) coincides with exactly one element of \((S_{1}^{*}, \ldots, S_{G^{*}}^{*})\).
Consider S g ∈(S 1,…,S G ). Since both (S 1,…,S G ) and \((S_{1}^{*}, \ldots, S_{G^{*}}^{*})\) are partitions of the variable index set \(\mathcal{I}\), there exists at least one \(S_{h}^{*} \in (S_{1}^{*}, \ldots, S_{G^{*}}^{*})\) such that \(S_{g} \cap S_{h}^{*} \neq \emptyset\). Let \(s=S_{g} \cap S_{h}^{*}\), \(t=S_{g}^{c} \cap S_{h}^{*}\), and \(\bar{s}=S_{g} \cap S_{h}^{*c}\).
Suppose that t≠∅. Since t∩S g =∅, model M=(G,K,S 1,…,S G ) implies that Σ k,ts =0 ts ∀k. Due to the identifiability of Gaussian mixture models, this in turn implies that \(\boldsymbol{\varSigma}^{*}_{k,ts}=\boldsymbol{\varSigma}^{*}_{kh,ts}=\mathbf{0}_{ts}\) ∀k, which contradicts constraint (5). Thus, t=∅.
Analogously, suppose now that \(\bar{s} \neq \emptyset\). Since \(\bar{s} \cap S_{h}^{*} = \emptyset\), model \(M^{*}=(G^{*}, K^{*}, S_{1}^{*}, \ldots, S_{G^{*}}^{*})\) implies that \(\boldsymbol{\varSigma}^{*}_{k,s\bar{s}}=\mathbf{0}_{s\bar{s}}\) ∀k. Together with the identifiability of Gaussian mixture models, this also implies that \(\boldsymbol{\varSigma}_{k,s\bar{s}}=\boldsymbol{\varSigma}_{kg,s\bar{s}}=\mathbf{0}_{s\bar{s}}\) ∀k, which again contradicts constraint (5). Hence, \(\bar{s} = \emptyset\).
These two results imply that, under the constraint (5), \(S_{g} \cap S_{h}^{*} \neq \emptyset\) ⇒ \(S_{g} = S_{h}^{*}\). Thus, since S g and \(S_{h}^{*}\) belong to two partitions of the variable index set \(\mathcal{I}\), there exists a one-to-one correspondence between the two partitions, and G=G ∗.
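The argument above shows that, under constraint (5), the partition is determined by the common zero pattern of the component variance matrices. As an informal numerical companion (the matrices are hypothetical and `finest_partition` is an illustrative helper, not part of the paper), the finest admissible partition can be recovered as the connected components of the union of the components' nonzero patterns:

```python
import numpy as np

def finest_partition(Sigmas, tol=1e-12):
    """Finest variable partition (S_1,...,S_G) such that every component
    covariance is block diagonal with respect to it: the connected
    components of the union of the nonzero patterns."""
    P = Sigmas[0].shape[0]
    adj = np.zeros((P, P), dtype=bool)
    for S in Sigmas:
        adj |= np.abs(S) > tol
    # connected components via a simple depth-first search
    seen, groups = set(), []
    for v in range(P):
        if v in seen:
            continue
        comp, stack = [], [v]
        seen.add(v)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in np.flatnonzero(adj[u]):
                if int(w) not in seen:
                    seen.add(int(w))
                    stack.append(int(w))
        groups.append(sorted(comp))
    return groups

# Two components sharing the block structure {1,2},{3} (hypothetical values):
S_a = np.array([[1.0, 0.4, 0.0], [0.4, 2.0, 0.0], [0.0, 0.0, 1.0]])
S_b = np.array([[2.0, -0.3, 0.0], [-0.3, 1.0, 0.0], [0.0, 0.0, 3.0]])
print(finest_partition([S_a, S_b]))  # → [[0, 1], [2]]
```

Any coarser partition also yields block diagonal matrices; constraint (5) rules the coarser ones out, which is what makes the partition unique.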
A.2 Proof of Corollary 1
In order to prove Corollary 1 it suffices to note that, under constraint (5), the parameter space of models in \(\mathcal{M}_{I}\) is a subset of \(\varTheta_{(G, K, S_{1}, \ldots, S_{G})}\); hence, Theorem 1 also holds for this subclass of models.
A.3 Proof of Theorem 2
A proof of Theorem 2 can be obtained by exploiting some results from Maugis et al. (2007) and by suitably modifying the proof of the consistency of the BIC criterion in selecting relevant variables for clustering with Gaussian mixture models (Maugis et al. 2009b).
Since the true number of mixture components K 0 is assumed to be known, by adapting the notation previously introduced to this assumption, let now M=(G,K 0,S 1,…,S G ), \(M^{0} = (G^{0}, K^{0}, S^{0}_{1}, \ldots, S^{0}_{G^{0}})\), and
\[\hat{M}=\mathop{\arg\max}_{M \in \mathcal{M}'} \mathit{BIC}(M),\]
where \(\mathcal{M}' \subset \mathcal{M}\) is the subclass of the models obtainable from Eq. (4) with K=K 0. Furthermore, consider Δ BIC(M)=BIC(M 0)−BIC(M); after some straightforward algebra, using \(\mathit{BIC}(M)=2 l_{n}(\boldsymbol{\hat{\theta}}_{M})-\lambda_{M}\ln(n)\) with \(l_{n}(\boldsymbol{\hat{\theta}}_{M})\) the maximized log-likelihood of model M, it is possible to write
\[\Delta_{\mathit{BIC}(M)}=2n \bigl(\mathbb{D}_{nM^{0}}-\mathbb{D}_{nM}\bigr)+\gamma_{M}\ln(n), \tag{8}\]
where \(\gamma_{M}=\lambda_{M}-\lambda_{M^{0}}\), with λ M and \(\lambda_{M^{0}}\) denoting the number of free parameters of models M and M 0, respectively, and
\[\mathbb{D}_{nM}=\frac{1}{n}\sum_{i=1}^{n}\ln\frac{f(\mathbf{x}_{i}|\boldsymbol{\hat{\theta}}_{M})}{h(\mathbf{x}_{i})},\]
with h denoting the true density of the data.
Since \(P (\hat{M}=M^{0} ) = P (\Delta_{\mathit{BIC}(M)} \geq 0, \forall M \in \mathcal{M}' )\) and \(\mathcal{M}'\) is finite, in order to prove Theorem 2 we have to show that
\[P \bigl(\Delta_{\mathit{BIC}(M)} < 0 \bigr) \mathop{\longrightarrow}_{n \rightarrow \infty} 0 \quad \forall M \in \mathcal{M}'. \tag{9}\]
Note that, when M=M 0, we have \(\Delta_{\mathit{BIC}(M^{0})}=0\), thus \(P (\Delta_{\mathit{BIC}(M^{0})} < 0 )=0\) ∀n.
Consider now M≠M 0 and, to ease the reading of this proof, let \(D_{M}=-KL[h,f(\cdot|\boldsymbol{\breve{\theta}}_{M})]\), and \(T_{nM}= \mathbb{D}_{nM^{0}}-\mathbb{D}_{nM}+\frac{\gamma_{M}\ln(n)}{2n}\). Given Eq. (8), it is possible to write P(Δ BIC(M)<0)= P(T nM <0). This probability is also equal to
\[P \Bigl( \bigl(\mathbb{D}_{nM^{0}}-D_{M^{0}}\bigr)-\bigl(\mathbb{D}_{nM}-D_{M}\bigr)+\frac{\gamma_{M}\ln(n)}{2n} < D_{M}-D_{M^{0}} \Bigr).\]
According to Lemma 5 in Maugis et al. (2007), the following holds ∀ϵ>0:
\[P \bigl( \bigl|\mathbb{D}_{nM^{0}}-D_{M^{0}}\bigr| \geq \epsilon \bigr) \mathop{\longrightarrow}_{n \rightarrow \infty} 0.\]
According to Proposition 1 (see below), we also have \(\mathbb{D}_{nM} \stackrel{P}{\rightarrow} D_{M}\) \(\forall M \in \mathcal{M}'\). Thus, ∀ϵ>0,
\[P \bigl( \bigl|\mathbb{D}_{nM}-D_{M}\bigr| \geq \epsilon \bigr) \mathop{\longrightarrow}_{n \rightarrow \infty} 0.\]
Furthermore, according to the assumption (H1), \(D_{M^{0}}=0\), and −D M >0 since M≠M 0. Then, for every ϵ>0 such that \(-D_{M}-2\epsilon+\frac{\gamma_{M}\ln(n)}{2n} \geq 0\),
\[P (T_{nM}<0 ) \leq P \bigl( \bigl|\mathbb{D}_{nM^{0}}-D_{M^{0}}\bigr| \geq \epsilon \bigr) + P \bigl( \bigl|\mathbb{D}_{nM}-D_{M}\bigr| \geq \epsilon \bigr).\]
Taking \(\epsilon = \frac{-D_{M}}{4}\), since \(\frac{\gamma_{M}\ln(n)}{2n} \operatorname{\longrightarrow}\limits_{n \rightarrow \infty} 0\) we also obtain
\[P (T_{nM}<0 ) \mathop{\longrightarrow}_{n \rightarrow \infty} 0.\]
These results imply (9), thus proving the theorem.
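A minimal numerical sketch of this consistency result for the simplest case K 0=1 with P=2, comparing the candidate partitions {1,2} and ({1},{2}). It assumes the convention \(\mathit{BIC}(M)=2 l_{n}(\boldsymbol{\hat{\theta}}_{M})-\lambda_{M}\ln(n)\), and the data are whitened so that the sample cross-correlation is exactly zero (i.e. they match the block model M 0 perfectly):

```python
import numpy as np

rng = np.random.default_rng(0)
n, P = 20_000, 2
X = rng.standard_normal((n, P))
X -= X.mean(axis=0)
# whiten so the sample covariance is exactly the identity: the data then
# carry no cross-group correlation, matching M0 = ({1},{2}) perfectly
L = np.linalg.cholesky(np.cov(X, rowvar=False, bias=True))
X = X @ np.linalg.inv(L).T

def gauss_bic(X, Sigma_hat, n_par):
    """BIC = 2*loglik - n_par*ln(n) for a single Gaussian at its MLE."""
    n, P = X.shape
    _, logdet = np.linalg.slogdet(Sigma_hat)
    # at the (restricted) MLE the quadratic term reduces to n*P
    loglik = -0.5 * n * (P * np.log(2 * np.pi) + logdet + P)
    return 2 * loglik - n_par * np.log(n)

S_full = np.cov(X, rowvar=False, bias=True)     # model M:  one group {1,2}
S_block = np.diag(np.diag(S_full))              # model M0: groups {1},{2}
bic_full = gauss_bic(X, S_full, n_par=2 + 3)    # 2 means + 3 covariance terms
bic_block = gauss_bic(X, S_block, n_par=2 + 2)  # 2 means + 2 variances
delta = bic_block - bic_full
assert np.isclose(delta, np.log(n)) and delta > 0
```

Here the log-likelihood gain of the unrestricted model is numerically zero, so Δ BIC(M) reduces to the penalty term γ M ln(n)=ln(n)>0 and BIC selects the true block diagonal structure.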
A.4 Proposition 1
Under assumptions (H1) and (H2) the following convergence holds \(\forall M \in \mathcal{M}'\):
\[\mathbb{D}_{nM} \stackrel{P}{\longrightarrow} D_{M}. \tag{10}\]
Proof
According to (H2), \(\varTheta'_{M}\) is a compact metric space, and ln[f(x|θ M )] is a continuous function of θ M ∀x∈ℝP. Furthermore, it is possible to show that there exists an envelope function H for the family \(\mathcal{H}_{M}=\{\ln[f(\cdot|\boldsymbol{\theta}_{M})]; \boldsymbol{\theta}_{M} \in \varTheta'_{M}\}\) which is h-integrable. This latter result can be proved as follows.
Since Σ kg is positive definite, \(\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}\geq 0\), where \(\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}= (\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg} )^{\top}\boldsymbol{\varSigma}_{kg}^{-1} (\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg} )\). Furthermore, \(|\boldsymbol{\varSigma}_{kg}|^{-\frac{1}{2}} \leq a^{-\frac{P_{g}}{2}}\) (see Maugis et al. 2007, Lemma 3). Writing f(x|θ M )= \(\sum_{k=1}^{K} \pi_{k} g(\mathbf{x}|\boldsymbol{\vartheta}_{k})\), where
\[g(\mathbf{x}|\boldsymbol{\vartheta}_{k})=\prod_{g=1}^{G}(2\pi)^{-\frac{P_{g}}{2}}|\boldsymbol{\varSigma}_{kg}|^{-\frac{1}{2}}\exp\Bigl(-\tfrac{1}{2}\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}\Bigr),\]
with ϑ k =(μ k1,…,μ kG ,Σ k1,…,Σ kG ), and recalling that \(\sum_{k=1}^{K} \pi_{k} =1\), the following upper bound of ln[f(x|θ M )] holds:
\[\ln[f(\mathbf{x}|\boldsymbol{\theta}_{M})] \leq -\frac{P}{2}\ln(2\pi)-\frac{P}{2}\ln(a).\]
To shorten the following equations, let \(d^{2}_{kg}=\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}\). Using the concavity of the logarithm function we obtain:
\[\ln[f(\mathbf{x}|\boldsymbol{\theta}_{M})] \geq \sum_{k=1}^{K}\pi_{k}\ln[g(\mathbf{x}|\boldsymbol{\vartheta}_{k})] = \sum_{k=1}^{K}\pi_{k}\sum_{g=1}^{G}\Bigl[-\frac{P_{g}}{2}\ln(2\pi)-\frac{1}{2}\ln|\boldsymbol{\varSigma}_{kg}|-\frac{d^{2}_{kg}}{2}\Bigr].\]
Since \(\boldsymbol{\mu}_{kg} \in \mathcal{B}(\eta,P_{g})\), using Lemma 3 in Maugis et al. (2007) it is possible to write:
\[d^{2}_{kg} \leq \frac{1}{a}\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2} \leq \frac{2}{a}\bigl(\|\mathbf{x}^{S_{g}}\|^{2}+\eta^{2}\bigr).\]
Furthermore, since \(|\boldsymbol{\varSigma}_{kg}| \leq b^{P_{g}}\) (see Maugis et al. 2007, Lemma 3), the lower bound of ln[f(x|θ M )] is given by:
\[\ln[f(\mathbf{x}|\boldsymbol{\theta}_{M})] \geq -\frac{P}{2}\ln(2\pi)-\frac{P}{2}\ln(b)-\frac{1}{a}\bigl(\|\mathbf{x}\|^{2}+G\eta^{2}\bigr).\]
Thus, each function of the family \(\mathcal{H}_{M}\) is bounded, for all \(\boldsymbol{\theta}_{M} \in \varTheta'_{M}\) and all x∈ℝP, by
\[\bigl|\ln[f(\mathbf{x}|\boldsymbol{\theta}_{M})]\bigr| \leq C_{1}(a,b,P,G,\eta)+C_{2}(a)\|\mathbf{x}\|^{2}=H(\mathbf{x}),\]
defining the envelope function H, where C 1(a,b,P,G,η) and C 2(a) are two positive constants.
The h-integrability of this function can be proved by showing that ∫∥x∥2 h(x)d x<∞, where the required inequalities are obtained using Lemmas 3 and 4 in Maugis et al. (2007) and assumption (H2).
Hence, according to Proposition 2 in Maugis et al. (2007),
\[\sup_{\boldsymbol{\theta}_{M} \in \varTheta'_{M}}\Bigl|\frac{1}{n}\sum_{i=1}^{n}\ln[f(\mathbf{X}_{i}|\boldsymbol{\theta}_{M})]-\mathbb{E}_{\mathbf{X}}\bigl\{\ln[f(\mathbf{X}|\boldsymbol{\theta}_{M})]\bigr\}\Bigr| \stackrel{P}{\longrightarrow} 0. \tag{11}\]
Then, since \(\ln(h) \in \mathcal{H}_{M^{0}}\), we have \(\mathbb{E}_{\mathbf{X}}[|\ln h(\mathbf{X})|] \leq \mathbb{E}_{\mathbf{X}}[H(\mathbf{X})]<\infty\). Thus, according to the law of large numbers,
\[\frac{1}{n}\sum_{i=1}^{n}\ln[h(\mathbf{X}_{i})] \stackrel{a.s.}{\longrightarrow} \mathbb{E}_{\mathbf{X}}[\ln h(\mathbf{X})]. \tag{12}\]
Convergences (11) and (12) imply (10), thus proving the proposition. □
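The envelope bound at the core of this proof can be checked numerically. In the sketch below the mixture parameters are hypothetical, chosen so that the eigenvalues of each covariance block lie in [a,b] and the means lie in the ball of radius η; the explicit constants C1 and C2 are one admissible choice derived from the bounds above (the proof only asserts their existence):

```python
import numpy as np

# Hypothetical mixture respecting (H2): eigenvalues of each Sigma_kg in
# [a, b], component means inside the ball B(eta, P_g).
a, b, eta = 0.5, 4.0, 2.0
P, G = 2, 1                                   # one group of two variables
pis = np.array([0.3, 0.7])
mus = [np.array([1.0, -1.0]), np.array([0.0, 0.5])]      # norms <= eta
Sigmas = [np.diag([0.5, 3.0]), np.array([[2.0, 0.5], [0.5, 1.0]])]

def log_f(x):
    """Exact mixture log-density ln f(x | theta_M)."""
    dens = 0.0
    for pi, mu, S in zip(pis, mus, Sigmas):
        d = x - mu
        _, logdet = np.linalg.slogdet(S)
        dens += pi * np.exp(-0.5 * (P * np.log(2 * np.pi) + logdet
                                    + d @ np.linalg.solve(S, d)))
    return np.log(dens)

# Envelope from the proof: |ln f(x)| <= C1 + C2 * ||x||^2
C1 = (0.5 * P * np.log(2 * np.pi)
      + 0.5 * P * max(abs(np.log(a)), abs(np.log(b)))
      + G * eta ** 2 / a)
C2 = 1.0 / a
for x in np.random.default_rng(1).uniform(-10, 10, size=(200, P)):
    assert abs(log_f(x)) <= C1 + C2 * (x @ x)
```

The quadratic growth of the envelope in ‖x‖ is what makes the second-moment condition ∫∥x∥2 h(x)d x<∞ sufficient for h-integrability.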
A.5 Proposition 2
Let Σ 1,…,Σ g ,…,Σ G be G real, symmetric, positive definite matrices, whose dimensions are P g ×P g for g=1,…,G, and let \(\boldsymbol{\varSigma} = \bigoplus_{g=1}^{G} \boldsymbol{\varSigma}_{g}\). Furthermore, let \(\boldsymbol{\varSigma}_{g}= \lambda_{g} \mathbf{D}_{g}\mathbf{A}_{g}\mathbf{D}_{g}^{\top}\), where \(\lambda_{g}=|\boldsymbol{\varSigma}_{g}|^{1/P_{g}}\), D g is the matrix of orthonormal eigenvectors of Σ g , and A g is the diagonal matrix containing the eigenvalues of Σ g (normalized in such a way that |A g |=1). Then, Σ=λ DAD ⊤, where \(\lambda= \prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}\), \(\mathbf{D}=\bigoplus_{g=1}^{G} \mathbf{D}_{g}\), and \(\mathbf{A}= \bigoplus_{g=1}^{G} \frac{\lambda_{g}}{\prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}}\mathbf{A}_{g}\).
Proof
Consider the spectral decomposition of Σ g : \(\boldsymbol{\varSigma}_{g}=\mathbf{D}_{g}\mathbf{L}_{g}\mathbf{D}_{g}^{\top}\), where L g is the diagonal matrix containing the eigenvalues of Σ g , for g=1,…,G. Then, L g =λ g A g .
By properties of the direct sum operator, the following results hold:
1. \(|\boldsymbol{\varSigma}|=\prod_{g=1}^{G}|\boldsymbol{\varSigma}_{g}|=\prod_{g=1}^{G}\lambda_{g}^{P_{g}}\) (see, for example, Lütkepohl 1996, p. 22);
2. Σ=DLD ⊤, where \(\mathbf{D}=\bigoplus_{g=1}^{G} \mathbf{D}_{g}\) and \(\mathbf{L}=\bigoplus_{g=1}^{G} \mathbf{L}_{g}\) (see Lütkepohl 1996, p. 66).
Hence, \(|\boldsymbol{\varSigma}|^{\frac{1}{P}}=\prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}=\lambda\), \(\frac{1}{|\boldsymbol{\varSigma}|^{\frac{1}{P}}}\mathbf{L}=\) \(\frac{1}{|\boldsymbol{\varSigma}|^{\frac{1}{P}}}\bigoplus_{g=1}^{G} \mathbf{L}_{g}\) \(=\bigoplus_{g=1}^{G} \frac{\lambda_{g}}{\prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}}\mathbf{A}_{g}=\mathbf{A}\), thus proving the proposition. □
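Proposition 2 lends itself to a direct numerical check. The sketch below (with hypothetical blocks Σ 1 and Σ 2) computes the normalized spectral decomposition of each block, assembles λ, D and A as in the statement, and verifies that Σ=λ DAD ⊤ with |A|=1:

```python
import numpy as np

def normalized_spectral(S):
    """Return (lambda_g, D_g, A_g) with S = lambda_g * D_g A_g D_g^T, |A_g|=1."""
    evals, D = np.linalg.eigh(S)
    lam = np.prod(evals) ** (1.0 / len(evals))   # |Sigma_g|^(1/P_g)
    A = np.diag(evals / lam)                     # |A_g| = 1 by construction
    return lam, D, A

def direct_sum(*blocks):
    P = sum(b.shape[0] for b in blocks)
    out = np.zeros((P, P))
    i = 0
    for b in blocks:
        d = b.shape[0]
        out[i:i + d, i:i + d] = b
        i += d
    return out

# Hypothetical blocks (P_1 = 2, P_2 = 1):
S1 = np.array([[2.0, 0.5], [0.5, 1.0]])
S2 = np.array([[3.0]])
Ps = np.array([2, 1])
P = Ps.sum()

(l1, D1, A1), (l2, D2, A2) = normalized_spectral(S1), normalized_spectral(S2)
lam = l1 ** (Ps[0] / P) * l2 ** (Ps[1] / P)      # lambda = prod lambda_g^(P_g/P)
D = direct_sum(D1, D2)
A = direct_sum((l1 / lam) * A1, (l2 / lam) * A2)

Sigma = direct_sum(S1, S2)
assert np.allclose(Sigma, lam * D @ A @ D.T)     # Sigma = lambda D A D^T
assert np.isclose(np.linalg.det(A), 1.0)         # |A| = 1
```

This decomposition is what allows the volume, orientation and shape constraints of Celeux and Govaert (1995) to be imposed blockwise on the component variance matrices.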
Galimberti, G., Soffritti, G. Using conditional independence for parsimonious model-based Gaussian clustering. Stat Comput 23, 625–638 (2013). https://doi.org/10.1007/s11222-012-9336-6