
Using conditional independence for parsimonious model-based Gaussian clustering

Abstract

In the framework of model-based cluster analysis, finite mixtures of Gaussian components represent an important class of statistical models widely employed for dealing with quantitative variables. Within this class, we propose novel models in which constraints on the component-specific variance matrices allow us to define Gaussian parsimonious clustering models. Specifically, the proposed models are obtained by assuming that the variables can be partitioned into groups that are conditionally independent within components, thus producing component-specific variance matrices with a block-diagonal structure. This approach allows us to extend the methods for model-based cluster analysis and to make them more flexible and versatile. In this paper, Gaussian mixture models are studied under the above-mentioned assumption. Identifiability conditions are proved, and the model parameters are estimated through the maximum likelihood method by using the Expectation-Maximization algorithm. The Bayesian information criterion is proposed for selecting the partition of the variables into conditionally independent groups, and its consistency is proved under regularity conditions. In order to examine and compare models with different partitions of the set of variables, a hierarchical algorithm is suggested. A wide class of parsimonious Gaussian models is also presented by parameterizing the component-variance matrices according to their spectral decomposition. The effectiveness and usefulness of the proposed methodology are illustrated with two examples based on real datasets.

References

  • Baek, J., McLachlan, G.J.: Mixtures of factor analyzers with common factor loadings for the clustering and visualisation of high-dimensional data. Technical report NI08018-SCH, Preprint, Series of the Isaac Newton Institute for Mathematical Sciences, Cambridge (2008)

  • Baek, J., McLachlan, G.J., Flack, L.: Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1298–1309 (2010)

  • Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)

  • Bartholomew, D., Knott, M., Moustaki, I.: Latent Variable Models and Factor Analysis: A Unified Approach, 3rd edn. Wiley, Chichester (2011)

  • Basso, R.M., Lachos, V.H., Barbosa Cabral, C.R., Ghosh, P.: Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010)

  • Biernacki, C., Govaert, G.: Choosing models in model-based clustering and discriminant analysis. J. Stat. Comput. Simul. 64, 49–71 (1999)

  • Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41, 561–575 (2003)

  • Biernacki, C., Celeux, G., Govaert, G., Langrognet, F.: Model-based cluster and discriminant analysis with the MIXMOD software. Comput. Stat. Data Anal. 51, 587–600 (2006)

  • Böhning, D., Seidel, W.: Editorial: recent developments in mixture models. Comput. Stat. Data Anal. 41, 349–357 (2003)

  • Böhning, D., Seidel, W., Alfò, M., Garel, B., Patilea, V., Walther, G.: Advances in mixture models. Comput. Stat. Data Anal. 51, 5205–5210 (2007)

  • Bouveyron, C., Girard, S., Schmid, C.: High-dimensional data clustering. Comput. Stat. Data Anal. 52, 502–519 (2007)

  • Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)

  • Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognit. 28, 781–793 (1995)

  • Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994)

  • Coretto, P., Hennig, C.: Maximum likelihood estimation of heterogeneous mixtures of Gaussian and uniform distributions. J. Stat. Plan. Inference 141, 462–473 (2011)

  • Cutler, A., Windham, M.P.: Information-based validity functionals for mixture analysis. In: Bozdogan, H. (ed.) Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, pp. 149–170. Kluwer Academic, Dordrecht (1994)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood for incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–22 (1977)

  • Dias, J.G.: Latent class analysis and model selection. In: Spilopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 95–102. Springer, Berlin (2006)

  • Fraley, C., Raftery, A.E.: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1998)

  • Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. 97, 611–631 (2002)

  • Fraley, C., Raftery, A.E.: Enhanced software for model-based clustering. J. Classif. 20, 263–286 (2003)

  • Fraley, C., Raftery, A.E.: MCLUST version 3 for R: normal mixture modeling and model-based clustering. Technical report No. 504, Department of Statistics, University of Washington (2006)

  • Frank, A., Asuncion, A.: UCI machine learning repository. School of Information and Computer Science, University of California, Irvine, CA (2010). http://archive.ics.uci.edu/ml

  • Galimberti, G., Soffritti, G.: Model-based methods to identify multiple cluster structures in a data set. Comput. Stat. Data Anal. 52, 520–536 (2007)

  • Galimberti, G., Montanari, A., Viroli, C.: Penalized factor mixture analysis for variable selection in clustered data. Comput. Stat. Data Anal. 53, 4301–4310 (2009)

  • Ghahramani, Z., Hinton, G.E.: The EM algorithm for factor analyzers. Technical report CRG-TR-96-1, University of Toronto (1997)

  • Gordon, A.D.: Classification, 2nd edn. Chapman & Hall, Boca Raton (1999)

  • Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)

  • Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)

  • Keribin, C.: Consistent estimation of the order of mixture models. Sankhyā Ser. A 62, 49–66 (2000)

  • Lin, T.I.: Maximum likelihood estimation for multivariate skew normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)

  • Lin, T.I.: Robust mixture modeling using multivariate skew t distributions. Stat. Comput. 20, 343–356 (2010)

  • Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew t distribution. Stat. Comput. 17, 81–92 (2007a)

  • Lin, T.I., Lee, J.C., Yen, S.Y., Shu, Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b)

  • Lütkepohl, H.: Handbook of Matrices. Wiley, Chichester (1996)

  • MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)

  • Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection for clustering with Gaussian mixture models. Technical report RR-6211, Inria, France (2007)

  • Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection in model-based clustering: a general variable role modeling. Comput. Stat. Data Anal. 53, 3872–3882 (2009a)

  • Maugis, C., Celeux, G., Martin-Magniette, M.-L.: Variable selection for clustering with Gaussian mixture models. Biometrics 65, 701–709 (2009b)

  • McColl, J.H.: Multivariate Probability. Arnold, London (2004)

  • McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Chichester (2008)

  • McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, Chichester (2000a)

  • McLachlan, G.J., Peel, D.: Mixtures of factor analyzers. In: Langley, P. (ed.) Proceedings of the Seventeenth International Conference on Machine Learning, pp. 599–606. Morgan Kaufmann, San Francisco (2000b)

  • McLachlan, G.J., Peel, D., Basford, K.E., Adams, P.: The EMMIX software for the fitting of mixtures of normal and t-components. J. Stat. Softw. 4, 2 (1999)

  • McLachlan, G.J., Peel, D., Bean, R.W.: Modelling high-dimensional data by mixtures of factor analyzers. Comput. Stat. Data Anal. 41, 379–388 (2003)

  • McLachlan, G.J., Bean, R.W., Ben-Tovim Jones, L.: Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Comput. Stat. Data Anal. 51, 5327–5338 (2007)

  • McNicholas, P.D., Murphy, T.B.: Parsimonious Gaussian mixture models. Stat. Comput. 18, 285–296 (2008)

  • McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D.: Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. 54, 711–723 (2010)

  • Melnykov, V., Maitra, R.: Finite mixture models and model-based clustering. Stat. Surv. 4, 80–116 (2010)

  • Melnykov, V., Melnykov, I.: Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput. Stat. Data Anal. (2011). doi:10.1016/j.csda.2011.11.002

  • Miloslavsky, M., van der Laan, M.J.: Fitting of mixtures with unspecified number of components using cross validation distance estimate. Comput. Stat. Data Anal. 41, 413–428 (2003)

  • Montanari, A., Viroli, C.: Heteroscedastic factor mixture analysis. Stat. Model. 10, 441–460 (2010a)

  • Montanari, A., Viroli, C.: The independent factor analysis approach to latent variable modelling. Statistics 44, 397–416 (2010b)

  • Peel, D., McLachlan, G.J.: Robust mixture modeling using the t-distribution. Stat. Comput. 10, 339–348 (2000)

  • R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2010). http://www.R-project.org

  • Raftery, A.E., Dean, N.: Variable selection for model-based cluster analysis. J. Am. Stat. Assoc. 101, 168–178 (2006)

  • Ray, S., Lindsay, B.G.: Model selection in high dimensions: a quadratic-risk-based approach. J. R. Stat. Soc. Ser. B 70, 95–118 (2008)

  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)

  • Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., Johannes, R.S.: Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In: Proceedings of the Symposium on Computer Applications and Medical Care, pp. 261–265. IEEE Computer Society, Los Alamitos (1988)

  • Teicher, H.: Identifiability of mixture models. Ann. Math. Stat. 34, 1265–1269 (1963)

  • Tipping, M.E., Bishop, C.M.: Mixture of probabilistic principal component analysers. Neural Comput. 11, 443–482 (1999)

  • Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, Chichester (1985)

  • Viroli, C.: Dimensionally reduced model-based clustering through mixtures of factor mixture analyzers. J. Classif. 27, 363–388 (2010)

  • Wang, K., Ng, S.-K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M.J., Lovell, B.C., Maeder, A.J. (eds.) Proceedings of the 2009 Conference of Digital Image Computing: Techniques and Applications, pp. 526–531. IEEE Computer Society, Los Alamitos (2009)

  • Yakowitz, S.J., Spragins, J.D.: On the identifiability of finite mixtures. Ann. Math. Stat. 39, 209–214 (1968)

  • Yang, C.C.: Evaluating latent class analysis models in qualitative phenotype identification. Comput. Stat. Data Anal. 50, 1090–1104 (2006)

  • Yoshida, R., Higuchi, T., Imoto, S.: A mixed factors model for dimension reduction and extraction of a group structure in gene expression data. In: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, pp. 161–172 (2004)

Corresponding author

Correspondence to Gabriele Soffritti.

Appendix A

A.1 Proof of Theorem 1

Theorem 1 can be proved by exploiting arguments similar to those used in Maugis et al. (2009b) to establish the identifiability of Gaussian mixture models with irrelevant variables.

Given Eq. (3), both \(f(\cdot|\boldsymbol{\theta}_{M})\) and \(f(\cdot|\boldsymbol{\theta}^{*}_{M^{*}})\) can be written as Gaussian mixture models. Namely:

$$ f(\mathbf{x}_i|\boldsymbol{\theta}_M) = \sum _{k=1}^K \pi_k \phi_{P} (\mathbf{\tilde{x}}_i|\boldsymbol{ \mu}_{k},\boldsymbol{\varSigma}_{k} ), $$

where \(\mathbf{\tilde{x}}_{i}=(\mathbf{x}_{i}^{S_{1}\top}, \ldots, \mathbf{x}_{i}^{S_{g}\top}, \ldots, \mathbf{x}_{i}^{S_{G}\top})^{\top}\) is obtained by permuting the vector \(\mathbf{x}_{i}\) so that the P variables are listed according to the order established by \((S_{1},\ldots,S_{G})\), \(\boldsymbol{\mu}_{k}=(\boldsymbol{\mu}_{k1}^{\top}, \ldots, \boldsymbol{\mu}_{kg}^{\top}, \ldots, \boldsymbol{\mu}_{kG}^{\top})^{\top}\), and \(\boldsymbol{\varSigma}_{k}=\bigoplus_{g=1}^{G} \boldsymbol{\varSigma}_{kg}\). Analogously,

$$ f\bigl(\mathbf{x}_i|\boldsymbol{\theta}^*_{M^*} \bigr) = \sum_{k=1}^{K^*} \pi_k^* \phi_{P} \bigl(\mathbf{\tilde{x}}_i^*|\boldsymbol{ \mu}_{k}^*,\boldsymbol{\varSigma}_{k}^* \bigr). $$

Then, since the pairs \((\boldsymbol{\mu}_{k},\boldsymbol{\varSigma}_{k})\) for \(k=1,\ldots,K\) are distinct, as are the pairs \((\boldsymbol{\mu}_{k}^{*},\boldsymbol{\varSigma}_{k}^{*})\) for \(k=1,\ldots,K^{*}\), the identifiability of Gaussian mixture models gives that \(K=K^{*}\) and, up to a permutation of the mixture components and of the elements of \(\mathbf{x}_{i}\), \(\pi_{k}=\pi_{k}^{*}\), \(\boldsymbol{\mu}_{k}=\boldsymbol{\mu}_{k}^{*}\) and \(\boldsymbol{\varSigma}_{k}=\boldsymbol{\varSigma}_{k}^{*}\) (see, for example, Yakowitz and Spragins 1968).

To complete the proof, it is now proven by contradiction that, under the constraint (5), \(G=G^{*}\) and each element of \((S_{1},\ldots,S_{G})\) coincides with one and only one element of \((S_{1}^{*}, \ldots, S_{G^{*}}^{*})\).

Consider \(S_{g} \in (S_{1},\ldots,S_{G})\). Since both \((S_{1},\ldots,S_{G})\) and \((S_{1}^{*}, \ldots, S_{G^{*}}^{*})\) are partitions of the variable index set \(\mathcal{I}\), there exists at least one \(S_{h}^{*} \in (S_{1}^{*}, \ldots, S_{G^{*}}^{*})\) such that \(S_{g} \cap S_{h}^{*} \neq \emptyset\). Let \(s=S_{g} \cap S_{h}^{*}\), \(t=S_{g}^{c} \cap S_{h}^{*}\), and \(\bar{s}=S_{g} \cap S_{h}^{*c}\).

Suppose that \(t \neq \emptyset\). Since \(t \cap S_{g} = \emptyset\), according to model \(M=(G, K, S_{1},\ldots,S_{G})\) we have \(\boldsymbol{\varSigma}_{k,ts}=\mathbf{0}_{ts}\) ∀k. Due to the identifiability of Gaussian mixture models, this implies that \(\boldsymbol{\varSigma}^{*}_{k,ts}=\boldsymbol{\varSigma}^{*}_{kh,ts}=\mathbf{0}_{ts}\) ∀k. However, this result contradicts the constraint (5). Thus, \(t=\emptyset\).

Analogously, suppose now that \(\bar{s} \neq \emptyset\). Since \(\bar{s} \cap S_{h}^{*} = \emptyset\), model \(M^{*}=(G^{*}, K^{*}, S_{1}^{*}, \ldots, S_{G^{*}}^{*})\) implies that \(\boldsymbol{\varSigma}^{*}_{k,s\bar{s}}=\mathbf{0}_{s\bar{s}}\) ∀k. Together with the identifiability of Gaussian mixture models, this also implies that \(\boldsymbol{\varSigma}_{k,s\bar{s}}=\boldsymbol{\varSigma}_{kg,s\bar{s}}=\mathbf{0}_{s\bar{s}}\) ∀k. Since this result contradicts the constraint (5), it follows that \(\bar{s} = \emptyset\).

These two results imply that, under the constraint (5), \(S_{g} \cap S_{h}^{*} \neq \emptyset \Rightarrow S_{g} = S_{h}^{*}\). Thus, since \(S_{g}\) and \(S_{h}^{*}\) belong to two partitions of the variable index set \(\mathcal{I}\), there exists a one-to-one correspondence between the two partitions, and \(G=G^{*}\).
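As a concrete illustration of the block-diagonal representation exploited in this proof, the following Python sketch (not part of the original paper; the partition, mixing proportions and toy covariance blocks are purely illustrative) assembles each component covariance as the direct sum \(\boldsymbol{\varSigma}_{k}=\bigoplus_{g=1}^{G} \boldsymbol{\varSigma}_{kg}\) and evaluates the resulting mixture density on the permuted vector \(\mathbf{\tilde{x}}_{i}\).

```python
# Minimal sketch of a Gaussian mixture whose component covariances are direct sums
# of group-specific blocks (conditional independence between variable groups within
# components).  All names and numerical values below are illustrative assumptions.
import numpy as np
from scipy.linalg import block_diag
from scipy.stats import multivariate_normal

def mixture_density(x, partition, pi, mus, Sigma_blocks):
    """f(x | theta_M) = sum_k pi_k * phi_P(x_tilde | mu_k, direct-sum_g Sigma_kg).

    partition    : list of index lists (S_1, ..., S_G), a partition of {0, ..., P-1}
    pi           : (K,) mixing proportions
    mus          : K mean vectors, each already ordered as (mu_k1', ..., mu_kG')'
    Sigma_blocks : Sigma_blocks[k][g] is the P_g x P_g covariance of group g in component k
    """
    x_tilde = np.concatenate([x[idx] for idx in partition])   # permute x into x~
    density = 0.0
    for k, pi_k in enumerate(pi):
        Sigma_k = block_diag(*Sigma_blocks[k])                # Sigma_k = direct sum of the Sigma_kg
        density += pi_k * multivariate_normal.pdf(x_tilde, mean=mus[k], cov=Sigma_k)
    return density

# Toy example: P = 3 variables split into S_1 = {0, 2} and S_2 = {1}, K = 2 components
partition = [[0, 2], [1]]
pi = np.array([0.6, 0.4])
mus = [np.zeros(3), np.full(3, 2.0)]
Sigma_blocks = [
    [np.array([[1.0, 0.3], [0.3, 1.0]]), np.array([[0.5]])],
    [np.array([[1.5, -0.2], [-0.2, 1.0]]), np.array([[2.0]])],
]
print(mixture_density(np.array([0.1, -0.2, 0.4]), partition, pi, mus, Sigma_blocks))
```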

A.2 Proof of Corollary 1

In order to prove Corollary 1 it is sufficient to note that, under the constraint (5), the parameter space of models in \(\mathcal{M}_{I}\) is a subset of \(\varTheta_{(G, K, S_{1}, \ldots, S_{G})}\); hence, Theorem 1 also holds for this subclass of models.

A.3 Proof of Theorem 2

A proof of Theorem 2 can be obtained by exploiting some results from Maugis et al. (2007) and by suitably modifying the proof of the consistency of the BIC criterion in selecting relevant variables for clustering with Gaussian mixture models (Maugis et al. 2009b).

Since the true number of mixture components \(K^{0}\) is assumed to be known, adapting the notation previously introduced to this assumption, let \(M=(G, K^{0}, S_{1}, \ldots, S_{G})\), \(M^{0} = (G^{0}, K^{0}, S^{0}_{1}, \ldots, S^{0}_{G^{0}})\), and

$$ \hat{M} = \bigl(\hat{G}, K^0, \hat{S}_1, \ldots, \hat{S}_{\hat{G}}\bigr) = \operatorname{argmax}\limits _{M \in \mathcal{M}'} \mathit{BIC}(M), $$

where \(\mathcal{M}' \subset \mathcal{M}\) is the subclass of the models obtainable from Eq. (4) with \(K=K^{0}\). Furthermore, consider \(\Delta_{\mathit{BIC}(M)}=\mathit{BIC}(M^{0})-\mathit{BIC}(M)\); after some straightforward algebra it is possible to write

$$ \Delta_{\mathit{BIC}(M)}=2n [\mathbb{D}_{nM^0}- \mathbb{D}_{nM} ]+\gamma_{M}\ln(n), $$
(8)

where \(\gamma_{M}=\lambda_{M}-\lambda_{M^{0}}\), with \(\lambda_{M}\) and \(\lambda_{M^{0}}\) denoting the numbers of free parameters of models M and \(M^{0}\), respectively, and

$$\mathbb{D}_{nM}=\frac{1}{n} \sum_{i=1}^n \ln \biggl[\frac{f(\mathbf{x}_i | \hat{\boldsymbol{\theta}}_{M})}{h(\mathbf{x}_i)} \biggr], $$
$$\mathbb{D}_{nM^0}=\frac{1}{n} \sum_{i=1}^n \ln \biggl[\frac{f(\mathbf{x}_i | \hat{\boldsymbol{\theta}}_{M^0})}{h(\mathbf{x}_i)} \biggr]. $$

Since \(P (\hat{M}=M^{0} ) = P (\Delta_{\mathit{BIC}(M)} \geq 0, \forall M \in \mathcal{M}' )\), in order to prove Theorem 2 we have to show that

$$ P (\Delta_{\mathit{BIC}(M)} < 0 ) \operatorname{ \longrightarrow}\limits _{n \rightarrow \infty} 0\quad \forall M \in \mathcal{M}'. $$
(9)

Note that, when \(M=M^{0}\), we have \(\Delta_{\mathit{BIC}(M^{0})}=0\); thus \(P (\Delta_{\mathit{BIC}(M^{0})} < 0 )=0\) ∀n.

Consider now \(M \neq M^{0}\) and, to simplify the notation, let \(D_{M}=-KL[h,f(\cdot|\boldsymbol{\breve{\theta}}_{M})]\) and \(T_{nM}= \mathbb{D}_{nM^{0}}-\mathbb{D}_{nM}+\frac{\gamma_{M}\ln(n)}{2n}\). Given Eq. (8), it is possible to write \(P(\Delta_{\mathit{BIC}(M)}<0)= P(T_{nM}<0)\). This probability is also equal to

$$ P (T_{nM}- D_{M^0} + D_{M^0} - D_M + D_M< 0 ). $$

According to Lemma 5 in Maugis et al. (2007), the following inequality holds ∀ϵ>0:

According to Proposition 1 (see below), we also have \(\mathbb{D}_{nM} \stackrel{P}{\rightarrow} D_{M}\) \(\forall M \in \mathcal{M}'\). Thus, ∀ϵ>0

$$ P (\mathbb{D}_{nM}-D_M> \epsilon ) \leq P \bigl(| \mathbb{D}_{nM}-D_M|> \epsilon \bigr) \operatorname{ \longrightarrow}\limits _{n \rightarrow \infty} 0. $$

Furthermore, according to assumption (H1), \(D_{M^{0}}=0\), and \(-D_{M}>0\) since \(M \neq M^{0}\). Then,

Taking \(\epsilon = \frac{-D_{M}}{4}\), since \(\frac{\gamma_{M}\ln(n)}{2n} \operatorname{\longrightarrow}\limits_{n \rightarrow \infty} 0\) we also obtain

These results imply (9), thus proving the theorem.
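For intuition on Eq. (8), note that the terms involving \(h(\mathbf{x}_i)\) cancel in \(\mathbb{D}_{nM^{0}}-\mathbb{D}_{nM}\), so that \(\Delta_{\mathit{BIC}(M)}\) is simply a difference of penalized maximized log-likelihoods, \(\mathit{BIC}(M)=2\hat{\ell}_{M}-\lambda_{M}\ln(n)\), and the selected model maximizes BIC. The short Python sketch below illustrates this computation with invented numbers (they do not come from the paper).

```python
# Hedged sketch of the BIC comparison behind Eq. (8): the h(x_i) terms cancel in the
# difference, leaving Delta_BIC(M) = BIC(M^0) - BIC(M) with BIC(M) = 2*loglik - lambda*ln(n).
import numpy as np

def bic(loglik, n_free_params, n):
    """BIC under the convention used above (larger is better)."""
    return 2.0 * loglik - n_free_params * np.log(n)

def delta_bic(loglik_M0, lambda_M0, loglik_M, lambda_M, n):
    """Delta_BIC(M) = BIC(M^0) - BIC(M); positive values favour M^0."""
    return bic(loglik_M0, lambda_M0, n) - bic(loglik_M, lambda_M, n)

# Illustrative numbers only: the richer model M gains little log-likelihood but pays a
# larger ln(n) penalty, so Delta_BIC(M) > 0 and M^0 is preferred.
print(delta_bic(loglik_M0=-1500.0, lambda_M0=25, loglik_M=-1495.0, lambda_M=40, n=500))
```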

A.4 Proposition 1

Under assumptions (H1) and (H2) the following convergence holds \(\forall M \in \mathcal{M}'\):

$$ \frac{1}{n} \sum_{i=1}^n \ln \biggl( \frac{h(\mathbf{x}_i)}{f(\mathbf{x}_i | \hat{\boldsymbol{\theta}}_{M})} \biggr) \stackrel{P}{\rightarrow} KL\bigl[h,f( \cdot|\boldsymbol{\breve{\theta}}_{M})\bigr]. $$
(10)

Proof

According to (H2), \(\varTheta'_{M}\) is a compact metric space, and \(\ln[f(\mathbf{x}|\boldsymbol{\theta}_{M})]\) is a continuous function of \(\boldsymbol{\theta}_{M}\) ∀ \(\mathbf{x} \in \mathbb{R}^{P}\). Furthermore, it is possible to show that the family \(\mathcal{H}_{M}=\{\ln[f(\cdot|\boldsymbol{\theta}_{M})]; \boldsymbol{\theta}_{M} \in \varTheta'_{M}\}\) admits an envelope function H which is h-integrable. This latter result can be proved as follows.

Since Σ kg is positive definite, \(\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}\geq 0\), where \(\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}= (\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg} )^{\top}\boldsymbol{\varSigma}_{kg}^{-1} (\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg} )\). Furthermore, \(|\boldsymbol{\varSigma}_{kg}|^{-\frac{1}{2}} \leq a^{-\frac{1}{2}}\) (see Maugis et al. 2007, Lemma 3). Writing f(x|θ M )= \(\sum_{k=1}^{K} \pi_{k} g(\mathbf{x}|\boldsymbol{\vartheta}_{k})\), where

$$ g(\mathbf{x}|\boldsymbol{\vartheta}_k)= \prod _{g=1}^G |2 \pi\boldsymbol{\varSigma}_{kg}|^{-\frac{1}{2}} \exp \biggl(-\frac{\|\mathbf{x}^{S_g}-\boldsymbol{\mu}_{kg}\|^2_{\boldsymbol{\varSigma}_{kg}^{-1}}}{2} \biggr), $$

with ϑ k =(μ k1,…,μ kG ,Σ k1,…,Σ kG ), and recalling that \(\sum_{k=1}^{K} \pi_{k} =1\), the following upper bound of ln[f(x|θ M )] holds:

To shorten the following equations, let \(d^{2}_{kg}=\|\mathbf{x}^{S_{g}}-\boldsymbol{\mu}_{kg}\|^{2}_{\boldsymbol{\varSigma}_{kg}^{-1}}\). Using the concavity of the logarithm function, we obtain:

Since \(\boldsymbol{\mu}_{kg} \in \mathcal{B}(\eta,P_{g})\), using Lemma 3 in Maugis et al. (2007) it is possible to write:

Furthermore, since \(|\boldsymbol{\varSigma}_{kg}| \leq b^{P_{g}}\) (see Maugis et al. 2007, Lemma 3), the lower bound of ln[f(x|θ M )] is given by:

Thus, since each function of the family \(\mathcal{H}_{M}\) is bounded from above and from below by the expressions just derived, for all \(\boldsymbol{\theta}_{M} \in \varTheta'_{M}\) and all \(\mathbf{x} \in \mathbb{R}^{P}\) we have

$$ \bigl|\ln\bigl[f(\mathbf{x}|\boldsymbol{\theta}_M)\bigr] \bigr| \leq C_1(a,b,P,G,\eta)+C_2(a)\|\mathbf{x}\|^2, $$

defining the envelope function H, where C 1(a,b,P,G,η) and C 2(a) are two positive constants.

The h-integrability of this function can be proved by showing that \(\int \|\mathbf{x}\|^{2}\, h(\mathbf{x})\,d\mathbf{x}<\infty\), which follows from Lemmas 3 and 4 in Maugis et al. (2007) and assumption (H2).

Hence, according to Proposition 2 in Maugis et al. (2007),

$$ \frac{1}{n} \sum_{i=1}^n \ln \bigl[f(\mathbf{x}_i | \hat{\boldsymbol{\theta}}_{M}) \bigr] \stackrel{P}{\rightarrow} \mathbb{E}_\mathbf{X}\bigl[\ln f( \mathbf{X}| \boldsymbol{\breve{\theta}}_{M})\bigr]. $$
(11)

Then, since \(\ln(h) \in \mathcal{H}_{M^{0}}\), it follows that \(\mathbb{E}_{\mathbf{X}}[|\ln h(\mathbf{X})|] \leq \mathbb{E}_{\mathbf{X}}[H(\mathbf{X})]<\infty\). Thus, according to the law of large numbers

$$ \frac{1}{n} \sum_{i=1}^n \ln \bigl[h(\mathbf{x}_i) \bigr] \stackrel{P}{\rightarrow} \mathbb{E}_\mathbf{X}\bigl[\ln h(\mathbf{X})\bigr]. $$
(12)

Convergences (11) and (12) imply (10), thus proving the proposition. □
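The content of Proposition 1 is essentially a law of large numbers for the log-likelihood ratio. The following Monte Carlo sketch is my own illustration with two fixed univariate Gaussian densities standing in for h and \(f(\cdot|\boldsymbol{\breve{\theta}}_{M})\); it does not re-estimate \(\hat{\boldsymbol{\theta}}_{M}\) by maximum likelihood at each n, so it only illustrates the limiting behaviour in (10), not the full statement.

```python
# Monte Carlo illustration (not from the paper) of convergence (10): with x_1, ..., x_n
# drawn from h, the average of ln[h(x_i)/f(x_i)] tends to KL[h, f].
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
h = norm(loc=0.0, scale=1.0)          # "true" density h
f = norm(loc=0.5, scale=1.5)          # a fixed, misspecified model density

# Closed form: KL[N(0,1), N(mu, sigma^2)] = ln(sigma) + (1 + mu^2) / (2 sigma^2) - 1/2
mu, sigma = 0.5, 1.5
kl_exact = np.log(sigma) + (1.0 + mu**2) / (2.0 * sigma**2) - 0.5

for n in (100, 10_000, 1_000_000):
    x = h.rvs(size=n, random_state=rng)
    kl_hat = np.mean(h.logpdf(x) - f.logpdf(x))   # (1/n) sum_i ln[h(x_i)/f(x_i)]
    print(f"n = {n:>9d}:  empirical {kl_hat:.4f}   exact {kl_exact:.4f}")
```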

A.5 Proposition 2

Let \(\boldsymbol{\varSigma}_{1},\ldots,\boldsymbol{\varSigma}_{g},\ldots,\boldsymbol{\varSigma}_{G}\) be G real, symmetric and positive definite matrices, whose dimensions are \(P_{g}\times P_{g}\) for \(g=1,\ldots,G\), and let \(\boldsymbol{\varSigma} = \bigoplus_{g=1}^{G} \boldsymbol{\varSigma}_{g}\). Furthermore, let \(\boldsymbol{\varSigma}_{g}= \lambda_{g} \mathbf{D}_{g}\mathbf{A}_{g}\mathbf{D}_{g}^{\top}\), where \(\lambda_{g}=|\boldsymbol{\varSigma}_{g}|^{1/P_{g}}\), \(\mathbf{D}_{g}\) is the matrix of orthonormal eigenvectors of \(\boldsymbol{\varSigma}_{g}\), and \(\mathbf{A}_{g}\) is the diagonal matrix containing the eigenvalues of \(\boldsymbol{\varSigma}_{g}\) (normalized so that \(|\mathbf{A}_{g}|=1\)). Then, \(\boldsymbol{\varSigma}=\lambda \mathbf{D}\mathbf{A}\mathbf{D}^{\top}\), where \(\lambda= \prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}\), \(\mathbf{D}=\bigoplus_{g=1}^{G} \mathbf{D}_{g}\), and \(\mathbf{A}= \bigoplus_{g=1}^{G} \frac{\lambda_{g}}{\lambda}\mathbf{A}_{g}\).

Proof

Consider the spectral decomposition of Σ g : \(\boldsymbol{\varSigma}_{g}=\mathbf{D}_{g}\mathbf{L}_{g}\mathbf{D}_{g}^{\top}\), where L g is the diagonal matrix containing the eigenvalues of Σ g , for g=1,…,G. Then, L g =λ g A g .

According to some properties of the direct sum operator, the following results hold:

  1. \(|\boldsymbol{\varSigma}|=\prod_{g=1}^{G}|\boldsymbol{\varSigma}_{g}|=\prod_{g=1}^{G}\lambda_{g}^{P_{g}}\) (see, for example, Lütkepohl 1996, p. 22);

  2. \(\boldsymbol{\varSigma}=\mathbf{D}\mathbf{L}\mathbf{D}^{\top}\), where \(\mathbf{D}=\bigoplus_{g=1}^{G} \mathbf{D}_{g}\) and \(\mathbf{L}=\bigoplus_{g=1}^{G} \mathbf{L}_{g}\) (see Lütkepohl 1996, p. 66).

Hence, \(|\boldsymbol{\varSigma}|^{\frac{1}{P}}=\prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}=\lambda\), \(\frac{1}{|\boldsymbol{\varSigma}|^{\frac{1}{P}}}\mathbf{L}=\) \(\frac{1}{|\boldsymbol{\varSigma}|^{\frac{1}{P}}}\bigoplus_{g=1}^{G} \mathbf{L}_{g}\) \(=\bigoplus_{g=1}^{G} \frac{\lambda_{g}}{\prod_{g=1}^{G}\lambda_{g}^{\frac{P_{g}}{P}}}\mathbf{A}_{g}=\mathbf{A}\), thus proving the proposition. □
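Proposition 2 is easy to check numerically. The Python sketch below (an illustration only, using arbitrary positive definite blocks) builds \(\boldsymbol{\varSigma}=\bigoplus_{g=1}^{G}\boldsymbol{\varSigma}_{g}\), computes the block-wise spectral parameters \((\lambda_{g}, \mathbf{D}_{g}, \mathbf{A}_{g})\), and verifies that \(\boldsymbol{\varSigma}=\lambda\mathbf{D}\mathbf{A}\mathbf{D}^{\top}\) with \(\lambda\), D and A formed as stated in the proposition.

```python
# Numerical check of Proposition 2 for a block-diagonal covariance matrix.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)

def spectral(S):
    """Return (lambda_g, D_g, A_g) with S = lambda_g * D_g @ A_g @ D_g.T and |A_g| = 1."""
    eigvals, D_g = np.linalg.eigh(S)
    lam_g = np.linalg.det(S) ** (1.0 / S.shape[0])    # lambda_g = |S|^(1/P_g)
    A_g = np.diag(eigvals / lam_g)                    # normalized so that |A_g| = 1
    return lam_g, D_g, A_g

# Two arbitrary symmetric positive definite blocks (P_1 = 3, P_2 = 2)
blocks = []
for P_g in (3, 2):
    M = rng.standard_normal((P_g, P_g))
    blocks.append(M @ M.T + P_g * np.eye(P_g))

lams, Ds, As = zip(*(spectral(S) for S in blocks))
P = sum(S.shape[0] for S in blocks)

# Quantities claimed in Proposition 2
lam = np.prod([l ** (S.shape[0] / P) for l, S in zip(lams, blocks)])
D = block_diag(*Ds)
A = block_diag(*[(l / lam) * A_g for l, A_g in zip(lams, As)])

Sigma = block_diag(*blocks)
print(np.allclose(Sigma, lam * D @ A @ D.T))               # True: Sigma = lambda * D A D'
print(np.isclose(np.linalg.det(Sigma) ** (1.0 / P), lam))  # True: lambda = |Sigma|^(1/P)
```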

About this article

Cite this article

Galimberti, G., Soffritti, G. Using conditional independence for parsimonious model-based Gaussian clustering. Stat Comput 23, 625–638 (2013). https://doi.org/10.1007/s11222-012-9336-6
