Constrained monotone EM algorithms for finite mixture of multivariate Gaussians

https://doi.org/10.1016/j.csda.2006.10.011

Abstract

The likelihood function of a multivariate normal mixture may present both spurious local maxima and singularities, and the latter may cause optimization algorithms to fail. Theoretical results guarantee that imposing suitable constraints on the eigenvalues of the covariance matrices of the multivariate normal components leads to a constrained parameter space with no singularities and at least a smaller number of local maxima of the likelihood function. Conditions are provided under which an EM algorithm implementing such constraints retains the monotonicity property of the usual EM algorithm. Different approaches are presented, and their performances are evaluated and compared in numerical experiments.

Introduction

Let f(x;\psi) be the density of a mixture of k multivariate normal distributions

f(x;\psi) = \alpha_1 p(x;\mu_1,\Sigma_1) + \cdots + \alpha_k p(x;\mu_k,\Sigma_k), \qquad (1)

where the \alpha_j's are the mixing weights and p(x;\mu_j,\Sigma_j) is the density function of a q-variate normal distribution with mean vector \mu_j and covariance matrix \Sigma_j; furthermore, let us set \psi = \{(\alpha_j,\mu_j,\Sigma_j),\ j=1,\ldots,k\} \in \Psi, where \Psi is the parameter space

\Psi = \{(\alpha_1,\ldots,\alpha_k,\mu_1,\ldots,\mu_k,\Sigma_1,\ldots,\Sigma_k) \in R^{k[1+q+(q^2+q)/2]} : \alpha_1+\cdots+\alpha_k = 1,\ \alpha_j \ge 0,\ \Sigma_j > 0\ \text{for}\ j=1,\ldots,k\}.

Let L(\psi) be the likelihood function of \psi given a sample x_1,\ldots,x_N \in R^q of N independent and identically distributed (i.i.d.) observations with density (1), and let \hat{\psi} be the maximum likelihood estimate of \psi. It is well known that L(\psi) is unbounded from above and may present many spurious local maxima, see e.g. McLachlan and Peel (2000). In the univariate case the likelihood grows indefinitely when the mean of one component coincides with a sample observation and the corresponding variance tends to zero; analogously, in the multivariate case this occurs when some covariance matrix tends to be singular. Problems related to the unboundedness of the likelihood function have been studied by several authors for both univariate and multivariate normal mixtures. In the univariate case, Hathaway (1985) imposed relative constraints between pairs of variances, while Ciuperca et al. (2003) studied penalized maximum likelihood estimators, providing asymptotic statistical properties of the penalized MLE; the problem of degeneracy has also been considered in a sequence of papers by Biernacki and Chrétien (2003) and Biernacki (2004a, 2004b). In the multivariate case, Snoussi and Mohammad-Djafari (2001) approached the problem of degeneracy by penalizing the likelihood by means of an inverse Wishart prior on the covariance matrices.
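The mixture density (1) is straightforward to evaluate numerically. A minimal sketch in Python/NumPy follows; the weights, means, and covariances are arbitrary illustrative values, not parameters from the paper:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Density of a q-variate normal N(mu, Sigma) at x."""
    q = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** q * np.linalg.det(Sigma))

def mixture_density(x, alphas, mus, sigmas):
    """f(x; psi) = alpha_1 p(x; mu_1, Sigma_1) + ... + alpha_k p(x; mu_k, Sigma_k)."""
    return sum(a * gauss_pdf(x, m, S) for a, m, S in zip(alphas, mus, sigmas))

# Hypothetical two-component bivariate mixture (k = 2, q = 2).
alphas = [0.4, 0.6]
mus = [np.zeros(2), np.array([3.0, 3.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]
fx = mixture_density(np.array([1.0, 1.0]), alphas, mus, sigmas)
```

Note that `np.linalg.solve` is used instead of an explicit inverse, which is both cheaper and numerically safer when a covariance matrix is close to singular.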
Hathaway (1985) proposed a constrained (global) maximum likelihood formulation which admits a strongly consistent global solution, has no singularities, and has at least a smaller number of spurious local maxima, obtained by imposing the following constraints, satisfied by the true set of parameters:

\min_{1 \le h \ne j \le k} \lambda_{\min}(\Sigma_h \Sigma_j^{-1}) \ge c > 0, \qquad c \in (0,1], \qquad (2)

where \lambda_{\min}(\Sigma_h \Sigma_j^{-1}) is the smallest eigenvalue of the matrix \Sigma_h \Sigma_j^{-1}; furthermore, Biernacki (2004b) suggested constraining the determinants of the covariance matrices to be greater than a given value.
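Constraint (2) can be checked for a given set of covariance matrices by scanning all ordered pairs. A sketch, with hypothetical matrices chosen so that the minimum is easy to verify by hand:

```python
import numpy as np

def hathaway_min_eig(sigmas):
    """min over ordered pairs h != j of lambda_min(Sigma_h Sigma_j^{-1})."""
    k = len(sigmas)
    vals = []
    for h in range(k):
        for j in range(k):
            if h != j:
                M = sigmas[h] @ np.linalg.inv(sigmas[j])
                # The eigenvalues are real and positive: M is similar to the
                # symmetric positive definite matrix B^{-1/2} A B^{-1/2}.
                vals.append(np.linalg.eigvals(M).real.min())
    return min(vals)

# Hypothetical pair of covariance matrices: the minimum here is 0.5,
# so constraint (2) holds for any c <= 0.5.
sigmas = [np.eye(2), 2.0 * np.eye(2)]
m = hathaway_min_eig(sigmas)
```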

Here we move in the spirit of Hathaway (1985), extending the work of Ingrassia (2004), who formulated a sufficient condition for constraint (2) to hold. The advantage of this proposal is that the new set of constraints can be applied directly in the EM algorithm, where each covariance matrix \Sigma_j (j=1,\ldots,k) is iteratively updated. As a matter of fact, condition (2) is satisfied when

a \le \lambda_i(\Sigma_j) \le b, \qquad i=1,\ldots,q;\ j=1,\ldots,k, \qquad (3)

where \lambda_i(\Sigma_j) is the ith eigenvalue of \Sigma_j, in nondecreasing order, and a and b are positive numbers such that a/b \ge c; indeed, for any q \times q symmetric positive definite matrices A, B it holds that

\lambda_{\min}(AB^{-1}) \ge \frac{\lambda_{\min}(A)}{\lambda_{\max}(B)}, \qquad (4)

which implies

\lambda_{\min}(\Sigma_h \Sigma_j^{-1}) \ge \frac{\lambda_{\min}(\Sigma_h)}{\lambda_{\max}(\Sigma_j)} \ge \frac{a}{b} \ge c > 0, \qquad 1 \le h \ne j \le k,

and thus (3) leads to (2).
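Inequality (4) can be spot-checked numerically on random symmetric positive definite pairs. This is a sanity illustration, not a proof (the inequality follows from the Rayleigh-quotient characterization of the generalized eigenvalues of the pair (A, B)):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(q):
    """Random symmetric positive definite q x q matrix."""
    X = rng.standard_normal((q, q))
    return X @ X.T + q * np.eye(q)

# Spot-check inequality (4): lambda_min(A B^{-1}) >= lambda_min(A) / lambda_max(B).
holds = True
for _ in range(100):
    A, B = random_spd(3), random_spd(3)
    lhs = np.linalg.eigvals(A @ np.linalg.inv(B)).real.min()
    rhs = np.linalg.eigvalsh(A).min() / np.linalg.eigvalsh(B).max()
    holds = holds and (lhs >= rhs - 1e-10)
```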

In this paper we study constraints based on relation (4) that can be implemented directly at each iteration of the EM algorithm when updating the covariance matrices; furthermore, we investigate the conditions under which this constrained EM algorithm leads to a nondecreasing sequence of likelihood values, as in the usual unconstrained version. Our results follow from an analysis of the role of the eigenvalues and eigenvectors of the covariance matrices \Sigma_j (j=1,\ldots,k) in the sequence of estimates produced by the EM algorithm. Basic preliminary ideas were summarized in Ingrassia and Rocci (2006).

The spectral decomposition of covariance matrices has been considered with different objectives by many authors in multivariate normal mixture approaches to clustering, see e.g. Fraley and Raftery (2002); relationships with other approaches will also be discussed throughout this paper.

The rest of the paper is organized as follows. In Section 2 we investigate the role of the eigenvalues and eigenvectors of the covariance matrices in the EM algorithm; in Section 3 we present some constrained monotone versions of the EM algorithm for multivariate normal mixtures and in Section 4 they are evaluated and compared using numerical studies; in Section 5 we present some geometrical interpretation of the proposed constraints and in Section 6 we discuss some relationships with other approaches; finally in Section 7 we give some concluding remarks.

Section snippets

On the update of the covariance matrices in the EM algorithm

The EM algorithm generates a sequence of estimates \{\psi^{(m)}\}_m, where \psi^{(0)} denotes the initial guess and \psi^{(m)} \in \Psi for m \in N, such that the corresponding sequence \{L(\psi^{(m)})\}_m is nondecreasing. In what follows, for the sake of space, we denote by the superscript + the (m+1)th iteration and by the superscript - the previous mth iteration. The E-step, on the (m+1)th iteration, computes the quantities

u_{nj}^{+} = \frac{\alpha_j^{-}\, p(x_n;\mu_j^{-},\Sigma_j^{-})}{\sum_{h=1}^{k} \alpha_h^{-}\, p(x_n;\mu_h^{-},\Sigma_h^{-})}, \qquad n=1,\ldots,N;\ j=1,\ldots,k,

while the M-step requires the global
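The E-step above can be sketched as follows; the data and parameter values are arbitrary assumptions for illustration:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Density of a q-variate normal N(mu, Sigma) at x."""
    q = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** q * np.linalg.det(Sigma))

def e_step(X, alphas, mus, sigmas):
    """Posterior weights u_{nj}^+ given the parameters of the previous iteration."""
    N, k = X.shape[0], len(alphas)
    U = np.empty((N, k))
    for n in range(N):
        for j in range(k):
            U[n, j] = alphas[j] * gauss_pdf(X[n], mus[j], sigmas[j])
        U[n] /= U[n].sum()  # normalize over the components h = 1, ..., k
    return U

# Hypothetical data and parameters (N = 5, k = 2, q = 2).
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))
U = e_step(X, [0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)])
```

Each row of `U` sums to one by construction, matching the normalization over h = 1, ..., k in the formula above.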

Constrained monotone EM algorithms

The reformulation of the update of the covariance matrices \Sigma_j (j=1,\ldots,k) presented above suggests some ideas for the construction of EM algorithms such that the constraints (3), or some other sufficient conditions for (2), are satisfied while monotonicity is preserved. First, at each iteration, compute the spectral decomposition of the estimate of the covariance matrix S_j^{+}, i.e. S_j^{+} = G_j^{+} L_j^{+} (G_j^{+})', and afterwards consider one of the following strategies.

Approach A

The simplest approach concerns the update

Numerical results

In this section we present numerical results in order to evaluate and compare the performances of the different proposed constraints and algorithms. We applied five different algorithms based on the strategies presented in the previous section and we compared them with the basic unconstrained algorithm:

0. Unconstrained algorithm (U)
Ordinary EM, where \Sigma_j^{+} = S_j^{+}.
1. Constrained algorithm C naive (CCN)
Constrained EM algorithm C with the eigenvalues constrained to the interval [10^{-5}, 10].
2. Constrained
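The eigenvalue-truncation idea behind CCN can be sketched as follows: decompose each estimate S_j^+ spectrally and clip its eigenvalues to [10^{-5}, 10]. This is a simplified illustration of the constraint, not the paper's full algorithm C:

```python
import numpy as np

def clip_eigenvalues(S, a=1e-5, b=10.0):
    """Replace the eigenvalues of S falling outside [a, b] by the nearest bound:
    decompose S = G L G', truncate L, and rebuild the matrix."""
    lam, G = np.linalg.eigh(S)
    return G @ np.diag(np.clip(lam, a, b)) @ G.T

# A nearly singular sample covariance (eigenvalues ~ 1e-6 and ~ 2)
# is pushed away from degeneracy by the constraint.
S = np.array([[1.0, 0.999999],
              [0.999999, 1.0]])
S_c = clip_eigenvalues(S)
lam = np.linalg.eigvalsh(S_c)
```

After clipping, the smallest eigenvalue equals the lower bound 10^{-5}, so the constrained matrix stays safely invertible.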

Geometrical interpretation of the constraints

Constraint (2) on the smallest eigenvalue of \Sigma_h \Sigma_j^{-1} (1 \le h \ne j \le k) leads to a likelihood function with no singularities and with a smaller number of local maxima than the unconstrained likelihood function, see Hathaway (1985); however, the reformulation (3) provides some relationships with other aspects to be considered when modelling data by multivariate normal mixtures. Banfield and Raftery (1993) proposed a general approach for geometric cross-cluster constraints in multivariate normal mixtures by

Relationships with other approaches

We have already mentioned that several other methods have been proposed to avoid the singularities in the likelihood function, and hence the numerical failure of the optimization procedures. In this context, we can distinguish three main approaches:

1. to control the condition number;
2. to control the determinant;
3. to add a penalty to the likelihood.

We start by examining the relations between our approach and the condition number for matrix inversion. Constraint (4) also protects the covariance matrices \Sigma_j (j
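The link to the condition number is immediate: under constraint (3), every covariance matrix has condition number \kappa(\Sigma_j) = \lambda_{\max}(\Sigma_j)/\lambda_{\min}(\Sigma_j) \le b/a. A numeric illustration (the bounds and the matrix are assumed values):

```python
import numpy as np

# Assumed eigenvalue bounds a, b with a/b >= c; under constraint (3) every
# covariance matrix has condition number kappa = lambda_max/lambda_min <= b/a.
a, b = 0.1, 10.0
Sigma = np.diag([0.2, 2.0, 8.0])   # hypothetical matrix with eigenvalues in [a, b]
lam = np.linalg.eigvalsh(Sigma)
kappa = lam.max() / lam.min()
bounded = kappa <= b / a           # here kappa = 40 <= 100
```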

Concluding remarks

In this paper we have given theoretical results about EM algorithms for mixtures of multivariate normal distributions which preserve the usual monotonicity property while implementing suitable constraints on the eigenvalues of the covariance matrices. These constraints lead to a likelihood function with no singularities and with at least a smaller number of local maxima than the unconstrained version. We have also shown that the proposed constraints protect the covariance matrices from ill

Acknowledgements

The authors would like to thank the associate editor and the referee for their interesting comments and suggestions which considerably improved an earlier version of the present paper.

References (18)

• C. Biernacki et al., Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with EM, Statist. Probab. Lett. (2003)
• G. Celeux et al., Gaussian parsimonious clustering models, Pattern Recognition (1995)
• O. Axelsson, Iterative Solution Methods (1996)
• J.D. Banfield et al., Model-based Gaussian and non-Gaussian clustering, Biometrics (1993)
• Biernacki, C., 2004a. Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures for grouped data...
• Biernacki, C., 2004b. An asymptotic upper bound of the likelihood to prevent Gaussian mixtures from degenerating...
• R.L. Burden et al., Numerical Analysis (1985)
• G. Ciuperca et al., Penalized maximum likelihood estimator for normal mixtures, Scand. J. Statist. (2003)
• C. Fraley et al., Model-based clustering, discriminant analysis and density estimation, J. Amer. Statist. Assoc. (2002)

Cited by (70)

  • Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation

    2022, Econometrics and Statistics
Citation excerpt:

    Identifying the possible causes of degeneracy in the likelihood has practical significance in detecting spurious solutions, which are local maximizers of the likelihood function but lack real-life interpretability and hence do not provide a good clustering of the data (McLachlan and Peel, 2000). Constraining the parameter space has been the primary approach to avoid spurious solutions in GMM (Hathaway, 1986; Ingrassia, 2004; Ingrassia and Rocci, 2007; Chen and Tan, 2009; Ingrassia and Rocci, 2011). With respect to GMCM, the relative occurrence of spurious solutions for different inference techniques, initialization strategies, parameterizations, and with increasing dimensionality can be investigated in the future.

  • Robust, fuzzy, and parsimonious clustering, based on mixtures of Factor Analyzers

    2018, International Journal of Approximate Reasoning
Citation excerpt:

    On the other hand, a further issue inherent to the choice of Gaussian Mixtures (GM) is originated by the unboundedness of the likelihood, turning the maximization of the objective function into an ill-posed problem. Therefore, following a wide literature stream, based on [24,28,16,22], we will adopt a constrained estimation of the component covariances, to avoid singularities and to reduce the appearance of spurious solutions. In particular, the joint use of trimming and constrained estimation for mixtures of factor analyzers (MFA) have previously been studied for non fuzzy clustering in [19].

  • Clusterwise linear regression modeling with soft scale constraints

    2017, International Journal of Approximate Reasoning
Citation excerpt:

    The resulting constrained EM is monotone as Equation (17) is a maximum of the likelihood of Equation (3) subject to the constraints (11). The same rule has been used, among others, in Ingrassia and Rocci [13], and Won et al. [29]. In Won et al. [29] for instance, in a context of maximum likelihood estimation of the covariance matrix under a set of constraints, Equation (17) is the optimization step for each eigenvalue of the covariance matrix.
