Constrained monotone EM algorithms for finite mixture of multivariate Gaussians
Introduction
Let
$$f(x;\psi)=\sum_{j=1}^{k}\pi_j\,\phi(x;\mu_j,\Sigma_j) \qquad (1)$$
be the density of a mixture of $k$ multivariate normal distributions, where the $\pi_j$'s are the mixing weights ($\pi_j>0$, $\sum_{j=1}^{k}\pi_j=1$) and $\phi(x;\mu_j,\Sigma_j)$ is the density function of a $p$-variate normal distribution with mean vector $\mu_j$ and covariance matrix $\Sigma_j$; furthermore let us set $\psi=(\pi_1,\ldots,\pi_k,\mu_1,\ldots,\mu_k,\Sigma_1,\ldots,\Sigma_k)\in\Psi$, where $\Psi$ is the parameter space. Let $L(\psi)$ be the likelihood function of $\psi$ given a sample $x_1,\ldots,x_n$ of independent and identically distributed (i.i.d.) observations with density (1), and let $\hat{\psi}$ be the maximum likelihood estimate of $\psi$. It is well known that $L(\psi)$ is unbounded from above and may present many spurious local maxima, see e.g. McLachlan and Peel (2000). In the univariate case the likelihood grows indefinitely when the mean of one component coincides with a sample observation and the corresponding variance tends to zero; analogously, in the multivariate case this occurs when some covariance matrix tends to be singular. Problems relating to the unboundedness of the likelihood function have been studied by several authors for both univariate and multivariate normal mixtures. In the univariate case, Hathaway (1985) imposed relative constraints between pairs of variances, while Ciuperca et al. (2003) addressed penalized maximum likelihood estimators, providing statistical asymptotic properties of the penalized MLE; the problem of degeneracy has also been considered in a sequence of papers by Biernacki and Chrétien (2003) and Biernacki (2004a, 2004b). In the multivariate case, Snoussi and Mohammad-Djafari (2001) approached the problem of degeneracy by penalizing the likelihood by means of an inverse Wishart prior on the covariance matrices.
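The degeneracy described above is easy to reproduce numerically. The sketch below (plain NumPy; the helper names and the toy data are ours) pins the mean of one component at a sample observation and shrinks its variance: the mixture log-likelihood of Eq. (1), here with $p=1$ and $k=2$, grows without bound.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Univariate normal density with variance sigma2."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def mixture_loglik(x, weights, means, variances):
    """Log-likelihood of a univariate Gaussian mixture, i.e. Eq. (1) with p = 1."""
    density = sum(w * normal_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
    return float(np.sum(np.log(density)))

x = np.linspace(-2.0, 2.0, 9)  # toy sample

# Mean of component 1 pinned at the observation x[0]; as its variance
# shrinks toward zero, the log-likelihood diverges (the degeneracy
# discussed in the text).
logliks = [mixture_loglik(x, [0.5, 0.5], [x[0], 0.0], [s2, 1.0])
           for s2 in (1e-2, 1e-4, 1e-6, 1e-8)]
```

Each hundredfold reduction of the degenerate variance adds roughly $\tfrac{1}{2}\log 100 \approx 2.3$ to the log-likelihood through the spiked component, while the contribution of the remaining observations is essentially unchanged.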
Hathaway (1985) proposed a constrained (global) maximum-likelihood formulation which presents a strongly consistent global solution, no singularities and at least a smaller number of spurious local maxima, by imposing the following constraints, satisfied by the true set of parameters:
$$\min_{1\le i\ne j\le k}\lambda_{\min}\!\left(\Sigma_i\Sigma_j^{-1}\right)\ge c>0, \qquad (2)$$
where $\lambda_{\min}(\Sigma_i\Sigma_j^{-1})$ is the smallest eigenvalue of the matrix $\Sigma_i\Sigma_j^{-1}$; furthermore, Biernacki (2004b) suggested constraining the determinant of the covariance matrices to be greater than a given value.
Here we move in the spirit of Hathaway (1985), extending the work of Ingrassia (2004), who formulated a sufficient condition such that constraint (2) holds. The advantage of this proposal is that the new set of constraints can be applied directly in the EM algorithm, where each covariance matrix $\Sigma_j$ ($j=1,\ldots,k$) is iteratively updated. As a matter of fact, condition (2) is satisfied when
$$a\le\lambda_l(\Sigma_j)\le b, \qquad l=1,\ldots,p,\; j=1,\ldots,k, \qquad (3)$$
where $\lambda_l(\Sigma_j)$ is the $l$th eigenvalue of $\Sigma_j$, in non-decreasing order, and $a$ and $b$ are positive numbers such that $a/b\ge c$; indeed, for any symmetric and positive definite matrices $\Sigma_i$ and $\Sigma_j$ it results
$$\lambda_{\min}(\Sigma_i\Sigma_j^{-1})\ge\frac{\lambda_{\min}(\Sigma_i)}{\lambda_{\max}(\Sigma_j)},$$
which implies
$$\lambda_{\min}(\Sigma_i\Sigma_j^{-1})\ge\frac{a}{b}\ge c, \qquad (4)$$
and thus (3) leads to (2).
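This chain of inequalities can be checked numerically. The hedged sketch below (NumPy; the bounds $a$, $b$, the dimension and the function name are illustrative choices of ours) truncates the eigenvalues of arbitrary positive definite matrices to $[a,b]$ and verifies that $\lambda_{\min}(\Sigma_i\Sigma_j^{-1})\ge a/b$ then holds for every pair:

```python
import numpy as np

def truncate_eigenvalues(cov, a, b):
    """Enforce condition (3): spectral decomposition of cov, then
    truncation of each eigenvalue to the interval [a, b]."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(np.clip(vals, a, b)) @ vecs.T

a, b = 0.5, 2.0          # illustrative bounds; c = a / b = 0.25
rng = np.random.default_rng(1)

# Three arbitrary 4x4 SPD matrices, constrained so that (3) holds.
covs = []
for _ in range(3):
    m = rng.normal(size=(4, 4))
    covs.append(truncate_eigenvalues(m @ m.T, a, b))

# Smallest eigenvalue of S_i S_j^{-1} over all pairs, cf. constraint (2).
pair_min = min(np.linalg.eigvals(si @ np.linalg.inv(sj)).real.min()
               for si in covs for sj in covs)
```

Since every eigenvalue of every matrix lies in $[a,b]$ after truncation, `pair_min` cannot fall below $a/b$, which is exactly the bound (4).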
In this paper we study constraints based on relation (4) that can be implemented directly at each iteration of the EM algorithm for the updating of the covariance matrices; furthermore, we investigate the conditions under which this constrained EM algorithm leads to a non-decreasing sequence of likelihood values, as in the usual unconstrained version. Our results follow from an analysis of the role of the eigenvalues and eigenvectors of the covariance matrices $\Sigma_j$ ($j=1,\ldots,k$) in the sequence of estimates produced by the EM algorithm. Basic preliminary ideas have been summarized in Ingrassia and Rocci (2006).
The spectral decomposition of covariance matrices has been considered with different objectives by many authors in multivariate normal mixture approaches to clustering, see e.g. Fraley and Raftery (2002); relationships with other approaches will also be discussed throughout this paper.
The rest of the paper is organized as follows. In Section 2 we investigate the role of the eigenvalues and eigenvectors of the covariance matrices in the EM algorithm; in Section 3 we present some constrained monotone versions of the EM algorithm for multivariate normal mixtures, and in Section 4 they are evaluated and compared using numerical studies; in Section 5 we present a geometrical interpretation of the proposed constraints, and in Section 6 we discuss some relationships with other approaches; finally, in Section 7 we give some concluding remarks.
Section snippets
On the update of the covariance matrices in the EM algorithm
The EM algorithm generates a sequence of estimates $\{\psi^{(m)}\}$, where $\psi^{(0)}$ denotes the initial guess and $\psi^{(m+1)}$ is obtained from $\psi^{(m)}$ for $m=0,1,2,\ldots$, so that the corresponding sequence $\{L(\psi^{(m)})\}$ is non-decreasing. In what follows, for sake of space, we will indicate with the superscript $+$ the $(m+1)$th iteration and with the superscript $-$ the previous $m$th iteration. The E-step, on the $(m+1)$th iteration, computes the quantities
$$u_{ij}=\frac{\pi_j^{-}\,\phi(x_i;\mu_j^{-},\Sigma_j^{-})}{\sum_{l=1}^{k}\pi_l^{-}\,\phi(x_i;\mu_l^{-},\Sigma_l^{-})}, \qquad i=1,\ldots,n,\; j=1,\ldots,k,$$
while the M-step requires the global
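In this notation, one iteration of the standard unconstrained EM for model (1) can be sketched as follows (a textbook NumPy implementation with function names of our own, not the paper's constrained variant; the constrained algorithms of the next section would intervene only in the covariance update):

```python
import numpy as np

def mvn_pdf(X, mu, Sigma):
    """Density of a p-variate normal evaluated at the rows of X."""
    p = len(mu)
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

def em_step(X, pi, mus, Sigmas):
    """One unconstrained EM iteration for a k-component normal mixture:
    the E-step computes the posterior probabilities u[i, j], the M-step
    updates weights, means and covariances in closed form."""
    k = len(pi)
    dens = np.column_stack([pi[j] * mvn_pdf(X, mus[j], Sigmas[j])
                            for j in range(k)])
    u = dens / dens.sum(axis=1, keepdims=True)           # E-step
    n_j = u.sum(axis=0)
    pi_new = n_j / len(X)                                # M-step: weights
    mus_new = [u[:, j] @ X / n_j[j] for j in range(k)]   # M-step: means
    Sigmas_new = []
    for j in range(k):                                   # M-step: covariances
        d = X - mus_new[j]
        Sigmas_new.append((u[:, j, None] * d).T @ d / n_j[j])
    return pi_new, mus_new, Sigmas_new

def loglik(X, pi, mus, Sigmas):
    dens = sum(pi[j] * mvn_pdf(X, mus[j], Sigmas[j]) for j in range(len(pi)))
    return float(np.log(dens).sum())

# Two well-separated toy clusters; the likelihood sequence is non-decreasing.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(3.0, 1.0, size=(30, 2))])
pi, mus, Sigmas = np.array([0.5, 0.5]), [X[0], X[-1]], [np.eye(2), np.eye(2)]
values = [loglik(X, pi, mus, Sigmas)]
for _ in range(5):
    pi, mus, Sigmas = em_step(X, pi, mus, Sigmas)
    values.append(loglik(X, pi, mus, Sigmas))
```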
Constrained monotone EM algorithms
The reformulation of the update of the covariance matrices $\Sigma_j$ ($j=1,\ldots,k$) presented above suggests some ideas for the construction of EM algorithms such that the constraints (3), or some other sufficient conditions for (2), are satisfied while the monotonicity is preserved. Firstly, at each iteration let us compute the spectral decomposition of the estimate of the covariance matrix $\Sigma_j$, i.e. $\Sigma_j=\Gamma_j\Lambda_j\Gamma_j^{\top}$, where $\Gamma_j$ is orthogonal and $\Lambda_j$ is the diagonal matrix of the eigenvalues, and afterwards consider one of the following strategies. Approach A: The simplest approach concerns the update
Numerical results
In this section we present numerical results in order to evaluate and compare the performances of the different proposed constraints and algorithms. We applied five different algorithms based on the strategies presented in the previous section and compared them with the basic unconstrained algorithm:
0. Unconstrained algorithm (U): ordinary EM.
1. Constrained algorithm C naive (CCN): constrained EM algorithm C with the eigenvalues constrained in the interval $[a,b]$.
2. Constrained
Geometrical interpretation of the constraints
Constraint (2) on the smallest eigenvalue of $\Sigma_i\Sigma_j^{-1}$ ($1\le i\ne j\le k$) leads to a likelihood function with no singularities and having a smaller number of local maxima than the unconstrained likelihood function, see Hathaway (1985); however, the reformulation (3) provides some relationships with other aspects to be considered in data modelling by multivariate normal mixtures. Banfield and Raftery (1993) proposed a general approach for geometric cross-cluster constraints in multivariate normal mixtures by
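The geometric reading of covariance constraints rests on the Banfield and Raftery (1993) parameterisation $\Sigma = \lambda\, D A D^{\top}$, separating volume, shape and orientation. A hedged numerical sketch (the function name and the test matrix are ours; $\lambda=|\Sigma|^{1/p}$, $A$ diagonal with $|A|=1$, $D$ orthogonal):

```python
import numpy as np

def volume_shape_orientation(S):
    """Decompose an SPD matrix as S = lam * D @ A @ D.T, where
    lam = |S|^(1/p) measures volume, A is diagonal with det(A) = 1
    (shape) and D is orthogonal (orientation), in the spirit of
    Banfield and Raftery (1993)."""
    p = S.shape[0]
    vals, D = np.linalg.eigh(S)           # eigenvalues in ascending order
    lam = float(np.prod(vals)) ** (1.0 / p)
    A = np.diag(vals / lam)               # normalized so that det(A) = 1
    return lam, A, D

S = np.array([[4.0, 1.0],
              [1.0, 2.0]])
lam, A, D = volume_shape_orientation(S)
```

In this parameterisation, the eigenvalue bounds (3) simultaneously control the volume $\lambda$ and the eccentricity of the shape matrix $A$.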
Relationships with other approaches
We have already mentioned that several other methods have been proposed to avoid the singularities in the likelihood function, and thus the numerical failure of the optimization procedures. In this context, we can distinguish three main approaches:
1. to control the condition number;
2. to control the determinant;
3. to add a penalty to the likelihood.
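These three devices correspond to simple numerical quantities. The sketch below (NumPy; the matrices $\Psi$, $\nu$ and the test covariances are illustrative assumptions of ours) computes, for a covariance matrix $S$, the condition number, the log-determinant, and an inverse-Wishart log-penalty of the kind used by Snoussi and Mohammad-Djafari (2001):

```python
import numpy as np

def condition_number(S):
    """Ratio lambda_max / lambda_min; the bounds (3) keep it below b / a."""
    vals = np.linalg.eigvalsh(S)
    return float(vals.max() / vals.min())

def log_det(S):
    """log |S|; Biernacki (2004b) suggests bounding |S| from below."""
    _, ld = np.linalg.slogdet(S)
    return float(ld)

def inv_wishart_log_penalty(S, Psi, nu):
    """Log inverse-Wishart kernel, up to an additive constant:
    -((nu + p + 1) / 2) log|S| - tr(Psi S^{-1}) / 2.
    It tends to -infinity as S approaches singularity, so adding it to
    the log-likelihood discourages degenerate solutions."""
    p = S.shape[0]
    return (-0.5 * (nu + p + 1) * log_det(S)
            - 0.5 * float(np.trace(Psi @ np.linalg.inv(S))))

healthy = np.diag([1.0, 1.0])
nearly_singular = np.diag([1.0, 1e-8])
```

A near-singular covariance has a huge condition number, a very negative log-determinant, and a strongly negative penalty, so all three devices flag the same degenerate configurations.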
Concluding remarks
In this paper we have given theoretical results about EM algorithms for mixtures of multivariate normal distributions which preserve the usual monotonicity property while implementing suitable constraints on the eigenvalues of the covariance matrices. These constraints lead to a likelihood function with no singularities and with at least a smaller number of local maxima than the unconstrained version. We have also shown that the proposed constraints protect the covariance matrices from ill
Acknowledgements
The authors would like to thank the associate editor and the referee for their interesting comments and suggestions which considerably improved an earlier version of the present paper.
References (18)
- Biernacki, C., Chrétien, S., 2003. Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with EM. Statist. Probab. Lett.
- Celeux, G., Govaert, G., 1995. Gaussian parsimonious clustering models. Pattern Recognition.
- Axelsson, O., 1996. Iterative Solution Methods.
- Banfield, J.D., Raftery, A.E., 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics.
- Biernacki, C., 2004a. Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures for grouped data...
- Biernacki, C., 2004b. An asymptotic upper bound of the likelihood to prevent Gaussian mixtures from degenerating...
- Burden, R.L., Faires, J.D., 1985. Numerical Analysis.
- Ciuperca, G., Ridolfi, A., Idier, J., 2003. Penalized maximum likelihood estimator for normal mixtures. Scand. J. Statist.
- Fraley, C., Raftery, A.E., 2002. Model-based clustering, discriminant analysis and density estimation. J. Amer. Statist. Assoc.