Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints
The problem
The EM algorithm is a well-known and widely studied general-purpose method for maximum likelihood estimation in incomplete-data problems; see e.g. Dempster et al. (1977) and McLachlan and Krishnan (2008). For a given i.i.d. random sample of size n drawn from a density p(·; θ), where the parameter θ assumes values in a subset Θ of a suitable Euclidean space, the EM algorithm generates a sequence of estimates {θ(k)}, where θ(0) denotes the initial guess and, for k ≥ 1, θ(k) maximizes over Θ the conditional expectation of the complete-data log-likelihood given the observed data and θ(k−1).
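As a concrete illustration, the EM iteration for a Gaussian mixture alternates an E-step (posterior responsibilities) and an M-step (weighted updates of the mixing weights, means, and covariances). The following minimal numpy sketch, with our own helper names rather than code from the paper, implements the plain unconstrained iteration:

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    quad = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def em_gmm(X, k, n_iter=50, seed=0):
    """Plain (unconstrained) EM for a k-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(k, 1.0 / k)                      # mixing weights
    mu = X[rng.choice(n, k, replace=False)]      # random initial guess
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each point
        dens = np.stack([w[j] * gaussian_pdf(X, mu[j], cov[j])
                         for j in range(k)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means, covariances
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j]
    return w, mu, cov
```

Nothing in this sketch prevents a covariance matrix from approaching singularity, which is exactly the degeneracy studied below.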
Theoretical results
In this section we extend previous results due to Biernacki and Chretien (2003) to the multivariate case.
Let C be a subset of the component indices, and denote by j ∈ C a degenerate component of the mixture (1); let a_j be the vector defined in (2). For a degenerate component the Euclidean norm ‖a_j‖ is small.
Furthermore, we shall consider the following assumption about the eigenvalues and the eigenvectors of the covariance matrices.
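The degeneracy phenomenon behind these results is easy to reproduce numerically. In the toy illustration below (our own construction, not taken from the paper), one mixture component is centered on a single observation and its covariance is shrunk toward singularity; the sample log-likelihood then grows without bound, which is why near-singular components attract the unconstrained MLE:

```python
import numpy as np

def mixture_loglik(X, w, means, covs):
    """Naive log-likelihood of a Gaussian mixture (illustration only)."""
    n, d = X.shape
    total = np.zeros(n)
    for wj, mu, cov in zip(w, means, covs):
        diff = X - mu
        quad = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
        total += wj * np.exp(-0.5 * quad) / norm
    return np.log(total).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
means = [X.mean(axis=0), X[0]]               # second component sits on one data point
for eps in (1e-1, 1e-3, 1e-5):
    covs = [np.eye(2), eps * np.eye(2)]      # its smallest eigenvalue -> 0
    print(f"eps={eps:g}  loglik={mixture_loglik(X, [0.5, 0.5], means, covs):.2f}")
```

As eps shrinks, the density spike at the chosen observation dominates and the log-likelihood keeps increasing, so the unconstrained likelihood has no finite maximizer.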
A dynamic constraint on the eigenvalues
The last result presented in the previous section states that if the EM algorithm fits a degenerate component, then the smallest eigenvalue of its covariance matrix tends to zero at an exponential rate. This suggests that during the EM iterations the eigenvalues may vary very rapidly. Such behavior is particularly dangerous when the current parameter estimates are far from the optimal solution, as in the first iterations of the algorithm. Thus, we conjecture that such bad behavior should be prevented by dynamically constraining how fast the eigenvalues can change across iterations.
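One way such a safeguard can be implemented (a hypothetical sketch under our own assumptions, not the exact rule proposed in the paper) is to clip the eigenvalues of each updated covariance matrix so that they cannot move too far from those of the previous iteration:

```python
import numpy as np

def constrain_eigenvalues(cov_new, cov_old, c=1.5):
    """Clip the eigenvalues of the updated covariance so that they stay
    within a factor c of the previous iteration's eigenvalue range.
    Hypothetical illustration of a dynamic constraint; the paper's
    exact rule may differ."""
    vals_old = np.linalg.eigvalsh(cov_old)   # ascending order
    vals, vecs = np.linalg.eigh(cov_new)
    clipped = np.clip(vals, vals_old[0] / c, vals_old[-1] * c)
    return (vecs * clipped) @ vecs.T         # rebuild V diag(lambda) V^T
```

Applied after every M-step, a rule of this kind stops the smallest eigenvalue from collapsing at the exponential rate described above, while the admissible range still widens over the iterations, so the estimates can drift toward the optimum.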
Numerical experiments
In this section we present numerical experiments on both simulated and real data in order to evaluate and compare the performance of the proposed dynamic constraint under different settings. In particular, we consider the following six algorithms:
- U (Unconstrained): ordinary EM with one random starting point.
- U2 (Unconstrained): ordinary EM with two random starting points; the solution giving the highest likelihood value is chosen.
- LCS (Lower dynamically constrained, strong): constrained EM algorithm with the strong lower dynamic constraint.
Concluding remarks
In this paper we have addressed two main issues. The first part was devoted to extending to the multivariate case some results on the convergence of the EM algorithm proposed in Biernacki and Chretien (2003). In particular, our main result generalizes Theorem 2 of that paper: we showed that near degeneracy the smallest eigenvalue of the degenerate component's covariance matrix tends to zero at an exponential rate. Based on this result, in the second part of the paper we have proposed a dynamic constraint on the eigenvalues to prevent such degeneracy.
Acknowledgements
The authors sincerely thank the Associate editor and the anonymous referees for their very helpful comments and suggestions.
References (13)
- Biernacki, C., Chrétien, S. Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with the EM. Statistics & Probability Letters (2003)
- Chen, J., Tan, X. Inference for multivariate normal mixtures. Journal of Multivariate Analysis (2009)
- Ingrassia, S., Rocci, R. A constrained monotone EM algorithm for finite mixture of multivariate Gaussians. Computational Statistics & Data Analysis (2007)
- Seo, B., Lindsay, B.G. A computational strategy for doubly smoothed MLE exemplified in the normal mixture model. Computational Statistics & Data Analysis (2010)
- Biernacki, C., 2004. An asymptotic upper bound of the likelihood to prevent Gaussian mixtures from degenerating, ...
- Day, N.E. Estimating the components of a mixture of two normal distributions. Biometrika (1969)