Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints
The problem
The EM algorithm is a well-known and widely studied general-purpose method for maximum likelihood estimation in incomplete-data problems; see e.g. Dempster et al. (1977) and McLachlan and Krishnan (2008). For a given i.i.d. random sample of size n drawn from a density p(·; θ), where the parameter θ assumes values in a subset Θ of a suitable Euclidean space, the EM algorithm generates a sequence of estimates {θ(k)}, where θ(0) denotes the initial guess and, for k ≥ 1, θ(k) maximizes over Θ the conditional expectation of the complete-data log-likelihood given the observed data and θ(k−1).
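As a concrete illustration, the EM iteration for a Gaussian mixture alternates an E-step (posterior responsibilities) and an M-step (weighted updates of the mixing weights, means, and covariances). The following minimal numpy sketch, with our own helper names rather than code from the paper, implements the plain unconstrained iteration:

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    quad = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def em_gmm(X, k, n_iter=50, seed=0):
    """Plain (unconstrained) EM for a k-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(k, 1.0 / k)                      # mixing weights
    mu = X[rng.choice(n, k, replace=False)]      # random initial guess
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each point
        dens = np.stack([w[j] * gaussian_pdf(X, mu[j], cov[j])
                         for j in range(k)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means, covariances
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j]
    return w, mu, cov
```

Nothing in this sketch prevents a covariance matrix from approaching singularity, which is exactly the degeneracy studied below.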
Theoretical results
In this section we extend previous results due to Biernacki and Chretien (2003) to the multivariate case.
Let C be a subset of the component indices, and denote by j ∈ C a degenerate component of the mixture (1); let a_j be the vector defined in (2). For a degenerate component the Euclidean norm ‖a_j‖ is small.
Furthermore, we shall consider the following assumption about the eigenvalues and the eigenvectors of the covariance matrices.
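The degeneracy phenomenon behind these results is easy to reproduce numerically. In the toy illustration below (our own construction, not taken from the paper), one mixture component is centered on a single observation and its covariance is shrunk toward singularity; the sample log-likelihood then grows without bound, which is why near-singular components attract the unconstrained MLE:

```python
import numpy as np

def mixture_loglik(X, w, means, covs):
    """Naive log-likelihood of a Gaussian mixture (illustration only)."""
    n, d = X.shape
    total = np.zeros(n)
    for wj, mu, cov in zip(w, means, covs):
        diff = X - mu
        quad = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
        total += wj * np.exp(-0.5 * quad) / norm
    return np.log(total).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
means = [X.mean(axis=0), X[0]]               # second component sits on one data point
for eps in (1e-1, 1e-3, 1e-5):
    covs = [np.eye(2), eps * np.eye(2)]      # its smallest eigenvalue -> 0
    print(f"eps={eps:g}  loglik={mixture_loglik(X, [0.5, 0.5], means, covs):.2f}")
```

As eps shrinks, the density spike at the chosen observation dominates and the log-likelihood keeps increasing, so the unconstrained likelihood has no finite maximizer.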
A dynamic constraint on the eigenvalues
The last result presented in the previous section states that if the EM algorithm fits a degenerate component, then the smallest eigenvalue of its covariance matrix tends to zero at an exponential rate. This suggests that during the EM iterations the eigenvalues may vary very rapidly. Such behavior is particularly dangerous when the current parameter estimates are far from the optimal solution, as in the first iterations of the algorithm. Thus, we conjecture that such bad behavior should be prevented by dynamically constraining how fast the eigenvalues can change across iterations.
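One way such a safeguard can be implemented (a hypothetical sketch under our own assumptions, not the exact rule proposed in the paper) is to clip the eigenvalues of each updated covariance matrix so that they cannot move too far from those of the previous iteration:

```python
import numpy as np

def constrain_eigenvalues(cov_new, cov_old, c=1.5):
    """Clip the eigenvalues of the updated covariance so that they stay
    within a factor c of the previous iteration's eigenvalue range.
    Hypothetical illustration of a dynamic constraint; the paper's
    exact rule may differ."""
    vals_old = np.linalg.eigvalsh(cov_old)   # ascending order
    vals, vecs = np.linalg.eigh(cov_new)
    clipped = np.clip(vals, vals_old[0] / c, vals_old[-1] * c)
    return (vecs * clipped) @ vecs.T         # rebuild V diag(lambda) V^T
```

Applied after every M-step, a rule of this kind stops the smallest eigenvalue from collapsing at the exponential rate described above, while the admissible range still widens over the iterations, so the estimates can drift toward the optimum.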
Numerical experiments
In this section we present numerical experiments on both simulated and real data in order to evaluate and compare the performance of the proposed dynamic constraint under different settings. In particular, we consider the following six algorithms:
- U (Unconstrained): ordinary EM with one random starting point.
- U2 (Unconstrained): ordinary EM with two random starting points; the solution giving the highest likelihood value is chosen.
- LCS (Lower dynamically constrained, strong): constrained EM algorithm with the strong lower dynamic constraint.
Concluding remarks
In this paper we have addressed two main issues. The first part was devoted to extending to the multivariate case some results on the convergence of the EM algorithm proposed in Biernacki and Chretien (2003). In particular, our main result generalizes Theorem 2 of that paper: we showed that near degeneracy the smallest eigenvalue of the degenerate component's covariance matrix tends to zero at an exponential rate. Based on this result, in the second part of the paper we have proposed a dynamic constraint on the eigenvalues to prevent such degeneracy.
Acknowledgements
The authors sincerely thank the Associate editor and the anonymous referees for their very helpful comments and suggestions.
References (13)
- Biernacki, C., Chrétien, S. Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with the EM. Statistics & Probability Letters (2003)
- Chen, J., Tan, X. Inference for multivariate normal mixtures. Journal of Multivariate Analysis (2009)
- Ingrassia, S., Rocci, R. A constrained monotone EM algorithm for finite mixture of multivariate Gaussians. Computational Statistics & Data Analysis (2007)
- Seo, B., Lindsay, B.G. A computational strategy for doubly smoothed MLE exemplified in the normal mixture model. Computational Statistics & Data Analysis (2010)
- Biernacki, C., 2004. An asymptotic upper bound of the likelihood to prevent Gaussian mixtures from degenerating, ...
- Day, N.E. Estimating the components of a mixture of two normal distributions. Biometrika (1969)