Clustering with mixtures of log-concave distributions☆
Introduction
Clustering concerns the assignment of each of n observations to one of k groups. One popular way to approach this task is via a finite mixture model, see e.g. McLachlan and Peel (2000): the data are assumed i.i.d. with a density that admits a representation

f(x) = \sum_{m=1}^{k} \pi_m f_m(x), \qquad (1)

where the mixture proportions \pi_m are nonnegative and sum to unity, and the component density f_m models the conditional density of the data in the mth group. Typically one assumes a parametric form for the component distributions, such as a normal model, see e.g. Fraley and Raftery (2002). Then fitting the mixture model (1), as well as assigning the data to the k groups, has an elegant solution in terms of the EM algorithm, see e.g. McLachlan and Krishnan (1997). The EM algorithm iteratively assigns the data based on the current maximum likelihood estimates of the component distributions, and then updates those estimates based on these assignments.
One key advantage of using a mixture model for clustering is that it not only provides an assignment of the data to the groups, but also a measure of uncertainty for the assignment of each observation via the posterior probabilities of component membership (see (2) in Section 3 below).
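In the notation of (1), these posterior probabilities take the usual Bayes form (a standard reconstruction; the paper's display (2) may differ in notation):

\hat\tau_{im} = \frac{\hat\pi_m \hat f_m(x_i)}{\sum_{j=1}^{k} \hat\pi_j \hat f_j(x_i)}, \qquad m = 1, \dots, k,

so each observation x_i receives a full vector of membership probabilities rather than only a hard label.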
Problems arise when the parametric model is misspecified. Then the accuracy of the clustering may deteriorate, and the measure of uncertainty may be considerably off. In addition, each parametric model requires a different implementation of the EM algorithm based on attendant theoretical derivations. For these reasons, it would be helpful to have an EM-type clustering algorithm with nonparametric component distributions. Such a methodology would provide a universal software implementation with flexible component distributions. Indeed, nonparametric extensions of parametric models have proved quite successful in discriminant analysis, the supervised counterpart to the problem under consideration here, see e.g. Hastie and Tibshirani (1996) and Lin and Jeon (2003). In contrast, there seems to be little existing work on mixture models with nonparametric components for clustering, presumably because it is not obvious how to develop such methodology in the unsupervised case. Hunter et al. (2006) give methodology to estimate a location mixture of symmetric univariate components.
In this paper we will model each component as a log-concave density, i.e. as a density whose logarithm is a concave function. This model has the advantage that it includes most common parametric distributions (the prime example being the normal density, whose logarithm is quadratic), and it is flexible enough to allow e.g. skewness. See Walther (2001, 2002) for a further discussion of this model. Moreover, it turns out that the MLE of a log-concave density exists uniquely, so there is hope that one can mimic the EM-type clustering algorithm that works so successfully in the parametric context. During the preparation of this paper we became aware of the work of Eilers and Borgdorff (2006), who use penalized smoothing to move a nonparametric estimate of the component distribution ‘towards’ a log-concave form. This approach requires the choice of a tuning parameter. In contrast, we use the fact that the log-concave MLE exists uniquely together with an algorithm for its computation; thus, our approach is free of tuning parameters. In addition, we show how to generalize this approach to the multivariate situation where dependence is present in each component.
The univariate model and the MLE
Our model for the univariate case posits that each component f_m in (1) is a log-concave density, i.e. \log f_m is a concave function. An EM-type algorithm requires the computation of the MLE of each f_m. Theory and algorithms for this task have been developed in Walther (2002) and in Rufibach (2006). We briefly summarize the relevant results.
Given data X_1, \dots, X_n i.i.d. from f, the MLE \hat f_n of f under the restriction that f be log-concave exists uniquely and has support [X_{(1)}, X_{(n)}]; \log \hat f_n is a piecewise linear function with knots only at observation points.
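As a rough illustration (not the active-set or iterative convex minorant algorithms cited above), the log-concave MLE can be posed as a finite-dimensional convex program: maximize the mean of \phi = \log f at the data minus \int e^{\phi}, with \phi piecewise linear and concave between the order statistics; the maximizer then integrates to one automatically, by stationarity in the direction of constant shifts. The grid, starting value, and generic solver below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=40))       # sorted data doubles as the knot grid
n = len(x)

# Trapezoid weights for approximating the integral of exp(phi) over [X_(1), X_(n)].
w = np.zeros(n)
w[1:] += 0.5 * np.diff(x)
w[:-1] += 0.5 * np.diff(x)

def neg_loglik(phi):
    # Unnormalized criterion: mean of phi at the data minus integral of exp(phi).
    return -phi.mean() + np.sum(w * np.exp(phi))

def grad(phi):
    return -np.ones(n) / n + w * np.exp(phi)

def concavity(phi):
    # Successive slopes of the piecewise-linear phi must be nonincreasing (>= 0 here).
    s = np.diff(phi) / np.diff(x)
    return s[:-1] - s[1:]

phi0 = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)   # strictly feasible start: N(0,1) log-density
res = minimize(neg_loglik, phi0, jac=grad, method="SLSQP",
               constraints={"type": "ineq", "fun": concavity},
               options={"maxiter": 1000})
phi_hat = res.x
mass = np.sum(w * np.exp(phi_hat))     # approx. 1 at the optimum
```

The dedicated algorithms of Walther (2002) and Rufibach (2006) exploit the structure of the problem and scale far better than this generic solver.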
Clustering with an EM-type algorithm
Our methodology is as follows: first we run the usual EM algorithm for a Gaussian mixture to convergence. We then use the outcome of this clustering as the starting value for five more iterations of the EM algorithm, where in the M-step we now compute for each component the log-concave MLE instead of the Gaussian MLE.
The motivation for this approach is that the Gaussian mixture model should provide a clustering that is roughly correct, and that the subsequent log-concave MLE provides a correction to this initial fit.
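The first stage of this recipe can be sketched in a few lines of numpy; the comment marks the M-step that the proposed method replaces with weighted log-concave MLEs. Function names, initialization, and iteration counts here are illustrative assumptions, not the paper's code:

```python
import numpy as np
from scipy.stats import norm

def em_gaussian(x, k, n_iter=200):
    """Univariate Gaussian-mixture EM. In the log-concave variant, only the
    marked M-step changes: each component is re-estimated by a weighted
    log-concave MLE instead of a weighted Gaussian MLE."""
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread-out starting means
    sd = np.full(k, x.std())
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership.
        dens = pi * norm.pdf(x[:, None], mu, sd)
        tau = dens / dens.sum(axis=1, keepdims=True)
        # M-step (the step the log-concave EM replaces):
        pi = tau.mean(axis=0)
        mu = (tau * x[:, None]).sum(axis=0) / tau.sum(axis=0)
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / tau.sum(axis=0)
        sd = np.sqrt(var)
    return pi, mu, sd, tau

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 300)])
pi, mu, sd, tau = em_gaussian(x, k=2)
```

The returned matrix `tau` carries the membership probabilities used both for the final assignment and as weights in the subsequent log-concave M-steps.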
Comparison with parametric EM
We compared the result of our methodology with that obtained with the EM algorithm for the Gaussian mixture model.
In our first example we drew 500 observations from a gamma(2,1) distribution, then with probability 0.6 each observation was shifted to the right by 5. This mixture density is plotted in the left panel of Fig. 1 (dotted line). The dashed line gives the fitted model obtained by the Gaussian EM algorithm, and the solid line gives the fitted model obtained by the log-concave EM algorithm.
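The simulation design of this example can be reproduced as follows (the seed and generator are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.gamma(shape=2.0, scale=1.0, size=n)  # gamma(2,1) draws
shift = rng.random(n) < 0.6                  # shift each point with probability 0.6
x[shift] += 5.0                              # second component: gamma shifted right by 5
# Resulting mixture density: 0.4 * g(t) + 0.6 * g(t - 5), with g the gamma(2,1) density.
```

Both components are skewed, so the Gaussian mixture model is misspecified here while each component remains log-concave.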
A multivariate extension
In the multivariate set-up, we observe n i.i.d. observations in \mathbb{R}^d. Log-concave distributions are defined in a multivariate situation just as in the univariate case, but the computation of the MLE appears to be much more complicated. For this reason, we will work with the following simpler, yet flexible model: we only require that the univariate marginal distributions be log-concave. Then we model the dependence structure with a normal copula. That is, let be multivariate
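The copula step of such a model can be sketched with normal scores: map each coordinate through its (empirical) distribution function and then through the standard normal quantile function, and estimate the correlation matrix of the resulting scores. The rank-based estimator and helper name below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from scipy.stats import norm

def fit_normal_copula(X):
    """Estimate the normal-copula correlation matrix from the n x d data X."""
    n = X.shape[0]
    # Rescaled ranks in (0, 1); dividing by n + 1 avoids infinite quantiles.
    U = (np.argsort(np.argsort(X, axis=0), axis=0) + 1) / (n + 1)
    Z = norm.ppf(U)                       # normal scores, coordinate by coordinate
    return np.corrcoef(Z, rowvar=False)   # copula correlation matrix

# Monotone marginal transformations leave the copula, and hence the fit, unchanged.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))
X = np.column_stack([z[:, 0], 0.8 * z[:, 0] + 0.6 * z[:, 1]])  # correlation 0.8
X[:, 1] = X[:, 1] ** 3                    # distort the second marginal monotonically
R = fit_normal_copula(X)
```

Because only the ranks enter, this dependence estimate can be combined with any flexible model for the marginals, such as the univariate log-concave MLEs.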
Conclusion
We have shown how the parametric EM algorithm for clustering can be extended to allow for a flexible, nonparametric class of component distributions. The advantages of this algorithm are that it is not restricted to parametric models, that one no longer needs to specify such a model for the component distributions and hence the method is not sensitive to a misspecification thereof, and that only one implementation of the algorithm is necessary. At the same time, there seems to be no noticeable
Problems for further research
We left open the question of identifiability of log-concave mixtures and the problem of selecting the number of components (clusters). Following our motivation that the log-concave MLE provides a correction to the fitted Gaussian mixture model, a reasonable suggestion would be to employ one of the criteria for selecting the number of components in the Gaussian mixture model, see e.g. Chapter 6 in McLachlan and Peel (2000). A direct way to select the number of components in a nonparametric
References
- Eilers, P.H.C., Borgdorff, M.W., 2006. Non-parametric log-concave mixtures....
- Fraley, C., Raftery, A.E., 2002. Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc.
- Hastie, T., Tibshirani, R., 1996. Discriminant analysis by Gaussian mixtures. J. Roy. Statist. Soc. Ser. B.
- Hunter, D.R., Wang, S., Hettmansperger, T.P., 2006. Inference for mixtures of symmetric distributions. Ann. Statist.,...
- Jongbloed, G., 1998. The iterative convex minorant algorithm for nonparametric estimation. J. Comput. Graph. Statist.
- Lin, Y., Jeon, Y., 2003. Discriminant analysis through a semi-parametric model. Biometrika.
☆ Work supported by NSF Grant DMS-0505682 and NIH Grant 5R33HL068522.