Clustering with mixtures of log-concave distributions

https://doi.org/10.1016/j.csda.2007.01.008

Abstract

The EM algorithm is a popular tool for clustering observations via a parametric mixture model. Two disadvantages of this approach are that its success depends on the appropriateness of the assumed parametric model, and that each model requires a different implementation of the EM algorithm based on model-specific theoretical derivations. We show how this algorithm can be extended to work with the flexible, nonparametric class of log-concave component distributions. The advantages of the resulting algorithm are twofold: first, it is not restricted to parametric models, so it no longer requires the specification of such a model and its results are no longer sensitive to a misspecification thereof. Second, only one implementation of the algorithm is necessary. Furthermore, simulation studies based on the normal mixture model suggest that there is no noticeable performance penalty for this more general nonparametric algorithm vis-à-vis the parametric EM algorithm in the special case where the assumed parametric model is indeed correct.

Introduction

Clustering concerns the assignment of each of $n$ observations $X_1,\ldots,X_n$ to one of $k$ groups. One popular way to approach this task is via a finite mixture model, see e.g. McLachlan and Peel (2000): the data $X_i$ are assumed i.i.d. with a density $f(x)$ that admits the representation
$$f(x) = \sum_{m=1}^{k} \pi_m f_m(x), \qquad (1)$$
where the mixture proportions $\pi_1,\ldots,\pi_k$ are nonnegative and sum to unity, and the component density $f_m$ models the conditional density of the data in the $m$th group. Typically one assumes a parametric formulation $f_m(x) = f(\theta_m, x)$ for the component distributions, such as a normal model, see e.g. Fraley and Raftery (2002). Then the fitting of the mixture model (1) as well as the assignment of the data to the $k$ groups has an elegant solution in terms of the EM algorithm, see e.g. McLachlan and Krishnan (1997). The EM algorithm iteratively assigns the data based on the current maximum likelihood estimates of the component distributions, and then updates those estimates $\hat{\pi}_m, \hat{\theta}_m$ based on these assignments.
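For readers who want to see the mechanics of this iteration, here is a minimal sketch for a univariate Gaussian mixture. This is our own illustration, not the authors' code; the function name and defaults are ours.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, k, n_iter=200, seed=0):
    """Sketch of EM for a univariate k-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n = x.size
    pi = np.full(k, 1.0 / k)                    # mixture proportions
    mu = rng.choice(x, size=k, replace=False)   # component means
    sigma = np.full(k, x.std())                 # component sds
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership
        dens = norm.pdf(x[:, None], mu, sigma) * pi   # n x k
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = resp.T @ x / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma, resp
```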

One key advantage of using a mixture model for clustering is that it not only provides an assignment of the data to the $k$ groups, but also a measure of uncertainty for the assignment of each observation via the posterior probabilities of component membership (see (2) in Section 3 below).
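Equation (2) is not shown in this snippet view, but for a fitted mixture such posterior probabilities take the standard Bayes form (our reconstruction):

```latex
\hat{p}_{im} \;=\; \frac{\hat{\pi}_m \hat{f}_m(X_i)}{\sum_{j=1}^{k} \hat{\pi}_j \hat{f}_j(X_i)},
\qquad i = 1,\ldots,n, \quad m = 1,\ldots,k,
```

the estimated probability that observation $X_i$ arose from component $m$.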

Problems arise when the parametric model is misspecified. Then the accuracy of the clustering may deteriorate, and the measure of uncertainty may be considerably off. In addition, each parametric model requires a different implementation of the EM algorithm based on attendant theoretical derivations. For these reasons, it would be helpful to have an EM-type clustering algorithm with nonparametric component distributions. Such a methodology would provide a universal software implementation with flexible component distributions. Indeed, nonparametric extensions of parametric models have proved quite successful in discriminant analysis, the supervised counterpart to the problem under consideration here, see e.g. Hastie and Tibshirani (1996) and Lin and Jeon (2003). In contrast, there seems to be little existing work on mixture models with nonparametric components for clustering, presumably because it is not obvious how to develop such methodology in the unsupervised case. Hunter et al. (2006) give methodology to estimate a location mixture of symmetric univariate components.

In this paper we will model each component as a log-concave density, i.e. as a density whose logarithm is a concave function. This model has the advantage that it includes most common parametric distributions (the prime example being the normal density, whose logarithm is quadratic), and it is flexible enough to allow e.g. skewness. See Walther (2001, 2002) for a further discussion of this model. Moreover, it turns out that the MLE of a log-concave density exists uniquely, so there is hope that one can mimic the EM-type clustering algorithm that works so successfully in the parametric context. During the preparation of this paper we became aware of the work of Eilers and Borgdorff (2006), who use penalized smoothing to move a nonparametric estimate of the component distribution ‘towards’ a log-concave form. This approach requires the choice of a tuning parameter. In contrast, we use the fact that the log-concave MLE exists uniquely and employ an algorithm for its computation. Thus, our approach is free of tuning parameters. In addition, we show how to generalize this approach to the multivariate situation where dependence is present within each component.

Section snippets

The univariate model and the MLE

Our model for the univariate case posits that each component $f_m$ in (1) is a log-concave density, i.e. $\log f_m(x)$ is a concave function. An EM-type algorithm requires the computation of the MLE of each $f_m$. Theory and algorithms for this task have been developed in Walther (2002) and in Rufibach (2006). We briefly summarize the relevant results.

Given data $X_1,\ldots,X_n$ i.i.d. from $f$, the MLE $\hat{f}$ of $f$ under the restriction that $f$ be log-concave exists uniquely and has support $[X_{(1)}, X_{(n)}]$. $\log \hat{f}$ is a piecewise linear function whose knots occur only at (a subset of) the observations.
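As an illustration, this MLE can be computed by direct constrained optimization: parametrize $\varphi = \log f$ as a piecewise linear function on the sorted data, constrain its slopes to be non-increasing (concavity), and maximize $\sum_i w_i \varphi(X_i) - \int e^{\varphi}$, a criterion whose maximizer automatically integrates to one. The Python sketch below is ours (the function name and the SLSQP solver are our choices; the paper relies on the algorithm of Rufibach, 2006 instead). It also accepts observation weights, as needed for the weighted M-step later on.

```python
import numpy as np
from scipy.optimize import minimize

def log_concave_mle(x, w=None):
    """Sketch of the log-concave MLE: returns knots t and phi = log f-hat
    at the knots; log f-hat is piecewise linear in between, supported
    on [X_(1), X_(n)].  Optional weights w enable a weighted M-step."""
    x = np.asarray(x, dtype=float)
    w = np.full(x.size, 1.0 / x.size) if w is None else np.asarray(w, float)
    t, idx = np.unique(x, return_inverse=True)   # sorted knots
    wk = np.bincount(idx, weights=w)
    wk /= wk.sum()                               # weight per knot
    dt = np.diff(t)

    def integral(phi):
        # exact integral of exp(phi) over each linear segment
        a, b = phi[:-1], phi[1:]
        d = b - a
        with np.errstate(divide="ignore", invalid="ignore"):
            seg = dt * (np.exp(b) - np.exp(a)) / d
        flat = np.abs(d) < 1e-12
        seg[flat] = dt[flat] * np.exp(a[flat])
        return seg.sum()

    # criterion whose maximizer integrates to one (minimized here)
    def obj(phi):
        return -np.dot(wk, phi) + integral(phi)

    # concavity <=> slopes of the piecewise linear phi are non-increasing
    def concavity(phi):
        return -np.diff(np.diff(phi) / dt)       # must be >= 0

    phi0 = np.full(t.size, -np.log(t[-1] - t[0]))  # log of uniform density
    res = minimize(obj, phi0, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": concavity}])
    return t, res.x
```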

Clustering with an EM-type algorithm

Our methodology is as follows: first we run the usual EM algorithm for a Gaussian mixture to convergence. We then use the outcome of this clustering as the starting value for five more iterations of the EM algorithm, where in the M-step we now compute for each component the log-concave MLE instead of the Gaussian MLE.
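A sketch of this two-stage procedure in Python, assuming scikit-learn's GaussianMixture for the first stage and the weighted log_concave_mle sketch from the previous section for the M-step (the structure follows the description above; the code itself is ours):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def log_concave_em(x, k, n_extra=5):
    """Stage 1: Gaussian EM to convergence.  Stage 2: five EM iterations
    whose M-step fits a weighted log-concave MLE per component.
    Sketch only; log_concave_mle is the function defined above."""
    gm = GaussianMixture(n_components=k, random_state=0).fit(x[:, None])
    resp = gm.predict_proba(x[:, None])         # starting responsibilities
    pi = resp.mean(axis=0)
    for _ in range(n_extra):
        # M-step: weighted log-concave MLE for each component
        fits = [log_concave_mle(x, w=resp[:, m]) for m in range(k)]
        # E-step: evaluate the fitted densities, update responsibilities
        dens = np.column_stack(
            [np.exp(np.interp(x, t, phi)) for t, phi in fits]) * pi
        resp = dens / dens.sum(axis=1, keepdims=True)
        pi = resp.mean(axis=0)
    return pi, fits, resp
```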

The motivation for this approach is that the Gaussian mixture model should provide a clustering that is roughly correct, and that the subsequent log-concave MLE provides a correction.

Comparison with parametric EM

We compared the results of our methodology with those obtained using the EM algorithm for the Gaussian mixture model.

In our first example we drew 500 observations from a gamma(2,1) distribution; then, with probability 0.6, each observation was shifted to the right by 5. This mixture density is plotted in the left panel of Fig. 1 (dotted line). The dashed line gives the fitted model obtained by the Gaussian EM algorithm, and the solid line gives the fitted model obtained by the log-concave EM algorithm.
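This data-generating step is straightforward to reproduce; a minimal sketch (the seed and generator are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=500)   # gamma(2,1) draws
x[rng.random(500) < 0.6] += 5.0                 # shift right by 5 w.p. 0.6
```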

A multivariate extension

In the multivariate set-up, we observe $n$ i.i.d. observations $(X_{i1},\ldots,X_{id})$ in $\mathbb{R}^d$. Log-concave distributions are defined in the multivariate situation just as in the univariate case, but the computation of the MLE appears to be much more complicated. For this reason, we will work with the following simpler, yet flexible model: we only require that the univariate marginal distributions be log-concave. We then model the dependence structure with a normal copula. That is, let $(N_1,\ldots,N_d)$ be a multivariate normal random vector.
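Under such a copula model, the dependence parameter can be estimated by mapping each coordinate to normal scores through an estimate of its marginal CDF and correlating the scores. A minimal sketch, substituting the empirical CDF for the fitted log-concave marginal CDFs (the function name is ours):

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_copula_corr(X):
    """Estimate the normal-copula correlation matrix from an n x d sample:
    map each column to approximate N(0,1) scores via its marginal CDF,
    then take the sample correlation of the scores."""
    n = X.shape[0]
    U = rankdata(X, axis=0) / (n + 1.0)   # empirical marginal CDF values
    Z = norm.ppf(U)                       # normal scores
    return np.corrcoef(Z, rowvar=False)
```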

Conclusion

We have shown how the parametric EM algorithm for clustering can be extended to allow for a flexible, nonparametric class of component distributions. The advantages of this algorithm are that it is not restricted to parametric models, that it no longer requires the specification of such a model for the component distributions and hence is not sensitive to a misspecification thereof, and that only one implementation of the algorithm is necessary. At the same time, there seems to be no noticeable performance penalty vis-à-vis the parametric EM algorithm when the assumed parametric model is indeed correct.

Problems for further research

We left open the question of identifiability of log-concave mixtures and the problem of selecting the number of components (clusters). Following our motivation that the log-concave MLE provides a correction to the fitted Gaussian mixture model, a reasonable suggestion would be to employ one of the criteria for selecting the number of components in the Gaussian mixture model, see e.g. Chapter 6 in McLachlan and Peel (2000). A direct way to select the number of components in a nonparametric setting remains an open problem.

References (12)

  • Eilers, P.H.C., Borgdorff, M.W., 2006. Non-parametric log-concave mixtures. …
  • Fraley, C., Raftery, A.E., 2002. Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc.
  • Hastie, T.J., Tibshirani, R., 1996. Discriminant analysis by Gaussian mixtures. J. Roy. Statist. Soc. Ser. B.
  • Hunter, D.R., Wang, S., Hettmansperger, T.P., 2006. Inference for mixtures of symmetric distributions. Ann. Statist. …
  • Jongbloed, G., 1998. The iterative convex minorant algorithm for nonparametric estimation. J. Comput. Graph. Statist.
  • Lin, Y., Jeon, Y., 2003. Discriminant analysis through a semi-parametric model. Biometrika.
There are more references available in the full text version of this article.

Cited by (43)

  • The robust EM-type algorithms for log-concave mixtures of regression models

    2017, Computational Statistics and Data Analysis
Citation Excerpt:

    These estimators provide more generality and flexibility without any tuning parameter. For log-concave mixture models, Chang and Walther (2007) proposed a log-concave EM-type algorithm for mixture density estimation, along with the application in clustering. Hu et al. (2016) further proposed the LCMLE, which is the maximizer of a log-likelihood type functional, and proved the existence and consistency for the LCMLE for the log-concave mixture models.

  • Maximum likelihood estimation of the mixture of log-concave densities

    2016, Computational Statistics and Data Analysis
  • Location and scale mixtures of Gaussians with flexible tail behaviour: Properties, inference and application to multivariate clustering

    2015, Computational Statistics and Data Analysis
Citation Excerpt:

    However, to maintain tractability or identifiability in a multivariate setting, most approaches appear to restrict the type of dependence structures between the coordinates of the multidimensional variable. Typically, conditional independence (on the mixture components) is assumed in Benaglia et al. (2009a,b) while a Gaussian copula is used in Chang and Walther (2007). An alternative approach (Schmidt et al., 2006), which takes advantage of the property that Generalized Hyperbolic distributions are closed under affine-linear transformations, derives independent GH marginals but estimation of parameters appears to be restricted to density estimation, and not formally generalizable to estimation settings for a broad range of applications (e.g. clustering, regression, etc.).


Work supported by NSF Grant DMS-0505682 and NIH Grant 5R33HL068522.
