Convergence of an EM-type algorithm for spatial clustering

https://doi.org/10.1016/S0167-8655(98)00076-2

Abstract

Ambroise et al. (1996) have proposed a clustering algorithm that is well suited for dealing with spatial data. This algorithm, derived from the EM algorithm (Dempster et al., 1977), was designed for penalized likelihood estimation in situations with unobserved class labels. Very satisfactory empirical results lead us to believe that this algorithm converges (Ambroise et al., 1996). However, this convergence has not been proven theoretically. In this paper, we present sufficient conditions for convergence together with a proof. A practical application illustrates the use of the algorithm.

Introduction

Spatial clustering aims to find classes composed of objects which are both similar according to some measure and geographically close. When classical clustering algorithms (e.g., the EM algorithm for Gaussian mixture estimation) are used to partition spatial data, the resulting classes are often spatially very mixed.

In geology, sociology, image analysis, and a wide range of other fields, spatial clustering techniques are widely used for finding homogeneous zones. Satellite images are often segmented in order to determine different zones of interest (e.g., forests, cities or rivers). In this particular case, the objects are pixels described by a gray-scale or color intensity. Another example is statistics describing the number of sick persons per geographical unit (e.g., town or county), which may be used to delimit different zones of risk.

Several methods exist for taking spatial information into account in a clustering process:

  • Modifying existing clustering algorithms (Legendre, 1987; Lebart, 1978; Openshaw, 1977). This is done by specifying which objects are neighbors and allowing an object to be assigned to a class if and only if this class already contains a geographical neighbor. This approach has the drawback of producing classes which are necessarily geographically connected. This means that one class is bound to correspond to a single spatial region.

  • Integrating the spatial information in the data set (Berry, 1966; Jain and Farrokhnia, 1991; Oliver and Webster, 1989). One example consists of treating the geographical coordinates as new variables describing the objects; another is filtering techniques that extract from the original variables new features which embody the spatial information.

  • Choosing a model which encompasses the spatial aspect of the data. Most of the time, this is equivalent to defining a criterion that includes spatial constraints. This approach comes mainly from image analysis, where Markov random fields (Geman and Geman, 1984; Masson and Pieczynski, 1993) are intensively used.

In a recent paper, the authors have described a clustering algorithm (Ambroise et al., 1996) related to the last approach, which is able to deal with objects described by quantitative variables. The spatial distribution of the objects may be regular (e.g., pixels of an image) or irregular (e.g., towns of a given district). The algorithm estimates the parameters of a Gaussian mixture and produces a fuzzy partition made of classes which are spatially homogeneous without being “single spatial region classes”.

This paper aims to present a proof of the convergence of this algorithm for spatial clustering. Section 2 introduces the Gaussian mixture model and describes the Neighborhood EM algorithm (NEM). Section 3 is dedicated to the convergence proof. In Section 4, an illustrative example based on image segmentation is presented.


Gaussian mixture and clustering

The probabilistic approach to clustering is mainly based on Gaussian mixture models. In this framework (Celeux and Govaert, 1995), the objects to be classified are considered as a sample x = (x_1, …, x_N) of independent random vectors. The vectors x_i are drawn from a mixture of K Gaussian distributions:

f(x_i | Φ) = ∑_{k=1}^{K} p_k f_k(x_i | θ_k),

where the p_k are the mixing proportions (for k = 1, …, K, 0 < p_k < 1 and ∑_k p_k = 1) and f_k(· | θ_k) denotes the density of a Gaussian distribution with parameter θ_k = (μ_k, Σ_k), μ_k being the mean vector and Σ_k the covariance matrix of the kth component.
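As a concrete illustration (not part of the original paper), the mixture density can be evaluated as in the minimal Python sketch below; it assumes SciPy is available, and weights, means and covs are hypothetical names for the parameters p_k, μ_k and Σ_k.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """Mixture density f(x_i | Phi) = sum_k p_k f_k(x_i | theta_k) at each row of x.

    x       : (N, d) sample matrix
    weights : (K,) mixing proportions p_k (0 < p_k < 1, summing to 1)
    means   : (K, d) mean vectors mu_k
    covs    : (K, d, d) covariance matrices Sigma_k
    """
    # One (N,) vector of weighted component densities p_k * f_k(x_i | theta_k) per k
    comp = np.stack([p * multivariate_normal.pdf(x, mean=m, cov=S)
                     for p, m, S in zip(weights, means, covs)], axis=1)
    return comp.sum(axis=1)  # (N,) mixture densities
```

The per-component quantities p_k f_k(x_i | θ_k) computed here are also exactly what the E-step of the algorithm needs.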

Estimation step

The method proposed in this section to perform the E-step is inspired by the Hathaway (1986) formulation of the EM algorithm and can also be related to the work of Neal and Hinton (1993). We suggest using a fixed-point method to find the classification matrix c⁺ which maximizes the criterion U(c, Φ^q).

The necessary Kuhn–Tucker optimality conditions take the following form:

∂Ū/∂c_ik |_{c = c⁺} = log(p_k f_k(x_i | θ_k)) − 1 − log c⁺_ik + β ∑_{j=1}^{N} c⁺_jk v_ij + λ_i = 0  ∀ i, k,

∑_{k=1}^{K} c⁺_ik = 1  ∀ i,

where Ū is the Lagrangian of U(c, Φ), which takes the constraints ∑_k c_ik = 1 into account through the multipliers λ_i.
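Solving these conditions for c⁺_ik gives a fixed-point equation: c⁺_ik is proportional to p_k f_k(x_i | θ_k) · exp(β ∑_j v_ij c⁺_jk), normalized so that each row of c⁺ sums to one. The following minimal Python sketch of this iteration is our reconstruction, not the authors' code; nem_e_step, f and V are hypothetical names, and the densities p_k f_k(x_i | θ_k) are assumed precomputed.

```python
import numpy as np

def nem_e_step(f, V, beta, n_iter=10):
    """Fixed-point iteration for the NEM classification matrix c.

    f      : (N, K) matrix of p_k * f_k(x_i | theta_k)
    V      : (N, N) symmetric neighborhood weights v_ij
    beta   : spatial penalty coefficient
    """
    c = f / f.sum(axis=1, keepdims=True)      # start from the plain EM posterior
    for _ in range(n_iter):
        # c_ik proportional to p_k f_k(x_i) * exp(beta * sum_j v_ij c_jk)
        w = f * np.exp(beta * (V @ c))
        c = w / w.sum(axis=1, keepdims=True)  # enforce sum_k c_ik = 1 for all i
    return c
```

Setting β = 0 removes the spatial term, and the iteration reduces to the usual EM posterior probabilities in a single pass.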

An application to biological images

Let us illustrate the usefulness of the NEM algorithm with an application to image segmentation.
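Since the objects here are pixels on a regular grid, the neighborhood weights v_ij appearing in the criterion can be taken as 1 for 4-connected pixels and 0 otherwise (a common choice; the exact weights used in this example are not shown in this excerpt). A minimal sketch, where grid_neighborhood is our hypothetical name and a sparse matrix would be preferable for real images:

```python
import numpy as np

def grid_neighborhood(h, w):
    """(h*w, h*w) matrix with v_ij = 1 for 4-connected pixels
    of an h-by-w image, 0 otherwise (dense, for illustration only)."""
    V = np.zeros((h * w, h * w))
    for r in range(h):
        for col in range(w):
            i = r * w + col
            if r + 1 < h:                 # neighbor below
                V[i, i + w] = V[i + w, i] = 1
            if col + 1 < w:               # neighbor to the right
                V[i, i + 1] = V[i + 1, i] = 1
    return V
```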

Let us consider the following biological experiment: a sample of living cells is laid on a nutritive substance. After a few days, new living cells appear and form a thin but visible layer around the original sample. Biologists are interested in determining the surface of this new layer.

Concluding remarks

The choice of the penalizing coefficient β remains the main difficulty in applying the NEM algorithm. In the preceding example, we used our experience to determine the “optimal” β coefficient. When such a procedure is not possible, it would be useful to have an automatic way of estimating this parameter. This subject still requires further research.

A particularity of the NEM algorithm is that it provides a fuzzy partition of the data. This may be interesting in some applications where region of

