Clustering of spatial point patterns

https://doi.org/10.1016/j.csda.2004.10.013

Abstract

Spatial point patterns arise as the natural sampling information in many problems. An ophthalmologic problem gave rise to the problem of detecting clusters of point patterns. A set of human corneal endothelium images is given. Each image is described by using a point pattern, the cell centroids. The main problem is to find groups of images corresponding with groups of spatial point patterns. This is interesting from a descriptive point of view and for clinical purposes. A new image can be compared with prototypes of each group and finally evaluated by the physician. Usual descriptors of spatial point patterns, such as the empty-space function, the nearest-neighbor distance distribution function or Ripley's K-function, are used to define dissimilarity measures. Moreover, the relationship between some estimation problems in spatial point processes and survival analysis is used to define dissimilarity measures between point patterns. All the proposed dissimilarities and the cluster procedures are evaluated in a simulation study. Finally, a detailed analysis of the images of corneal endothelia is provided.

Introduction

This paper is concerned with clustering problems of point patterns. Note that it is not intended to find groups in a given spatial point pattern. The objects to group are point patterns defined over possibly different sampling windows. It is an unusual problem in the statistical literature, but it arises in a wide range of applications.

Our motivating problem is concerned with the analysis of digital images of human corneal endothelia. The deepest part of the human cornea is a single layer of 400,000–500,000 cells called the corneal endothelium. Cells are 4–6 μm in height and 20 μm in width, and their posterior surfaces are predominantly hexagonal when viewed under specular microscopy. This technique is used to study “in vivo” the size, shape and number of endothelial cells. The light of a slit lamp is directly reflected from the cornea, and the microscope is focused on this area so that the light that is reflected is viewed through the microscope. It is this specular reflection which enables the observer to see and photograph the images of the endothelial cells (Tasman and Jaeger, 1995).

The endothelial cell population decreases with age or following stressful situations such as cataract surgery, corneal transplantation and the implantation of intra-ocular lenses. When endothelial loss occurs through aging or trauma, the endothelial response is an enlargement and sliding of the existing cells to cover the area previously occupied by the lost cells. As a result of the spreading of the cells, their diameters are twice their normal sizes and cells lose their hexagonal appearance. Fig. 1 displays an example of a human corneal endothelium. The physician has to evaluate the corneal endothelium status using this image. Traditionally, the usual analysis uses three numerical descriptors, density, hexagonality and coefficient of variation. The density is defined as the mean number of cells per unit area. The hexagonality is the percentage of cells with six neighbors. Two cells are neighbors if they touch each other. Finally, the coefficient of variation of the cell areas is used. The evaluation of this image is made using just these three values. A healthy corneal endothelium is associated with high densities and hexagonalities and low coefficients of variation. Normative data regarding endothelial cell density and morphology have been largely recorded in the medical literature (Sturrock et al., 1978, Blatt et al., 1979, Lester et al., 1981, Matsuda et al., 1985, Yee et al., 1985). In addition to age, differences between different populations (Matsuda et al., 1985) have been reported. There are no commonly accepted (and used) normative limits for these parameters.
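The three classical descriptors defined above can be illustrated with a minimal Python sketch. It assumes that a prior segmentation step has already produced the area and the number of touching neighbors of each cell. The function name `classical_descriptors`, its inputs and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def classical_descriptors(cell_areas, neighbor_counts):
    """Return (density, hexagonality, coefficient of variation).

    cell_areas: area of each segmented cell (any consistent unit).
    neighbor_counts: number of touching neighbors of each cell.
    Both inputs are hypothetical outputs of an image segmentation step.
    """
    areas = np.asarray(cell_areas, dtype=float)
    neighbors = np.asarray(neighbor_counts)
    # Density: mean number of cells per unit area. Assumption: the cells
    # tile the analysed region, so its area is the sum of the cell areas.
    density = areas.size / areas.sum()
    # Hexagonality: percentage of cells with exactly six neighbors.
    hexagonality = 100.0 * np.mean(neighbors == 6)
    # Coefficient of variation of the cell areas (sample SD over mean).
    cv = areas.std(ddof=1) / areas.mean()
    return density, hexagonality, cv

# Toy data: four cells, two of them hexagonal.
areas = [2.0, 2.0, 4.0, 4.0]
neighbors = [6, 6, 5, 7]
density, hexagonality, cv = classical_descriptors(areas, neighbors)
print(density, hexagonality, cv)
```

Under this convention, a healthy endothelium would show a high `density` and `hexagonality` together with a low `cv`.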

In our opinion, the classical quantities describe these kinds of images too poorly to detect lesions. Our previous papers (Díaz et al., 2001, Domingo et al., 2002) address this point and propose different functions associated with the image. Our basic idea was to associate two point patterns with a given image: first, the centroids of the different cells and, second, the triple points (those points where three different cells meet). A bivariate point pattern (two sets of points jointly considered) is thus associated with the image. This point pattern can be described using functions proposed in the corresponding literature. Finally, the proposed descriptors for a given image are compared with the same descriptors observed in a set of “normal” images from people of similar age, i.e., age-matched controls. The comparison is performed by using a method that can be found in Diggle (2003), pp. 12–14; that is, a case-control comparison is performed.

A fundamental question has to be considered: which are the controls here? Here, it is important to note that the subjective evaluation of the images jointly with the classical descriptors is not enough. In our experience, some images evaluated as normal (good classical descriptors, no known pathology and a positive evaluation of the image by the physician) by a first ophthalmologist were rejected as controls by a second one. If a bivariate point pattern is associated with the original image, it seems natural to study the possible groups of similar spatial point patterns. Then, the ophthalmologist can examine these groups and reach conclusions about them. Of course, the final set of controls has to be chosen by the ophthalmologist.

Our primary motivation was to find groups of similar point patterns. However, we think that many other applications can appear within image processing. If the original image is described by a (univariate or multivariate) point pattern, the similarity between images can be reformulated as a similarity between the corresponding point patterns.

Section 2 gives a brief summary of point process theory. The dissimilarities proposed and the clustering procedures used are provided in Section 3. Section 4 contains a simulation study on the different combinations between dissimilarities and clustering procedures. Section 5 presents the application to the analysis of images of corneal endothelia. The paper finishes, in Section 6, with some conclusions and possible further developments.

Section snippets

Describing a point pattern

This section contains a brief summary of the most commonly used numerical descriptors of a point pattern. Some basic concepts and notation are presented.
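As a rough illustration of two of the descriptors named in the abstract, the sketch below computes naive empirical versions of the nearest-neighbor distance distribution function G and of Ripley's K-function. It is a simplification under stated assumptions: no edge correction is applied, although edge effects matter in practice, and the function names and the toy pattern are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nearest_neighbor_G(points, r_values):
    """Naive G(r): fraction of points whose nearest neighbor lies within r."""
    d = pairwise_distances(points)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nn = d.min(axis=1)
    return np.array([(nn <= r).mean() for r in r_values])

def ripley_K(points, r_values, window_area):
    """Naive K(r): window area times the proportion of ordered pairs
    of distinct points separated by at most r (no edge correction)."""
    d = pairwise_distances(points)
    n = d.shape[0]
    np.fill_diagonal(d, np.inf)  # exclude self-pairs
    return np.array([window_area * (d <= r).sum() / (n * (n - 1))
                     for r in r_values])

# Toy pattern: corners of a unit square inside a window of area 4.
pattern = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
G = nearest_neighbor_G(pattern, [0.5, 1.0])
K = ripley_K(pattern, [0.5, 1.0, 1.5], window_area=4.0)
print(G, K)
```

The empty-space function could be estimated in the same spirit by replacing the data points with a grid of test locations and measuring the distance from each location to its nearest data point.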

From a probabilistic point of view, a point pattern is a realization of a point process. A point process is a probabilistic model almost surely producing locally finite sets of points, i.e., for any Borel bounded set there is a finite number of points (with probability one). Good standard references on point process theory are (Cressie, 1993,

Different approaches for clustering of point patterns

This is the methodological section of the paper. From now on, the $m$ different point patterns will be denoted by $s^{(i)}$ with $i = 1, \ldots, m$, where the $i$-th point pattern $s^{(i)}$ will be $s_1^{(i)}, \ldots, s_{n_i}^{(i)}$. The sampling window where the point pattern $s^{(i)}$ is observed will be denoted by $W^{(i)}$. Notice that the different point patterns are observed in different regions or sampling windows. This point is important in order to compare the different point patterns. As is well known in the context of statistical

Simulation study

In this section a simulation study has been carried out in order to compare the performance of the proposed dissimilarities. Several point patterns have been simulated with different parameter values. For each simulation, a cluster analysis has been performed by using the different dissimilarities explained above with both cluster procedures.
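A toy experiment in the same spirit can be sketched as follows. Every modelling choice here is an assumption made for illustration and is not the design used in the paper: the two simulated models (uniform patterns and Gaussian clusters around random parents on the unit square), the dissimilarity (a root-mean-square distance between naive, uncorrected estimates of G) and the clustering procedure (average-linkage hierarchical clustering from SciPy).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

def simulate_poisson(n):
    """n independent uniform points on the unit square."""
    return rng.uniform(0.0, 1.0, size=(n, 2))

def simulate_clustered(n_parents, n_children, sd):
    """Gaussian clusters of children around uniformly placed parents."""
    parents = rng.uniform(0.0, 1.0, size=(n_parents, 2))
    offsets = rng.normal(0.0, sd, size=(n_parents, n_children, 2))
    return (parents[:, None, :] + offsets).reshape(-1, 2)

def empirical_G(points, r_values):
    """Naive nearest-neighbor distance distribution (no edge correction)."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)
    nn = d.min(axis=1)
    return (nn[None, :] <= r_values[:, None]).mean(axis=1)

# Five patterns from each model, 100 points per pattern.
r_values = np.linspace(0.0, 0.1, 51)
patterns = [simulate_poisson(100) for _ in range(5)]
patterns += [simulate_clustered(4, 25, 0.01) for _ in range(5)]
curves = np.array([empirical_G(p, r_values) for p in patterns])

# Hypothetical dissimilarity: RMS difference between two G curves.
m = len(patterns)
dissim = np.zeros((m, m))
for i in range(m):
    for j in range(i + 1, m):
        dissim[i, j] = dissim[j, i] = np.sqrt(
            np.mean((curves[i] - curves[j]) ** 2))

# Average-linkage hierarchical clustering, cut into two groups.
Z = linkage(squareform(dissim), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Because the true model of each simulated pattern is known, the recovered `labels` can be compared against it, which is the sense in which a simulation study can rank combinations of dissimilarities and cluster procedures.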

Clustering of corneal endothelia

Fig. 1(a) displays an image obtained by using a specular microscope. The procedure to obtain it is very fast, bloodless and can be followed by a non-specialist. The professional has to deal with a lot of similar images, and a clear and fast conclusion about the corneal endothelium status is not so easy to reach. He/she has a lot of images but no clear procedure to manage them with: to retrieve similar images from an image database, to classify the image within a class from a well-defined set of

Conclusions and further developments

This paper is concerned with the clustering of spatial point patterns and was motivated by a clinical application. We considered a sample of images of human corneal endothelia. A bivariate point pattern is associated with each image, the centroids and triple points of the different cells. The point pattern considered as a realization of a point process is described by means of different functions and two types of distances: the point-to-event distance or empty-space distance (the distance from

Acknowledgements

This paper has been supported by Grants CICYT BSA2001-0803-C02-02 and GV04B/32 (I. Epifanio and A. Simó), TIC2002-03494 (V. Zapater and G. Ayala) and CTIDIA-2002-133, RGY 40/2003, GV04A/177, Grupos04-08 (G. Ayala). The authors would like to thank both reviewers for their very constructive suggestions which led to an improvement in this paper.

References (18)

  • A. Baddeley et al. Kaplan-Meier estimators of distance distributions for spatial point processes. Ann. Statist. (1997)
  • H. Blatt et al. Endothelial cell density in relation to morphology. Invest. Ophthalmol. Vis. Sci. (1979)
  • N.A. Cressie. Statistics for Spatial Data. (1993)
  • M. Díaz et al. Testing abnormality in the spatial arrangement of cells in the corneal endothelium by using spatial point processes. Statist. Med. (2001)
  • P. Diggle. Statistical Analysis of Spatial Point Patterns. (2003)
  • J. Domingo et al. Morphometric analysis of human corneal endothelium by means of spatial point patterns. Int. J. Pattern Recognition Artif. Intell. (2002)
  • T.R. Fleming et al. A class of hypothesis tests for one and two sample censored survival data. Comm. Statist. Theory Methods A (1981)
  • L. Kaufman et al. Finding Groups in Data: An Introduction to Cluster Analysis. (1990)
  • E.T. Lee. Statistical Methods for Survival Data Analysis. (1992)
There are more references available in the full text version of this article.
