Elsevier

Ecological Informatics

Volume 2, Issue 2, 1 June 2007, Pages 121-127

Measuring information-based complexity across scales using cluster analysis

https://doi.org/10.1016/j.ecoinf.2007.03.011

Abstract

Scaling of ecological data can present a challenge firstly because of the large amount of information contained in an ecological data set, and secondly because of the problem of fitting data to models that we want to use to capture structure. We present a measure of similarity between data collected at several scales using the same set of attributes. The measure is based on the concept of Kolmogorov complexity and implemented through minimal message length estimates of information content and cluster analysis (the models). The similarity represents common patterns across scales, within the model class. We thus provide a novel solution to the problem of simultaneously considering data structure, model fit and scale. The methods are illustrated in application to an ecological data set.

Introduction

Scale is a concept central to all ecological studies, whether relating to space or time. Sayama et al. (2003) demonstrated that there are powerful linkages between scales, contradicting the erroneous, though commonly held, assumption that it is possible to neatly partition evolutionary effects at different spatial scales, to study a molecule, individual, population, metapopulation, species or ecosystem. It is these dynamic linkages among the levels, rather than the number of levels themselves, that should probably be the focus of attention. Hogeweg (2002) argues that ‘processes do not, in biotic systems, operate in isolation and the existence of entanglement at different time and space scales does not need explanation, being there by default’. Ignoring it by segregating time and space scales is simply a modelling artefact.

There are several ways in which scale appears as a feature in ecological studies. An omnipresent problem is the modifiable areal unit problem (MAUP; see Openshaw, 1984, Fotheringham and Wong, 1991, Nakaya, 2000, Brunsdon, 2002, Holt et al., 1996, Jelinski and Wu, 1996). Ecological units do not come in convenient packets, and the size, shape and distribution of samples all affect any study; this aspect of scale has already received considerable research attention (Brunsdon, 2002, Pavlov et al., 2001; see also methods developed by Juhász-Nagy and Podani, 1983). The effects of scale can, however, be mitigated by employing fuzzy concepts. This allows any individual sample to partake of several component structures and leads to consistent estimates of cluster parameters. Bar-Yam (2002) proposed that agglomerative clustering indicates the mechanism by which information is lost as the level of uncertainty increases across scales, but a quantitative measure will depend on the similarity coefficient and the particular algorithm used for clustering. Pavlov et al. (2001) and Puzicha and Buhman (1998) use similar ideas to obtain segmentation of images based on texture variation and using fractal concepts; specifically, Pavlov et al. (2001) suggest using wavelet decompositions. The number and distribution of samples also links to the part-whole problem (cf. Szabo, 1996) and to the relationship between habitat heterogeneity and spontaneous pattern production (Sayama et al., 2003).
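A toy illustration of the MAUP may help here. It uses invented cover scores for two hypothetical species (`species_a`, `species_b`), not the study data, and the helper names `pearson` and `aggregate` are assumptions made for this sketch. The same transect is summarised at three sampling-unit sizes, and the correlation between the species is recomputed each time.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def aggregate(values, block):
    """Average consecutive samples in non-overlapping blocks of the given size."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]

# Invented cover scores for two species along one transect (illustration only).
species_a = [1, 3, 2, 4, 3, 5, 4, 6]
species_b = [3, 1, 4, 2, 5, 3, 6, 4]

for block in (1, 2, 4):
    r = pearson(aggregate(species_a, block), aggregate(species_b, block))
    print(f"block size {block}: r = {r:.2f}")
```

With these numbers the two species are only weakly correlated at the finest unit size (r is about 0.11) but perfectly correlated once adjacent samples are averaged in pairs, so the conclusion drawn depends on the unit chosen.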

Here we shall consider a different problem that concerns the estimation of common structure between levels. We ask, does common structure exist and, if so, how strong is it? And how does it decay as differences in scale increase? Some argue that scale invariance, or the presence of similar structure across different spatial or temporal scales, should be expected for complex systems (e.g., Brown et al., 2002). However, Wolpert and Macready (2000) recently put forward another view: that over different space and time scales, the patterns exhibited by such a complex system should vary greatly, and in ways that are unexpected given the patterns on the other scales. The degree of dissimilarity plotted against scale would therefore provide a profile, which can be used as a system descriptor and compared with other system profiles irrespective of the subject matter. Binder and Plazas (2001) recommend a similar procedure. This obviously requires a suitable measure of dissimilarity or similarity between data at two or more scales. Several authors have suggested ordination methods for multiscale analyses (Noy-Meir and Anderson, 1971, Borcard and Legendre, 2002); however, these do not in general provide a similarity measure between scales. Another approach makes use of fractal and multifractal analysis (e.g., based on Rényi's generalized entropy functions; Borda-de-Agua et al., 2002); however, again the degree of self-similarity cannot easily be determined. In this paper, we present a clustering approach to determine similarity between scales. The problems caused by the MAUP can be overcome by using fuzzy clustering. This allows us to identify common structure at different scales using the minimum message length principle. The method was applied to ecological data to test its efficacy at detecting changes in community structure in terms of the composition and relative abundances of species in the community.
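To make the idea of a dissimilarity profile across scales concrete, the following sketch substitutes a different, simpler technique for the one developed in this paper: the normalized compression distance (NCD), which approximates Kolmogorov complexity with an off-the-shelf compressor (`zlib`) rather than with minimum message length clustering. The transect values, the block sizes and the helper names `ncd`, `coarsen` and `profile` are all assumptions made for illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def coarsen(values, block):
    """Integer mean of consecutive non-overlapping blocks (a coarser 'scale')."""
    return [sum(values[i:i + block]) // block
            for i in range(0, len(values) - block + 1, block)]

def profile(values, blocks):
    """NCD between the finest-scale series and each coarsened version of it."""
    base = bytes(values)
    return {block: ncd(base, bytes(coarsen(values, block))) for block in blocks}

# Invented transect: cover scores 0-9 changing in steps along 256 quadrats.
transect = [(i // 8) % 10 for i in range(256)]

for block, distance in profile(transect, (2, 4, 8)).items():
    print(f"block size {block}: NCD to finest scale = {distance:.2f}")

# Sanity check of the measure itself: a series is closer to itself than to noise.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(256))
print(ncd(bytes(transect), bytes(transect)) < ncd(bytes(transect), noise))
```

Plotting the values returned by `profile` against block size gives one possible system profile in the sense described above. A general-purpose compressor is sensitive to series length and encoding, so this is only a rough stand-in for a model-based measure.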

A minimum message length similarity measure

Dale (2002; see also Dale and Anand, 2004) proposed using the minimum message length (MML) principle to estimate the Kolmogorov complexity as a sum of two components: model (structure) description and model fit. Kolmogorov complexity is a measure of the difficulty of describing a pattern or algorithm (Li and Vitányi, 1997); however, the measure has rarely been used in ecological informatics (but see Anand and Orlóci, 1996, Anand and Orlóci, 2000). In the present work, it is
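A minimal sketch of the two-part decomposition (model description plus model fit) is given below. It is not the MML estimator used in this work: the model part is costed with a simple BIC-style approximation of half of log2(n) bits per free parameter, classes are hard Gaussian groups in one dimension, and the data, the measurement `precision` and the function names are invented for illustration.

```python
import math

LN2 = math.log(2)

def gaussian_bits(xs, mean, sd, precision):
    """Bits needed to state each x to the given precision under N(mean, sd)."""
    nats = sum(0.5 * ((x - mean) / sd) ** 2
               + math.log(sd * math.sqrt(2 * math.pi))
               - math.log(precision)
               for x in xs)
    return nats / LN2

def two_part_length(groups, precision=0.01):
    """Return (model_bits, data_bits) for a hard partition of 1-D data into classes.

    Model part: (mean, sd) per class plus k - 1 class proportions, each costed
    at 0.5 * log2(n) bits (an assumed approximation, not exact MML).
    Data part: a class label and a value for every observation.
    """
    n = sum(len(g) for g in groups)
    k = len(groups)
    model_bits = 0.5 * (2 * k + (k - 1)) * math.log2(n)
    data_bits = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        sd = max(math.sqrt(sum((x - mean) ** 2 for x in g) / len(g)), precision)
        label_bits = -len(g) * math.log2(len(g) / n)
        data_bits += label_bits + gaussian_bits(g, mean, sd, precision)
    return model_bits, data_bits

# Invented measurements forming two well-separated groups.
low = [1.0, 1.2, 0.9, 1.1, 0.8, 1.0]
high = [9.0, 9.2, 8.9, 9.1, 8.8, 9.0]

one_class = two_part_length([low + high])
two_class = two_part_length([low, high])
print("one class :", sum(one_class))
print("two class :", sum(two_class))
```

The two-class description pays more for its model part but much less for the data part, so its total message is shorter. That trade-off between structure and fit is what the n-class versus single-class comparison relies on.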

Data and methods

The data were modified as follows in order to examine the changes in community structure, in terms of the composition and relative abundance of species, at different scales: The primary data consist of records of the cover abundance of 119 species of understorey plants. These were collected from line transects from 6 sites located along a historic pollution gradient (Anand et al., 2003, Tucker and Anand, 2003, Desrochers and Anand, 2005). This gradient reflects decreasing historic sulphur

Results

The results from the independent analysis of the several scales are shown in Table 1a. The number of classes and the associated n-class MML show a close relationship with the size of the population employed and at all scales the clustering provides a markedly better n-class result compared with that for a single class; however, the MML per thing values are not as closely related. Turning to the pairwise analyses (Table 1b), we obtain similar results except that the MML per individual values are

Discussion

We introduce a new measure for cross-scale analysis of ecological data and the structures it defines. On the basis of a single analysis, it is not possible to decide if the results are an inherent feature of ecological systems. There is certainly some common structure as well as idiosyncratic variation, and the methods used here can separate these components: cross-scale similarity measures based on Kolmogorov complexity provide needed information. The exception is the Bush analysis where we

Acknowledgments

MA acknowledges funding from the Natural Sciences and Engineering Research Council of Canada, the Canada Research Chairs Program and the Ontario Ministry of Science and Technology for infrastructure and salary support for RD and MD. We thank B.C. Tucker and K. Lemire for assistance with field data collection and Steve Kaufman for technical assistance and comments on a previous version of the manuscript. An anonymous reviewer provided helpful comments.

References (47)

  • Y. Agusta et al. Unsupervised learning of correlated multivariate Gaussian mixture models using MML.
  • Y. Agusta et al. Unsupervised learning of gamma mixture models using minimum message length.
  • Y. Bar-Yam. Sum rule for multiscale representations of kinematically described systems. Advances in Complex Systems (2002).
  • P.M. Binder et al. Multiscale analysis of complex systems. Physical Review E (2001).
  • L. Borda-de-Agua et al. Species-area curves, diversity indices, and species abundance distributions: a multifractal analysis. American Naturalist (2002).
  • J.H. Brown et al. The fractal nature of nature: power laws, ecological complexity and biodiversity. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences (2002).
  • C. Brunsdon. A Bayesian perspective on the modifiable areal unit problem using data augmentation.
  • M.B. Dale. Models, measures and messages: an essay on the role for induction. Community Ecology (2002).
  • M.B. Dale et al. Domain knowledge, evidence, complexity and convergence. International Journal of Ecology and Environmental Sciences (2004).
  • M.B. Dale et al. Minimum message length clustering: an explication and some applications to vegetation data. Community Ecology (2001).
  • R.E. Desrochers et al. Quantifying the components of biocomplexity along ecological perturbation gradients. Biodiversity and Conservation (2005).
  • T. Edgoose et al. MML Markov classification of sequential data. Statistics and Computing (1999).
  • A.S. Fotheringham et al. The modifiable areal unit problem in statistical analysis. Environment and Planning A (1991).