Measuring information-based complexity across scales using cluster analysis
Introduction
Scale is a concept central to all ecological studies, whether relating to space or time. Sayama et al. (2003) demonstrated that there are powerful linkages between scales, contradicting the erroneous, though commonly held, assumption that it is possible to neatly partition evolutionary effects at different spatial scales, to study a molecule, individual, population, metapopulation, species or ecosystem. It is these dynamic linkages among the levels, rather than the number of levels themselves, that should probably be the focus of attention. Hogeweg (2002) argues that ‘processes do not, in biotic systems, operate in isolation and the existence of entanglement at different time and space scales does not need explanation, being there by default’. Ignoring it by segregating time and space scales is simply a modelling artefact.
There are several ways in which scale appears as a feature in ecological studies. An omnipresent problem is the modifiable areal unit problem (MAUP; see Openshaw, 1984; Fotheringham and Wong, 1991; Holt et al., 1996; Jelinski and Wu, 1996; Nakaya, 2000; Brunsdon, 2002). Ecological units do not come in convenient packets, and the size, shape and distribution of samples all affect any study; this aspect of scale has already received considerable research attention (Pavlov et al., 2001; Brunsdon, 2002; see also methods developed by Juhász-Nagy and Podani, 1983). The effects of scale can, however, be mitigated by employing fuzzy concepts. These allow any individual sample to partake of several component structures and lead to consistent estimates of cluster parameters. Bar-Yam (2002) proposed that agglomerative clustering indicates the mechanism by which information is lost as the level of uncertainty increases across scales, but a quantitative measure will depend on the similarity coefficient and the particular algorithm used for clustering. Pavlov et al. (2001) and Puzicha and Buhman (1998) use similar ideas to segment images based on texture variation and fractal concepts; specifically, Pavlov et al. (2001) suggest using wavelet decompositions. The number and distribution of samples are also linked to the part-whole problem (cf. Szabo, 1996) and to the relationship between habitat heterogeneity and spontaneous pattern production (Sayama et al., 2003).
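The idea that a fuzzy approach lets one sample partake of several component structures can be illustrated with a minimal sketch. It assumes two one-dimensional Gaussian components with hypothetical weights, means and standard deviations; the names `normal_pdf` and `memberships` are illustrative and are not taken from the paper or from any clustering package.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def memberships(x, components):
    """Fractional membership of sample x in each component.

    components is a list of (weight, mean, sd) tuples (hypothetical values).
    The returned shares are non-negative and sum to 1.
    """
    scores = [w * normal_pdf(x, mu, sd) for w, mu, sd in components]
    total = sum(scores)
    return [s / total for s in scores]

# Two assumed components of equal weight, centred at 0 and 4.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
halfway = memberships(2.0, components)   # sample midway between the two centres
near_first = memberships(0.0, components)  # sample at the first centre
```

A sample midway between the two centres is shared equally, whereas a sample at one centre belongs almost entirely to that component; a hard partition would force both into a single class.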
Here we shall consider a different problem that concerns the estimation of common structure between levels. We ask: does common structure exist and, if so, how strong is it? And how does it decay as differences in scale increase? Some argue that scale invariance, or the presence of similar structure across different spatial or temporal scales, should be expected for complex systems (e.g., Brown et al., 2002). However, Wolpert and Macready (2000) put forward another view: over different space and time scales, the patterns exhibited by such a complex system should vary greatly, and in ways that are unexpected given the patterns on the other scales. The degree of dissimilarity plotted against scale would therefore provide a profile, which can be used as a system descriptor and compared with other system profiles irrespective of the subject matter. Binder and Plazas (2001) recommend a similar procedure. This requires a suitable measure of dissimilarity or similarity between data at two or more scales. Several authors have suggested ordination methods for multiscale analyses (Noy-Meir and Anderson, 1971; Borcard and Legendre, 2002); however, these do not in general provide a similarity measure between scales. Another approach makes use of fractal and multifractal analysis (e.g., based on Rényi's generalized entropy functions; Borda-de-Agua et al., 2002); however, again the degree of self-similarity cannot easily be determined. In this paper, we present a clustering approach to determine similarity between scales. The problems caused by the MAUP can be overcome by using fuzzy clustering. This allows us to identify common structure at different scales using the minimum message length principle. The method was applied to ecological data to test its efficacy at detecting changes in community structure in terms of the composition and relative abundances of species in the community.
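As a rough illustration of a dissimilarity-against-scale profile, the sketch below coarse-grains a transect into non-overlapping blocks and records how far the original samples lie from their block means. The mean absolute deviation used here is only a stand-in chosen for simplicity, not the minimum message length measure developed in this paper, and the transect values and function names are hypothetical.

```python
def block_means(values, size):
    """Average consecutive, non-overlapping blocks of `size` samples."""
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    return [sum(b) / len(b) for b in blocks]

def dissimilarity_to_finest(values, size):
    """Mean absolute deviation of each sample from the mean of its block."""
    means = block_means(values, size)
    return sum(abs(v - means[i // size]) for i, v in enumerate(values)) / len(values)

def scale_profile(values, sizes):
    """Dissimilarity between the finest scale and each coarser block size."""
    return [dissimilarity_to_finest(values, s) for s in sizes]

# Hypothetical cover values along a transect with a patch size of two samples.
transect = [2, 2, 8, 8, 2, 2, 8, 8]
profile = scale_profile(transect, [1, 2, 4, 8])
```

For this artificial transect the profile stays at zero up to a block size of two, the patch size, and jumps once blocks straddle patches, which is the kind of signature a profile is meant to expose.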
Section snippets
A minimum message length similarity measure
Dale (2002; see also Dale and Anand, 2004) proposed using the minimum message length (MML) principle to estimate the Kolmogorov complexity as a sum of two components: model (structure) description and model fit. Kolmogorov complexity is a measure of the difficulty of description of a pattern or algorithm (Li and Vitányi, 1997); however, the measure has rarely been used in ecological informatics (but see Anand and Orlóci, 1996, 2000). In the present work, it is
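The two-part decomposition into model description and model fit can be sketched numerically. The example below is a crude approximation under stated assumptions: each class is a one-dimensional Gaussian, every parameter is charged half of log2(n) bits, samples are assigned to classes outright, and data are recorded to a precision `eps`. It is not the MML estimator used in the paper, and the data values and function names are hypothetical.

```python
import math

def gaussian_fit_bits(xs, eps=0.01):
    """Bits needed to state xs to precision eps under a Gaussian fitted to xs."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, eps ** 2)  # floor avoids log(0)
    nll_nats = sum(0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
                   for x in xs)
    # Convert to bits and add the cost of the measurement precision.
    return nll_nats / math.log(2) - n * math.log2(eps)

def two_part_length(classes, eps=0.01):
    """Return (model_bits, fit_bits) for a hard partition of the data into classes."""
    n = sum(len(c) for c in classes)
    k = len(classes)
    # Model part: mean and variance per class plus k - 1 class proportions,
    # each charged 0.5 * log2(n) bits (an assumed, simplified costing).
    model_bits = 0.5 * (2 * k + (k - 1)) * math.log2(n)
    # Fit part: class label for every sample plus the sample given its class.
    label_bits = -sum(len(c) * math.log2(len(c) / n) for c in classes)
    fit_bits = label_bits + sum(gaussian_fit_bits(c, eps) for c in classes)
    return model_bits, fit_bits

low = [1.0, 1.1, 0.9, 1.05, 0.95]
high = [5.0, 5.1, 4.9, 5.05, 4.95]
one_class = two_part_length([low + high])
two_class = two_part_length([low, high])
```

With these well-separated values the two-class model pays more for its description but much less for the fit, so its total length is shorter; comparing totals in this way is how a clustering is judged against the single-class alternative.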
Data and methods
The data were modified as follows in order to examine the changes in community structure, in terms of the composition and relative abundance of species, at different scales. The primary data consist of records of the cover abundance of 119 species of understorey plants. These were collected from line transects at six sites located along a historic pollution gradient (Anand et al., 2003; Tucker and Anand, 2003; Desrochers and Anand, 2005). This gradient reflects decreasing historic sulphur
Results
The results from the independent analysis of the several scales are shown in Table 1a. The number of classes and the associated n-class MML show a close relationship with the size of the population employed and at all scales the clustering provides a markedly better n-class result compared with that for a single class; however, the MML per thing values are not as closely related. Turning to the pairwise analyses (Table 1b), we obtain similar results except that the MML per individual values are
Discussion
We introduce a new measure for cross-scale analysis of ecological data and the structures it defines. On the basis of a single analysis, it is not possible to decide if the results are an inherent feature of ecological systems. There is certainly some common structure as well as idiosyncratic variation, and the methods used here can separate these components: cross-scale similarity measures based on Kolmogorov complexity provide needed information. The exception is the Bush analysis where we
Acknowledgments
MA acknowledges funding from the Natural Sciences and Engineering Research Council of Canada, the Canada Research Chairs Program and the Ontario Ministry of Science and Technology for infrastructure and salary support for RD and MD. We thank B.C. Tucker and K. Lemire for assistance with field data collection and Steve Kaufman for technical assistance and comments on a previous version of the manuscript. An anonymous reviewer provided helpful comments.
References (47)
- et al.
Complexity in plant communities: the notion and quantification
Journal of Theoretical Biology
(1996)
- et al.
On hierarchical partitioning of an ecological complexity function
Ecological Modelling
(2000)
- et al.
Characterizing biocomplexity and soil microbial dynamics along a smelter-damaged landscape gradient
The Science of the Total Environment
(2003)
- et al.
All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices
Ecological Modelling
(2002)
- et al.
Markov models for incorporating temporal dependence
Acta Oecologica
(2002)
Computing an organism: on the interface between informatic and dynamic processes
BioSystems
(2002)
- et al.
Scaling features of texts, images and time series
Physica A
(2001)
- et al.
Spatial complexity of ecological communities: Bridging the gap between probabilistic and non-probabilistic uncertainty measures
Ecological Modelling
(2006)
- et al.
MML clustering of continuous-valued data using Gaussian and t distributions
- et al.
Clustering of Gaussian and t distributions using minimum message length
Unsupervised learning of correlated multivariate Gaussian mixture models using MML
Unsupervised learning of gamma mixture models using minimum message length
Sum rule for multiscale representations of kinematically described systems
Advances in Complex Systems
Multiscale analysis of complex systems
Physical Review E
Species-area curves, diversity indices, and species abundance distributions: a multifractal analysis
American Naturalist
The fractal nature of nature: power laws, ecological complexity and biodiversity
Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
A Bayesian perspective on the modifiable areal unit problem using data augmentation
Models, measures and messages: an essay on the role for induction
Community Ecology
Domain knowledge, evidence, complexity and convergence
International Journal of Ecology and Environmental Sciences
Minimum message length clustering: an explication and some applications to vegetation data
Community Ecology
Quantifying the components of biocomplexity along ecological perturbation gradients
Biodiversity and Conservation
MML Markov classification of sequential data
Statistics and Computing
The modifiable areal unit problem in statistical analysis
Environment and Planning A