Scales, levels and processes: Studying spatial patterns of British census variables

https://doi.org/10.1016/j.compenvurbsys.2005.08.005

Abstract

This paper is based on the assumption that there may be scale effects at all levels of areal data and that they vary both within areal units and between areal units. Spatial distributions are based on processes taking place in geographical space. A mapped pattern may reflect several distinct processes, each of which may affect a different area and operate at a different scale. The challenge for the spatial analyst is to identify these processes and evaluate their importance from the spatial pattern observed. Here the well-known modifiable areal unit problem is treated not as a problem but as a resource: data at different scales can help us identify processes operating at different scales. We build on models and methods described by [Tranmer, M., & Steel, D. G. (2001). Using local census data to investigate scale effects. In N. J. Tate, & P. M. Atkinson (Eds.), Modelling scale in geographical information science (pp. 105–122). Chichester: John Wiley and Sons], which facilitate the identification of processes occurring within areal units. The method is extended using concepts from multi-level modelling and spatial autocorrelation, by applying local statistics to what may be termed area effect estimates. It is illustrated with respect to two very different census variables and three different study areas.

Introduction

The modifiable areal unit problem (MAUP) is a phenomenon whereby different results are obtained in analysis of the same data grouped into different sets of areal units. It vexes the geographical and spatial analyst almost as much today as it did when first identified by Gehlke and Biehl (1934) or when subsequently popularised by Openshaw and Taylor (1979, 1981). The MAUP has been subdivided into two separate but linked issues. One is the zonation issue, which concerns the effects of the arbitrary placement of boundaries upon the data. The other is the scale issue, which arises where the statistical results of an analysis change as the level of analysis changes. These effects occur because the spatial processes generating the observed data may exist at scales, and for particular areal units, that are reflected more or less accurately by the boundaries in use. Among other authors, Fotheringham and Wong (1991) have demonstrated these effects for US census data, and Tranmer and Steel (2001) have done so for UK data. See Openshaw (1984) for further discussion of these concepts.
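The scale and zonation issues can be illustrated with a minimal sketch. The eight values below are hypothetical and are not census data, and the helper names (pearson, aggregate) are introduced here purely for illustration. The sketch correlates two variables for the base units, then again after averaging them into larger zones and into an alternative zoning with the same number of zones.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def aggregate(values, zones):
    """Mean of `values` within each zone; a zone is a list of unit indices."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical values for eight base units along a line (illustration only).
x = [1, 3, 2, 4, 5, 7, 6, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Three ways of grouping the same eight units into contiguous zones.
pairs = [[0, 1], [2, 3], [4, 5], [6, 7]]      # four zones of two units
quads = [[0, 1, 2, 3], [4, 5, 6, 7]]          # two zones of four units
shifted = [[0], [1, 2, 3], [4, 5, 6], [7]]    # four zones, boundaries moved

r_units = pearson(x, y)
r_pairs = pearson(aggregate(x, pairs), aggregate(y, pairs))
r_quads = pearson(aggregate(x, quads), aggregate(y, quads))
r_shifted = pearson(aggregate(x, shifted), aggregate(y, shifted))

print(f"base units:       r = {r_units:.3f}")
print(f"zones of two:     r = {r_pairs:.3f}")
print(f"zones of four:    r = {r_quads:.3f}")
print(f"shifted zones:    r = {r_shifted:.3f}")
```

In this toy case the correlation rises as the zones grow (the scale issue) and differs between the two four-zone groupings (the zonation issue). Real census data need not behave so regularly.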

Two analytical techniques are applied in this paper to investigate the processes generating spatial patterns. The first technique is the multi-level model, or MLM (Jones, 1991). The MLM is based on the recognition that a response variable can be affected by processes occurring at both the individual level and the group level. Thus, the MLM can be used to assess the existence, and estimate the magnitude, of processes that operate at the individual person level and at one or more group levels. In the classic applications of MLM in education, the groups may correspond to classes or schools; in the current context, the groups may refer to geographical areas over which spatial processes operate.

The second of these techniques is spatial autocorrelation. It has been identified as highly relevant to the analysis of spatial data, such as data that are available for areal units (see for instance Cliff & Ord, 1973). Spatial autocorrelation has been discussed as a factor in the debate concerning the modifiable areal unit problem (see Openshaw & Taylor, 1979). At its simplest, spatial autocorrelation can be thought of as the correlation of a variable at one place with the same variable at neighbouring places. It exemplifies Tobler’s first law of geography that “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970, p. 236). Goodchild (1986) gives a more detailed treatment.
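One widely used coefficient of this kind is Moran's I. The text above does not say which coefficient is used, so the choice here is an assumption made for illustration. The sketch computes it for a hypothetical chain of six areas in which each area neighbours the one on either side, with binary weights. The function name morans_i and the data are illustrative only.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    `neighbours` maps each area index to the list of indices adjacent to it.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(len(js) for js in neighbours.values())     # sum of all weights
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    denom = sum(d * d for d in z)
    return (n / s0) * (cross / denom)

def chain_neighbours(n):
    """Adjacency for n areas in a row: each area touches the one on either side."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

adjacency = chain_neighbours(6)

clustered = [1, 1, 1, 5, 5, 5]      # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]    # neighbours are always dissimilar

i_clustered = morans_i(clustered, adjacency)
i_alternating = morans_i(alternating, adjacency)

print(f"clustered:   I = {i_clustered:.2f}")
print(f"alternating: I = {i_alternating:.2f}")
```

The clustered arrangement gives a positive coefficient and the alternating arrangement a negative one, which matches the intuition that near things being alike produces positive spatial autocorrelation.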

Spatial autocorrelation can inform analysts about the patterning of areal data. It is logical that spatial autocorrelation and multi-level modelling should be analysed together. Jones (1991, p. 8) states, “the degree of auto-correlation in MLM can loosely be conceived as the ratio of ‘variation at the higher level’ to the ‘total variation at all levels’. A value of zero for a spatial autocorrelation coefficient signifies no auto-correlation, indicating that there is no variation at the higher level”. The work presented here builds on this basis, aiming to find evidence for the spatial processes generating the data under analysis, using a combination of adapted multi-level modelling and spatial autocorrelation techniques. The paper also provides conclusions about the patterns displayed by certain British census variables.
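The ratio that Jones describes can be sketched with a simple one-way analysis-of-variance estimator for equally sized groups. This is a stand-in chosen for brevity, not a fitted multi-level model and not the estimator used in the paper. The function name variance_ratio and the data are hypothetical.

```python
def variance_ratio(groups):
    """Share of total variation attributable to the group (higher) level.

    Uses the one-way ANOVA estimator for balanced groups:
    between-group component = (MSB - MSW) / n, within-group component = MSW.
    Assumes every group has the same size n > 1 and that some variation exists.
    """
    k = len(groups)                 # number of groups (areas)
    n = len(groups[0])              # individuals per group
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))
    between = max(0.0, (msb - msw) / n)   # negative estimates are truncated at zero
    return between / (between + msw)

# Hypothetical data: three areas with three individuals each.
distinct_areas = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # areas differ strongly
similar_areas = [[1, 5, 9], [2, 5, 8], [3, 5, 7]]    # all variation is within areas

ratio_distinct = variance_ratio(distinct_areas)
ratio_similar = variance_ratio(similar_areas)

print(f"distinct areas: ratio = {ratio_distinct:.3f}")
print(f"similar areas:  ratio = {ratio_similar:.3f}")
```

A ratio near one indicates that most variation lies between areas, while a ratio of zero indicates no variation at the higher level, corresponding to the zero-autocorrelation case in the quotation.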

Section snippets

Background, data and theory

Prior to presenting our methods it is necessary to consider the nature of areal units for which spatial data may be provided. There may be processes and effects within areal data that interact in a complex fashion to create the observed data. If data are available at different scales, this may reflect the processes generating the data. However, there may be other processes affecting observed data that occur at scales for which we do not have information. Despite this, they deserve

Methodology

The models and methods described by Tranmer and Steel (2001) only allow for a global measure of homogeneity to be calculated, but do not allow the differing levels of homogeneity within a SAR district to be calculated. Therefore we extend the approach to examine evidence of such changes in homogeneity by attempting to identify processes generating these different levels of homogeneity. Having presented some background to the approach, this section details the method that was used to further

Analysis

The Glasgow SAR district was chosen to test the methodology outlined above, as it was known to be an area in which strong scale effects could be seen. It will be contrasted with the Reigate and Ribble SAR districts, which were identified as less susceptible to MAUP (scale) effects (Manley & Flowerdew, 2003). Reigate was chosen in part because Tranmer and Steel (2001) used it as an example, and Ribble because it was known to include areas of different settlement pattern. The variables used are

Conclusions

It has been shown that although an aggregation level (EDs or wards in our case) is presented as a homogeneous set of areal units, the reality is that an aggregation level may be affected by processes operating at vastly different scales. Two variables have been used, demonstrating that different variables act in different manners. Thus, the processes that operate for certain units are specific to a certain variable. It is clear that it is not possible to define an ideal single census geography

Acknowledgements

The census data used in this study, including the Household Sample of Anonymised Records, are Crown Copyright. They were bought for academic use by the ESRC/JISC/DENI and are held at the Manchester Computing Centre. Digital boundary data for Great Britain were also purchased by ESRC for the academic community. Access was obtained via the UKBORDERS service at the University of Edinburgh. An initial version of this paper was presented at the GISRUK 2003 conference at City University. The authors

References (23)

  • L. Anselin

    Local indicators of spatial association—LISA

    Geographical Analysis

    (1995)
  • A. Cliff et al.

    Spatial autocorrelation

    (1973)
  • C. Denham

    Census geography

  • R. Flowerdew et al.

    Behaviour of regression models under random aggregation

  • A.S. Fotheringham et al.

    The modifiable areal unit problem in multivariate statistical analysis

    Environment and Planning A

    (1991)
  • C.E. Gehlke et al.

    Certain effects of grouping upon the size of the correlation in census tract material

    Journal of the American Statistical Association

    (1934)
  • A. Getis et al.

    Local spatial statistics: an overview

  • M. Green et al.

    New evidence on the modifiable areal unit problem

  • H. Goldstein

    Multilevel statistical models

    (2003)
  • M.F. Goodchild

    Spatial autocorrelation

    (1986)
  • R. Haining

    Spatial data analysis: Theory and practice

    (2003)

Cited by (78)

    • A graded cluster system to mine virtual stations in free-floating bike-sharing system on multi-scale geographic view

      2021, Journal of Cleaner Production
      Citation excerpt:

      That means DBSCAN algorithm is better than K-Means algorithm. The modifiable areal unit problem (MAUP) (Gehlke and Biehl, 1934; Openshaw and Taylor, 1979, 1981) is a phenomenon whereby different results are obtained in analysis of the same data grouped into different sets of areal units (Manley et al., 2006). It haunts the geographical and spatial analyst in two aspects.

    • Uncertainty in Causal Neighborhood Effects: A Multi-Agent Simulation Approach

      2023, Leibniz International Proceedings in Informatics, LIPIcs