Abstract
Qualitative spatial variables are important in many fields of research. However, unlike the decades' worth of research devoted to the spatial association of quantitative variables, the exploratory analysis of qualitative spatial variables is relatively underdeveloped. The objective of the present paper is to propose a new test (Q) for spatial independence: a simple, consistent, and powerful statistic for qualitative spatial independence that we develop using concepts from symbolic dynamics and symbolic entropy. Given a spatial distribution of events, the Q test can be used to detect patterns of spatial association of qualitative variables in a wide variety of settings. To enable hypothesis testing, we derive the asymptotic chi-squared distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial qualitative process. We include numerical experiments to demonstrate the finite-sample behaviour of the test, and illustrate its application with an empirical example that explores the spatial association of fast food establishments in the Greater Toronto Area, Canada.
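The test operates on counts of symbols built from the spatial configuration. This section does not define how a location is turned into a symbol, so the sketch below rests on an assumption: each location's symbol is the ordered tuple of its own class followed by the classes of its m − 1 nearest neighbours, ranked by Euclidean distance with ties broken by index. The helper name `symbolize` is hypothetical. In this sketch every location contributes one symbol, so R equals the number of locations; the paper's own choice of surroundings may differ.

```python
import math
from collections import Counter


def symbolize(coords, classes, m):
    """Count m-symbols: each location's own class followed by the classes of
    its m - 1 nearest neighbours, ordered by Euclidean distance.

    Hypothetical helper; the nearest-neighbour surrounding is an assumption.
    """
    counts = Counter()
    for s, origin in enumerate(coords):
        # Rank all other locations by distance to s; sorting (distance, index)
        # pairs breaks ties deterministically by index.
        others = sorted(
            (math.dist(origin, coords[j]), j) for j in range(len(coords)) if j != s
        )
        neighbours = [j for _, j in others[: m - 1]]
        counts[tuple(classes[i] for i in [s] + neighbours)] += 1
    return counts


# Four points on a line with two classes, symbols of length m = 2.
print(symbolize([(0, 0), (1, 0), (2, 0), (3, 0)], ["a", "a", "b", "b"], 2))
```

The resulting counts n_σ are the inputs from which the symbolic entropy h(m) and the statistic Q(m) are computed.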
References
Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115
Austin SB, Melly SJ, Sanchez BN, Patel A, Buka S, Gortmaker SL (2005) Clustering of fast-food restaurants around schools: a novel application of spatial statistics to the study of food environments. Am J Public Health 95(9):1575–1581
Bailey TC, Gatrell AC (1995) Interactive spatial data analysis. Addison Wesley Longman, Essex
Bell N, Schuurman N, Hameed SM (2008) Are injuries spatially related? Join-count spatial autocorrelation for small-area injury analysis. Inj Prev 14(6):346–353
Bhat CR, Sener IN (2009) A copula-based closed-form binary logit choice model for accommodating spatial correlation across observational units. J Geograph Syst 11(3):243–272
Boots B (2003) Developing local measures of spatial association for categorical variables. J Geograph Syst 5(2):139–160
Chakir R, Parent O (2009) Determinants of land use changes: a spatial multinomial probit approach. Pap Reg Sci 88(2):327–344
Chuang KS, Huang HK (1992) Assessment of noise in a digital image using the join-count statistic and the Moran test. Phys Med Biol 37(2):357–369
Cliff AD, Ord JK (1973) Spatial autocorrelation. Pion, London
Cliff AD, Ord JK (1981) Spatial processes: models and applications. Pion, London
Cressie NAC (1993) Statistics for spatial data. Wiley, New York
Dacey MF (1968) A review on measures of contiguity for two and k-color maps. In: Berry BJL, Marble DF (eds) Spatial analysis: a reader in statistical geography. Prentice Hall, Englewood Cliffs, pp 479–495
Dejong PD, Debree J (1995) Analysis of the spatial-distribution of rust-infected leek plants with the black-white join-count statistic. Eur J Plant Pathol 101(2):133–137
Dubin R (1995) Estimating logit models with spatial dependence. In: Anselin L, Florax RJGM (eds) New directions in spatial econometrics. Springer, Berlin, pp 229–242
Epperson BK, Alvarez-Buylla ER (1997) Limited seed dispersal and genetic structure in life stages of Cecropia obtusifolia. Evolution 51(1):275–282
Farber S, Páez A, Volz E (2009) Topology and dependency tests in spatial and network autoregressive models. Geogr Anal 41(2):158–180
Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5(3):115–145
Getis A (2008) A history of the concept of spatial autocorrelation: a geographer’s perspective. Geogr Anal 40(3):297–309
Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24(3):189–206
Ghent AW, Warner RE, Mankin PC (1992) Accurate counts for Moran joins tests in ecological studies. Am Midl Nat 128(2):366–376
Goldsborough LG (1994) Heterogeneous spatial distribution of periphytic diatoms on vertical artificial substrata. J N Am Benthol Soc 13(2):223–236
Griffith DA (1988) Advanced spatial statistics: special topics in the exploration of quantitative spatial data series. Kluwer, Dordrecht
Griffith DA (1999) Statistical and mathematical sources of regional science theory: map pattern analysis as an example. Pap Reg Sci 78(1):21–45
Haining RP (1978) Spatial model for high-plains agriculture. Ann Assoc Am Geogr 68(4):493–504
Haining R (1990) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge
Hao B, Zheng W (1998) Applied symbolic dynamics and chaos. World Scientific, Singapore
Hewes L, Schmieding AC (1956) Risk in the Central Great Plains: geographical patterns of wheat failure in Nebraska, 1931–1952. Geogr Rev 46(3):375–387
Krishna Iyer PVA (1949) The first and second moments of some probability distributions arising from points on a lattice, and their applications. Biometrika 36(1/2):135–141
Lehman EL (1986) Testing statistical hypotheses. Wiley, New York
Mannelli A, Sotgia S, Patta C, Oggiano A, Carboni A, Cossu P, Laddomada A (1998) Temporal and spatial patterns of African swine fever in Sardinia. Prev Vet Med 35(4):297–306
McMillen DP (1992) Probit with spatial autocorrelation. J Reg Sci 32(3):335–348
Miller HJ (2004) Tobler’s first law and spatial analysis. Ann Assoc Am Geogr 94(2):284–289
Moran PAP (1948) The interpretation of statistical maps. J R Stat Soc Series B Stat Methodol 10(2):243–251
Moran PAP (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23
Páez A (2006) Exploring contextual variations in land use and transport analysis using a probit model with geographical weights. J Transp Geogr 14(3):167–176
Páez A, Scott DM, Volz E (2008) Weight matrices for social influence analysis: an investigation of measurement errors and their effect on model identification and estimation quality. Soc Netw 30(4):309–317
Real LA, McElhany P (1996) Spatial pattern and process in plant-pathogen interactions. Ecology 77(4):1011–1025
Ripley BD (1981) Spatial statistics. Wiley, Hoboken
Robertson RD, Nelson GC, De Pinto A (2009) Investigating the predictive capabilities of discrete choice models in the presence of spatial effects. Pap Reg Sci 88(2):367–388
Rohatgi VK (1976) An introduction to probability theory and mathematical statistics. Wiley, New York
Soon SYT (1996) Binomial approximation for dependent indicators. Statistica Sinica 6(3):703–714
Stratton DA, Bennington CC (1996) Measuring spatial variation in natural selection using randomly-sown seeds of Arabidopsis thaliana. J Evol Biol 9(2):215–228
Taam W, Hamada M (1993) Detecting spatial effects from factorial-experiments—an application from integrated-circuit manufacturing. Technometrics 35(2):149–160
Upton G, Fingleton B (1985) Spatial data analysis by example. Wiley, Chichester
Wang XK, Kockelman KM (2009) Application of the dynamic spatial ordered probit model: patterns of land development change in Austin, Texas. Pap Reg Sci 88(2):345–365
Acknowledgments
The authors gratefully acknowledge financial support from grant ECO-2009-10534-ECON of the Ministerio Español de Ciencia e Innovación and from the Fundación Séneca de la Región de Murcia. In preparing this paper we benefited from the comments of anonymous reviewers and from feedback received from participants in the 2009 Meetings of the AAG and the 2009 Spatial Econometric World Congress. In particular, we are grateful for useful discussions with Prof. Daniel A. Griffith and Ms. Melissa J. Rura. The authors alone are responsible for the contents of the paper.
Appendix
1.1 Proofs
Proof of Theorem 1
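Under the null \( H_{0} \), the joint probability mass function of the \( k^{m} \) variables \( \left( {Y_{{\sigma_{1} }} ,Y_{{\sigma_{2} }} , \ldots ,Y_{{\sigma_{{k^{m} }} }} } \right) \) is:
\[ P\left( Y_{\sigma_{1}} = a_{1}, Y_{\sigma_{2}} = a_{2}, \ldots, Y_{\sigma_{k^{m}}} = a_{k^{m}} \right) = \frac{R!}{a_{1}!\,a_{2}! \cdots a_{k^{m}}!} \prod\nolimits_{i = 1}^{k^{m}} p_{\sigma_{i}}^{a_{i}} \tag{19} \]
where \( a_{1} + a_{2} + \cdots + a_{k^{m}} = R \). That is, the joint distribution of the \( k^{m} \) variables \( \left( {Y_{{\sigma_{1} }} ,Y_{{\sigma_{2} }} , \ldots ,Y_{{\sigma_{{k^{m} }} }} } \right) \) is a multinomial distribution.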
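The multinomial log-likelihood and its maximizer \( \hat{p}_{\sigma} = n_{\sigma}/R \) (the estimator used later in the proof of Theorem 2) can be checked numerically. A minimal sketch, assuming counts and probabilities are held in dicts keyed by symbol; the function names are hypothetical.

```python
import math


def multinomial_log_likelihood(counts, probs):
    """ln L = ln(R! / prod n_sigma!) + sum n_sigma ln p_sigma.

    math.lgamma(n + 1) equals ln(n!), which avoids computing huge factorials.
    """
    R = sum(counts.values())
    ll = math.lgamma(R + 1)
    for sigma, n in counts.items():
        ll -= math.lgamma(n + 1)
        if n > 0:  # a zero count contributes p ** 0 = 1
            ll += n * math.log(probs[sigma])
    return ll


def mle(counts):
    """Maximum likelihood estimate p_hat_sigma = n_sigma / R."""
    R = sum(counts.values())
    return {sigma: n / R for sigma, n in counts.items()}


counts = {"aa": 12, "ab": 3, "ba": 3, "bb": 2}
print(mle(counts))
print(multinomial_log_likelihood(counts, mle(counts)))
```

Any other probability vector, for example the uniform one, yields a strictly smaller log-likelihood for these counts.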
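The likelihood function of the distribution given by Eq. (19) is:
\[ L\left( p_{\sigma_{1}}, \ldots, p_{\sigma_{k^{m}}} \right) = \frac{R!}{n_{\sigma_{1}}! \cdots n_{\sigma_{k^{m}}}!} \prod\nolimits_{i = 1}^{k^{m}} p_{\sigma_{i}}^{n_{\sigma_{i}}} \]
where \( n_{\sigma_{i}} \) is the observed number of occurrences of symbol \( \sigma_{i} \),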
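and since \( \sum\nolimits_{i = 1}^{k^{m}} p_{\sigma_{i}} = 1 \), it follows that
\[ L = \frac{R!}{n_{\sigma_{1}}! \cdots n_{\sigma_{k^{m}}}!} \left( \prod\nolimits_{i = 1}^{k^{m} - 1} p_{\sigma_{i}}^{n_{\sigma_{i}}} \right) \left( 1 - \sum\nolimits_{i = 1}^{k^{m} - 1} p_{\sigma_{i}} \right)^{n_{\sigma_{k^{m}}}} \]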
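Then the logarithm of this likelihood function is
\[ \ln L = \ln \left( \frac{R!}{n_{\sigma_{1}}! \cdots n_{\sigma_{k^{m}}}!} \right) + \sum\nolimits_{i = 1}^{k^{m} - 1} n_{\sigma_{i}} \ln \left( p_{\sigma_{i}} \right) + n_{\sigma_{k^{m}}} \ln \left( 1 - \sum\nolimits_{i = 1}^{k^{m} - 1} p_{\sigma_{i}} \right) \]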
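To obtain the maximum likelihood estimators \( \hat{p}_{\sigma_{i}} \) of \( p_{\sigma_{i}} \) for \( i = 1, 2, \ldots, k^{m} \), we solve the equations
\[ \frac{\partial \ln L}{\partial p_{\sigma_{i}}} = \frac{n_{\sigma_{i}}}{p_{\sigma_{i}}} - \frac{n_{\sigma_{k^{m}}}}{1 - \sum\nolimits_{j = 1}^{k^{m} - 1} p_{\sigma_{j}}} = 0, \qquad i = 1, 2, \ldots, k^{m} - 1, \]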
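so that \( n_{\sigma_{i}} / p_{\sigma_{i}} \) takes the same value for every \( i \). Summing \( p_{\sigma_{i}} \) over all symbols shows that this common value is \( R \), and we get
\[ \hat{p}_{\sigma_{i}} = \frac{n_{\sigma_{i}}}{R}, \qquad i = 1, 2, \ldots, k^{m}. \]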
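Then the likelihood ratio statistic is (see, for example, Lehman 1986):
\[ \lambda (Y) = \prod\nolimits_{i = 1}^{k^{m}} \left( \frac{p_{\sigma_{i}}^{(0)}}{\hat{p}_{\sigma_{i}}} \right)^{n_{\sigma_{i}}} = \prod\nolimits_{i = 1}^{k^{m}} \left( \frac{R\,p_{\sigma_{i}}^{(0)}}{n_{\sigma_{i}}} \right)^{n_{\sigma_{i}}} \]
where \( p_{{\sigma_{i} }}^{(0)} \) denotes the probability of the symbol \( \sigma_{i} \) under the null hypothesis.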
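The statistic \( -2\ln (\lambda (Y)) \) can be computed directly from the symbol counts and the null probabilities, using the maximum likelihood estimate \( \hat{p}_{\sigma} = n_{\sigma}/R \). A minimal sketch, assuming dicts keyed by symbol; the function name is hypothetical. A p-value could then be read from the chi-squared distribution with \( k^{m} - 1 \) degrees of freedom, for instance with `scipy.stats.chi2.sf(q, df)`; SciPy is left out here to keep the sketch dependency-free.

```python
import math


def likelihood_ratio_q(counts, null_probs):
    """Return (Q, df) with Q = -2 ln(lambda(Y)) = 2 * sum n ln(p_hat / p0).

    counts: dict symbol -> observed count n_sigma.
    null_probs: dict symbol -> null probability p0_sigma, over all k**m symbols.
    Symbols with zero count contribute nothing (0 * ln 0 is taken as 0).
    """
    R = sum(counts.values())
    q = 0.0
    for sigma, n in counts.items():
        if n > 0:
            p_hat = n / R  # maximum likelihood estimate of p_sigma
            q += 2.0 * n * math.log(p_hat / null_probs[sigma])
    df = len(null_probs) - 1  # k**m - 1 degrees of freedom
    return q, df


q, df = likelihood_ratio_q({"a": 30, "b": 10}, {"a": 0.5, "b": 0.5})
print(q, df)
```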
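On the other hand, under \( H_{0} \), \( Q(m) = -2\ln (\lambda (Y)) \) asymptotically follows a chi-squared distribution with \( k^{m} - 1 \) degrees of freedom (see Lehman 1986). Hence:
\[ Q(m) = -2\ln (\lambda (Y)) = 2\sum\nolimits_{i = 1}^{k^{m}} n_{\sigma_{i}} \ln \left( \frac{n_{\sigma_{i}}}{R\,p_{\sigma_{i}}^{(0)}} \right) \xrightarrow{d} \chi_{k^{m} - 1}^{2} \]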
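Denote by \( \alpha_{ij} \) the number of times that class \( a_{j} \) appears in symbol \( \sigma_{i} \), and let \( q_{j} = P(X = a_{j}) \). Then under the null we have \( p_{{\sigma_{i} }}^{(0)} = \prod\nolimits_{j = 1}^{k} {q_{j}^{{\alpha_{ij} }} } \), and hence it follows that
\[ Q(m) = 2\sum\nolimits_{i = 1}^{k^{m}} n_{\sigma_{i}} \ln \left( \frac{n_{\sigma_{i}}}{R} \right) - 2\sum\nolimits_{i = 1}^{k^{m}} n_{\sigma_{i}} \sum\nolimits_{j = 1}^{k} \alpha_{ij} \ln (q_{j}) \]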
Now, taking into account that \( h(m) = - \sum\nolimits_{i = 1}^{k^{m}} p_{\sigma_{i}} \ln (p_{\sigma_{i}}) = - \sum\nolimits_{i = 1}^{k^{m}} \frac{n_{\sigma_{i}}}{R} \ln \left( \frac{n_{\sigma_{i}}}{R} \right) \), we have that
\[ Q(m) = -2R\,h(m) - 2\sum\nolimits_{i = 1}^{k^{m}} n_{\sigma_{i}} \sum\nolimits_{j = 1}^{k} \alpha_{ij} \ln (q_{j}) \]
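Notice that if the spatial process is independent and identically distributed with equiprobable classes, then \( q_{j} = \frac{1}{k} \). Since every symbol contains \( m \) classes, \( \sum\nolimits_{j = 1}^{k} \alpha_{ij} = m \), so \( -2\sum\nolimits_{i = 1}^{k^{m}} n_{\sigma_{i}} \sum\nolimits_{j = 1}^{k} \alpha_{ij} \ln (1/k) = 2R\,m\ln (k) = 2R\ln (k^{m}) \), and therefore \( Q(m) = 2R\left( \ln (k^{m}) - h(m) \right) \), which completes the proof of the theorem.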
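The chain of identities in the proof of Theorem 1 can be checked numerically: the null probabilities \( p_{\sigma_{i}}^{(0)} = \prod_{j} q_{j}^{\alpha_{ij}} \), the entropy form of \( Q(m) \), and its reduction to \( 2R(\ln (k^{m}) - h(m)) \) for equiprobable classes. A minimal sketch, assuming symbols are tuples of class labels and counts are dicts; all function names are hypothetical.

```python
import itertools
import math


def null_symbol_probs(q, m):
    """p0_sigma = prod_j q_j ** alpha_ij over all k**m ordered m-symbols,
    where alpha_ij counts how often class j occurs in symbol sigma_i.

    q: dict mapping each class label to its marginal probability q_j.
    """
    probs = {}
    for sigma in itertools.product(sorted(q), repeat=m):
        probs[sigma] = math.prod(q[c] ** sigma.count(c) for c in q)
    return probs


def symbolic_entropy(counts):
    """Plug-in symbolic entropy h(m) = -sum (n/R) ln(n/R); empty cells skipped."""
    R = sum(counts.values())
    return -sum((n / R) * math.log(n / R) for n in counts.values() if n > 0)


def q_statistic(counts, k, m):
    """Q(m) = 2R(ln(k**m) - h(m)) for equiprobable classes; df = k**m - 1."""
    R = sum(counts.values())
    return 2.0 * R * (m * math.log(k) - symbolic_entropy(counts)), k ** m - 1


counts = {("a", "a"): 12, ("a", "b"): 3, ("b", "a"): 3, ("b", "b"): 2}
print(q_statistic(counts, k=2, m=2))
print(null_symbol_probs({"a": 0.3, "b": 0.7}, 2))
```

For equiprobable classes this returns the same value as the likelihood-ratio form \( 2\sum_{i} n_{\sigma_{i}} \ln ( n_{\sigma_{i}} / (R\,p_{\sigma_{i}}^{(0)}) ) \) with \( p_{\sigma_{i}}^{(0)} = k^{-m} \), up to floating-point rounding.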
Proof of Theorem 2
First, notice that the estimator of h(m), \( \widehat{h}\left( m \right) = - \sum\nolimits_{{\sigma \in S_{m} }} {\hat{p}_{\sigma } { \ln }\left( {\hat{p}_{\sigma } } \right)} \), where \( \hat{p}_{\sigma } = n_{\sigma } /R \), is consistent: \( p\mathop { \lim }\nolimits_{R \to \infty } \hat{p}_{\sigma } = p_{\sigma } \) and \( x \mapsto -x\ln (x) \) is continuous, and hence:
\[ p\mathop{\lim}\nolimits_{R \to \infty} \widehat{h}(m) = h(m) \]
Recall that:
Now, let us call:
Also, since \( \ln (x) \le x - 1 \) for all \( x > 0 \), with equality if and only if \( x = 1 \), and under the alternative hypothesis of spatial dependence of order at most m we have that:
it follows that:
Since also \( p\mathop { \lim }\nolimits_{R \to \infty } \hat{q}_{j} = q_{j} \), by Eq. (29) we have
Let 0 < C < ∞ with \( C \in \mathbb{R} \) and take R large enough such that
Then, under spatial dependence of order less than or equal to m, it follows that H(m) ≠ 0 and, thus,
Therefore, by Eqs. (34), (35) and (36) we have that:
as desired.
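The two ingredients of the proof of Theorem 2, the consistency of \( \widehat{h}(m) \) and the growth of \( Q(m) \) when the symbol probabilities differ from their null values, can be illustrated numerically. This is a sketch under stated assumptions, with hypothetical function names: symbols are drawn independently from a hypothetical distribution p, which sidesteps any dependence between overlapping surroundings, and \( Q(m) \) is evaluated at its expected counts \( n_{\sigma} = R\,p_{\sigma} \). There the likelihood-ratio form of Theorem 1 equals \( 2R\sum_{\sigma} p_{\sigma} \ln (p_{\sigma}/p_{\sigma}^{(0)}) \). That sum is a Kullback-Leibler divergence, which the inequality \( \ln (x) \le x - 1 \) shows to be positive whenever \( p \ne p^{(0)} \). So for any fixed C, Q(m) exceeds C once R is large enough.

```python
import math
import random
from collections import Counter


def plug_in_entropy(samples):
    """h_hat(m) = -sum p_hat ln p_hat with p_hat = n_sigma / R."""
    R = len(samples)
    return -sum((n / R) * math.log(n / R) for n in Counter(samples).values())


def simulate_plug_in_entropy(p, R, seed=0):
    """Draw R independent symbols from the distribution p and return h_hat(m)."""
    rng = random.Random(seed)
    symbols = list(p)
    draws = rng.choices(symbols, weights=[p[s] for s in symbols], k=R)
    return plug_in_entropy(draws)


def divergence(p, p0):
    """sum_sigma p ln(p / p0): zero when p == p0, positive otherwise."""
    return sum(ps * math.log(ps / p0[s]) for s, ps in p.items() if ps > 0)


def smallest_r_exceeding(p, p0, C):
    """Smallest R with 2R * divergence(p, p0) > C (requires p != p0)."""
    d = divergence(p, p0)
    if d <= 0:
        raise ValueError("p equals the null distribution, so Q(m) does not grow")
    return math.floor(C / (2.0 * d)) + 1


p = {"aa": 0.4, "ab": 0.1, "ba": 0.1, "bb": 0.4}
p0 = {"aa": 0.25, "ab": 0.25, "ba": 0.25, "bb": 0.25}
print(simulate_plug_in_entropy(p, 200_000))
print(smallest_r_exceeding(p, p0, 50.0))
```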
Cite this article
Ruiz, M., López, F. & Páez, A. Testing for spatial association of qualitative data using symbolic dynamics. J Geogr Syst 12, 281–309 (2010). https://doi.org/10.1007/s10109-009-0100-1