Abstract
Massive spatial data have been a hot topic in recent years, and many studies have been conducted with spatial data containing massive numbers of observations. Because the classical spatial autocorrelation statistics were initially developed for rather small sample sizes, this paper extends efficiency and statistical power comparisons between the Moran coefficient and the Geary ratio to the context of massive spatial datasets, for different variable distribution assumptions and selected geographic neighborhood definitions. The question addressed is whether earlier results for small n extend to large and massively large n, especially for non-normal variables; the implications established are relevant to big spatial data. To achieve these comparisons, this paper summarizes proofs of limiting variances, also called asymptotic variances, for the efficiency analysis, and derives the relationship function between the two statistics so that their statistical power can be compared on the same scale. The statistical power analysis is visualized with an alternative technique that already appears in the literature, furnishing additional understanding and clarity about these spatial autocorrelation statistics. Results include the following: the Moran coefficient is more efficient than the Geary ratio for most surface partitionings, because this index has a relatively smaller asymptotic as well as exact variance; and the superior power of the Moran coefficient vis-à-vis the Geary ratio for positive spatial autocorrelation depends upon the type of geographic configuration, with this power approaching one as sample sizes become increasingly large. Because spatial analysts usually calculate these two statistics for interval/ratio data, this paper also includes comments about the join count statistics used for nominal data.










Notes
Refer to Univariate distribution relationships: http://www.math.wm.edu/~leemis/chart/UDR/UDR.html.
This property also holds for the SR, CN-C, and CN-TR cases.
The diagonal entries are zeros; i.e., \(c_{ii} = 0, i = 1,2, \ldots,n\).
Given a random variable \(x\) whose probabilities are evaluated under the alternative distribution, for a two-sided test, \({\text{power}} = 1 - {\text{probability}}\left( {x < {\text{right}}\;{\text{critical}}\;{\text{value}}} \right) + {\text{probability}}\left( {x < {\text{left}}\;{\text{critical}}\;{\text{value}}} \right)\); for a right-sided test, \({\text{power}} = 1 - {\text{probability}}\left( {x < {\text{critical}}\;{\text{value}}} \right)\), whereas for a left-sided test, \({\text{power}} = {\text{probability}}\left( {x < {\text{critical}}\;{\text{value}}} \right)\).
Following the steps in the aforementioned paper, the statistical power of the MC is assessed by replacing all 1.96 values with 1.645 and retaining only the right-hand side of the standardized normal curve. For the GR, by contrast, positive SA corresponds to values in the interval [0, 1), so the one-tailed test uses the left-hand rather than the right-hand side of the standardized normal curve: −1.96 should be replaced with −1.645, and the positive portions removed.
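The one-tailed power calculations described in the notes above can be sketched in Python. This is a minimal illustration, assuming (as in the Appendix 2 example) that the standardized statistic is normal with unit variance under both hypotheses and is merely shifted by a hypothetical standardized effect `shift` under the alternative; the function names are illustrative and do not come from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def right_tailed_power(shift, alpha=0.05):
    """P(reject) for a right-tailed z-test when the statistic is N(shift, 1),
    e.g. an MC test for positive SA; the critical value is 1.645 at alpha = 0.05."""
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(crit - shift)

def left_tailed_power(shift, alpha=0.05):
    """P(reject) for a left-tailed z-test when the statistic is N(shift, 1),
    e.g. a GR test for positive SA, where the alternative pushes the statistic down;
    the critical value is -1.645 at alpha = 0.05."""
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(-crit - shift)

for shift in (0.0, 1.0, 2.0, 4.0):
    print(shift, round(right_tailed_power(shift), 4), round(left_tailed_power(-shift), 4))
```

With no shift both functions return the significance level 0.05; with `shift = 1` the right-tailed power is roughly 0.26, and a left-tailed test with `shift = -1` gives the same value by symmetry. For a fixed degree of SA the standardized shift grows as the variances shrink with n, which is consistent with power approaching one for large samples.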
References
Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x
Anselin L (1996) The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten H, Unwin D (eds) Spatial analytical perspectives on GIS. Taylor and Francis, London, pp 111–125
Anselin L (2018) A local indicator of multivariate spatial association: Extending Geary’s c. Geogr Anal. https://doi.org/10.1111/gean.12164
Bartels CPA, Hordijk L (1977) On the power of the generalized Moran contiguity coefficient in testing for spatial autocorrelation among regression disturbances. Reg Sci Urban Econ 7(1):83–101. https://doi.org/10.1016/0166-0462(77)90019-9
Bavaud F (2013) Testing spatial autocorrelation in weighted networks: The modes permutation test. J Geogr Syst 15(3):233–247. https://doi.org/10.1007/s10109-013-0179-2
Bivand R, Müller WG, Reder M (2009) Power calculations for global and local Moran’s I. Comput Stat Data Anal 53(8):2859–2872. https://doi.org/10.1016/j.csda.2008.07.021
Boots B (2003) Developing local measures of spatial association for categorical data. J Geogr Syst 5(2):139–160. https://doi.org/10.1007/s10109-003-0110-3
Boots B, Tiefelsdorf M (2000) Global and local spatial autocorrelation in bounded regular tessellations. J Geogr Syst 2(4):319–348. https://doi.org/10.1007/PL00011461
Carrijo TB, da Silva AR (2017) Modified Moran’s I for small samples. Geogr Anal 49(4):451–467. https://doi.org/10.1111/gean.12130
Cheng T, Haworth J, Wang J (2012) Spatio-temporal autocorrelation of road network data. J Geogr Syst 14(4):389–413. https://doi.org/10.1007/s10109-011-0149-5
Chun Y (2008) Modeling network autocorrelation within migration flows by eigenvector spatial filtering. J Geogr Syst 10(4):317–344. https://doi.org/10.1007/s10109-008-0068-2
Chun Y, Griffith DA (2013) Spatial statistics and geostatistics: theory and applications for geographic information science and technology. SAGE, Thousand Oaks
Cliff AD, Ord JK (1969) The problem of spatial autocorrelation. In: Scott AJ (ed) Studies in regional science. Pion Ltd, London, pp 25–55
Cliff AD, Ord JK (1970) Spatial autocorrelation: A review of existing and new measures with applications. Econ Geogr 46:269–292. https://doi.org/10.2307/143144
Cliff AD, Ord JK (1973) Spatial autocorrelation. Pion Ltd, London
Cliff AD, Ord JK (1981) Spatial processes: models and applications. Pion Ltd, London
de Jong P, Sprenger C, van Veen F (1984) On extreme values of Moran’s I and Geary’s c. Geogr Anal 16(1):17–24. https://doi.org/10.1111/j.1538-4632.1984.tb00797.x
de la Mata T, Llano C (2013) Social networks and trade of service: Modelling interregional flows with spatial and network autocorrelation. J Geogr Syst 15(3):319–367. https://doi.org/10.1007/s10109-013-0183-6
Diggle P (2010) Nonparametric methods. In: Gelfand AE, Diggle PJ, Fuentes M, Guttorp P (eds) Handbook of spatial statistics. CRC Press, Boca Raton, pp 299–316
Dray S (2011) A new perspective about Moran’s Coefficient: Spatial autocorrelation as a linear regression problem. Geogr Anal 43(2):127–141. https://doi.org/10.1111/j.1538-4632.2011.00811.x
Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5(3):115–146. https://doi.org/10.2307/2986645
Griffith DA (1987) Spatial autocorrelation: a primer. AAG, Pennsylvania
Griffith DA (1996) Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Can Geogr 40(4):351–367. https://doi.org/10.1111/j.1541-0064.1996.tb00462.x
Griffith DA (2003) Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer, Berlin
Griffith DA (2004) Extreme eigenfunctions of adjacency matrices for planar graphs employed in spatial analyses. Linear Algebra Appl 388:201–219. https://doi.org/10.1016/S0024-3795(03)00368-9
Griffith DA (2010) The Moran coefficient for non-normal data. J Stat Plan Inference 140(11):2980–2990. https://doi.org/10.1016/j.jspi.2010.03.045
Griffith DA (2015) On the eigenvalue distribution of adjacency matrices for connected planar graphs. Quaest Geogr. https://doi.org/10.1515/quageo-2015-0035
Griffith DA, Chun Y (2016) Spatial autocorrelation and uncertainty associated with remotely-sensed data. Remote Sens 8(7):535. https://doi.org/10.3390/rs8070535
Griffith DA, Luhanga U (2011) Approximating the inertia of the adjacency matrix of a connected planar graph that is the dual of a geographic surface partitioning. Geogr Anal 43(4):383–402. https://doi.org/10.1111/j.1538-4632.2011.00828.x
Haining RP (1978) The moving average model for spatial interaction. Trans Inst Br Geogr 3(2):202–225. https://doi.org/10.2307/622202
Haynes D, Jokela A, Manson S (2018) IPUMS-Terra: Integrated big heterogeneous spatiotemporal data analysis system. J Geogr Syst 20(4):343–361. https://doi.org/10.1007/s10109-018-0277-2
Hope ACA (1968) A simplified Monte Carlo significance test procedure. J R Stat Soc B 30(3):582–598
Jackson MC, Huang L, Xie Q, Tiwari RC (2010) A modified version of Moran’s I. Int J Health Geogr 9:33. https://doi.org/10.1186/1476-072X-9-33
Lee SI (2001) Developing a bivariate spatial association measure: an integration of Pearson’s r and Moran’s I. J Geogr Syst 3(4):369–385. https://doi.org/10.1007/s101090100064
Lee J, Kang M (2015) Geospatial big data: challenges and opportunities. Big Data Res 2(2):74–81. https://doi.org/10.1016/j.bdr.2015.01.003
Legendre P, Fortin MJ (1989) Spatial pattern and ecological analysis. Vegetatio 80(2):107–138. https://doi.org/10.1007/BF00048036
Li S, Dragicevic S, Castro AC et al (2016) Geospatial big data handling theory and methods: a review and research challenges. ISPRS J Photogramm Remote Sens 115:119–133. https://doi.org/10.1016/j.isprsjprs.2015.10.012
Luo Q, Griffith DA, Wu H (2017) The Moran coefficient and Geary ratio: some mathematical and numerical comparisons. In: Griffith DA, Chun Y, Dean DJ (eds) Advances in geocomputation. Advances in geographic information science. Springer, Cham, pp 253–269
Moran PAP (1950) Notes on continuous stochastic phenomena. Biometrika 37(1/2):17–23. https://doi.org/10.2307/2332142
Oden N (1995) Adjusting Moran’s I for population density. Stat Med 14(1):17–26
Tait M, Tobin J (2017) Three conjectures in extremal spectral graph theory. J Comb Theory Ser B 126:137–161. https://doi.org/10.1016/j.jctb.2017.04.006
Tiefelsdorf M, Boots B (1995) The exact distribution of Moran’s I. Environ Plan A 27(6):985–999. https://doi.org/10.1068/a270985
van Zyl T (2014) Algorithmic design considerations for geospatial and/or temporal big data. In: Karimi HA (ed) Big data: techniques and technologies in geoinformatics. CRC Press, Boca Raton, pp 117–132
Waldhör T (1996) The spatial autocorrelation coefficient Moran’s I under heteroscedasticity. Stat Med 15(7–9):887–892
Weiss NA (2017) Introductory statistics, 10th edn. Pearson Education Ltd, London
Acknowledgements
Funding was provided by the National Key Research and Development Program of China (Grant No. 2017YFB0503802) and the China Scholarship Council (Grant No. 201406270075).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Selected eigenvalues of binary connectivity matrices and corresponding MC and GR values for three theoretical configurations
See Table 6.
Appendix 2: A descriptive introduction of statistical power
Figure 11 shows the necessary elements of a hypothesis testing procedure. Suppose one is testing the null hypothesis that the mean equals 0 for an underlying standard normal distribution, with the significance level \(\alpha\) set to 0.05, which results in the critical values ±1.96. Suppose further that the true mean is one, which is the alternative hypothesis. The two green areas are the critical regions in which the null hypothesis is rejected; thus, the interval [−1.96, 1.96] is the range across which the null is not rejected. Because the true mean is one, failing to reject the null commits a Type II error, whose probability is the blue area under the alternative distribution curve (the blue normal curve). Therefore, the statistical power of this hypothesis testing example is the total area under the blue curve over \([1.96, +\infty)\) and \((-\infty, -1.96]\).
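The Appendix 2 example can be checked numerically. The sketch below uses Python's `statistics.NormalDist` and assumes, as the example does, a unit-variance normal statistic whose mean is 0 under the null and 1 under the alternative; `two_sided_power` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_power(true_mean, alpha=0.05):
    """Power of the two-sided z-test of H0: mean = 0 when the statistic is N(true_mean, 1)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    alt = NormalDist(mu=true_mean, sigma=1.0)        # the alternative ("blue") curve
    upper = 1 - alt.cdf(crit)                        # area over [1.96, +inf)
    lower = alt.cdf(-crit)                           # area over (-inf, -1.96]
    return upper + lower

power = two_sided_power(1.0)
print(round(power, 3), round(1 - power, 3))          # power and Type II error probability
```

For `true_mean = 1.0` the power is about 0.17, so the Type II error probability (the blue area between the critical values) is about 0.83; with `true_mean = 0.0` the function returns the significance level 0.05.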
Appendix 3: Proofs for the relationship function between the MC and the GR and Theorems 1 to 4
Proof 1
Substituting Eq. (1) into Eq. (3) yields
When this equation is compared with Eq. (2), the proof requires only showing that their numerators are equal. Writing \(\left( {x_{i} - x_{j} } \right)^{2} = \left[ {\left( {x_{i} - \bar{x}} \right) - \left( {x_{j} - \bar{x}} \right)} \right]^{2}\) and using the symmetry of matrix \(\varvec{C}\) yields
\(\therefore GR =\) Eq. (3).□
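The algebra in Proof 1 can be spot-checked numerically. Because the displayed equations are not reproduced here, the sketch assumes the standard definitions \(\mathrm{MC} = \frac{n}{S_0}\frac{\sum_i\sum_j c_{ij} z_i z_j}{\sum_i z_i^2}\) and \(\mathrm{GR} = \frac{n-1}{2S_0}\frac{\sum_i\sum_j c_{ij}(x_i - x_j)^2}{\sum_i z_i^2}\), with \(z_i = x_i - \bar{x}\) and \(S_0 = \sum_i\sum_j c_{ij}\), and checks the identity \(\mathrm{GR} = \frac{n-1}{n}\left(\frac{n\sum_i c_{i\cdot} z_i^2}{S_0\sum_i z_i^2} - \mathrm{MC}\right)\), which follows from expanding \((x_i - x_j)^2\) and using the symmetry of \(\varvec{C}\). This arrangement may differ from the paper's Eq. (3); the rook lattice and random data are hypothetical test inputs.

```python
import random

def rook_weights(p):
    """Binary rook-adjacency matrix C for a p-by-p lattice (symmetric, zero diagonal)."""
    n = p * p
    c = [[0] * n for _ in range(n)]
    for r in range(p):
        for s in range(p):
            i = r * p + s
            if s + 1 < p:                        # neighbour to the right
                c[i][i + 1] = c[i + 1][i] = 1
            if r + 1 < p:                        # neighbour below
                c[i][i + p] = c[i + p][i] = 1
    return c

def moran_geary(x, c):
    """MC and GR computed directly from their (assumed) standard definitions."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(map(sum, c))
    ss = sum(v * v for v in z)
    cross = sum(c[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    sqdiff = sum(c[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return n * cross / (s0 * ss), (n - 1) * sqdiff / (2 * s0 * ss)

def gr_from_mc(x, c, mc):
    """GR rebuilt from MC through the relationship sketched above (assumed form)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(map(sum, c))
    ss = sum(v * v for v in z)
    weighted = sum(sum(c[i]) * z[i] ** 2 for i in range(n))  # sum_i c_i. z_i^2
    return (n - 1) / n * (n * weighted / (s0 * ss) - mc)

rng = random.Random(42)
C = rook_weights(4)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
mc, gr = moran_geary(x, C)
print(mc, gr, gr_from_mc(x, C, mc))              # last two values agree
```

The direct GR and the GR rebuilt from the MC agree to rounding error for any data and any symmetric \(\varvec{C}\), which is the content of the numerator equality used in the proof.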
The following are proofs for Theorems 1 to 4 (T1 to T4).
Proof of T1
where \(o\left( 1 \right)\) denotes a term of order \(1/n\), which is infinitesimal as \(n \to \infty\); \(S_{2} /S_{0}^{2}\) is bounded (it converges to a positive constant for the maximum planar connectivity case and to zero otherwise); and \(o\left( {1/n} \right)\) denotes a term of order \(1/n^{2}\), an infinitesimal of higher order than \(1/n\) as \(n \to \infty\).□
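The parenthetical claim about \(S_{2}/S_{0}^{2}\) can be illustrated from degree sequences alone. Assuming the usual Cliff-Ord definitions \(S_0 = \sum_i\sum_j c_{ij}\) and \(S_2 = \sum_i (c_{i\cdot} + c_{\cdot i})^2\) for a symmetric binary \(\varvec{C}\), the sketch contrasts a rook-adjacency lattice with a hypothetical maximal planar graph in which two adjacent hubs are linked to every other unit. That graph is only a stand-in and is not necessarily the maximum planar configuration analysed in the paper.

```python
def s0_s2(degrees):
    """S0 and S2 for a symmetric binary C with row sums d_i.
    Symmetry gives c_i. = c_.i = d_i, so S0 = sum d_i and S2 = sum (2 d_i)^2."""
    return sum(degrees), sum((2 * d) ** 2 for d in degrees)

def rook_degrees(p):
    """Row sums of the rook-adjacency matrix of a p-by-p lattice (p >= 2)."""
    return [(r > 0) + (r < p - 1) + (s > 0) + (s < p - 1)
            for r in range(p) for s in range(p)]

def two_hub_degrees(n):
    """Row sums of a hypothetical maximal planar graph on n >= 4 vertices:
    two adjacent hubs joined to every other vertex, the other n - 2 forming a path."""
    path = [4] * (n - 2)        # interior path vertex: 2 hub links + 2 path links
    path[0] = path[-1] = 3      # path endpoints: 2 hub links + 1 path link
    return [n - 1, n - 1] + path

for p in (10, 30, 100):
    n = p * p
    r0, r2 = s0_s2(rook_degrees(p))
    h0, h2 = s0_s2(two_hub_degrees(n))
    print(f"n={n}: rook S2/S0^2={r2 / r0**2:.5f}, two-hub S2/S0^2={h2 / h0**2:.5f}")
```

For the rook lattice the ratio behaves like \(4/n\) and vanishes, whereas for the two-hub graph it settles near \(2/9\), because each hub contributes a row sum of order \(n\).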
Proof of T2
where \(b_{2}\) is a constant (an index of kurtosis) whose value may vary with the assumed distribution, and \(o\left( {1/n^{i} } \right)\) (\(i = 0,1,2\)) are infinitesimals (of higher order) as \(n \to \infty\).□
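To see how \(b_{2}\) varies with the assumed distribution, the sketch below computes the kurtosis index \(b_2 = n\sum_i z_i^4 / \left(\sum_i z_i^2\right)^2\), the form used in the Cliff-Ord randomization variance (assumed here to match the paper's \(b_{2}\)), for simulated normal, uniform, and exponential samples. The sample size and seed are arbitrary choices.

```python
import random

def kurtosis_b2(x):
    """Kurtosis index b2 = n * sum(z^4) / (sum(z^2))^2 with z = x - mean."""
    n = len(x)
    mean = sum(x) / n
    z2 = [(v - mean) ** 2 for v in x]
    return n * sum(v * v for v in z2) / sum(z2) ** 2

rng = random.Random(2019)
n = 200_000
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "uniform": [rng.random() for _ in range(n)],
    "exponential": [rng.expovariate(1.0) for _ in range(n)],
}
for name, x in samples.items():
    print(name, round(kurtosis_b2(x), 2))
```

The values land near the population kurtoses 3, 1.8, and 9, respectively, so any variance expression that depends on \(b_{2}\) changes with the distributional assumption.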
Proof of T3
\(\therefore \mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{N} \left( {\text{GR}} \right) = {\text{Var}}_{A} \left( {\text{GR}} \right)\) □
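How the exact normality-assumption variances behave as n grows can be illustrated on a rook-adjacency lattice. The sketch assumes that \({\text{Var}}_{N}\) denotes the standard Cliff-Ord variances, \({\text{Var}}_{N}(\mathrm{MC}) = \frac{n^2 S_1 - n S_2 + 3S_0^2}{(n^2-1)S_0^2} - \frac{1}{(n-1)^2}\) and \({\text{Var}}_{N}(\mathrm{GR}) = \frac{(2S_1 + S_2)(n-1) - 4S_0^2}{2(n+1)S_0^2}\); it does not reproduce the paper's \({\text{Var}}_{A}\) expressions, and the lattice sizes are arbitrary.

```python
def rook_degrees(p):
    """Row sums of the rook-adjacency matrix of a p-by-p lattice (p >= 2)."""
    return [(r > 0) + (r < p - 1) + (s > 0) + (s < p - 1)
            for r in range(p) for s in range(p)]

def normality_variances(degrees):
    """Cliff-Ord variances of MC and GR under normality for a symmetric binary C."""
    n = len(degrees)
    s0 = sum(degrees)
    s1 = 2 * s0                                  # binary, symmetric C: S1 = 2 S0
    s2 = sum((2 * d) ** 2 for d in degrees)      # S2 = sum_i (c_i. + c_.i)^2
    var_mc = (n * n * s1 - n * s2 + 3 * s0 ** 2) / ((n * n - 1) * s0 ** 2) - 1 / (n - 1) ** 2
    var_gr = ((2 * s1 + s2) * (n - 1) - 4 * s0 ** 2) / (2 * (n + 1) * s0 ** 2)
    return var_mc, var_gr

for p in (5, 10, 20, 50):
    n = p * p
    v_mc, v_gr = normality_variances(rook_degrees(p))
    print(f"n={n}: Var_N(MC)={v_mc:.6f}, Var_N(GR)={v_gr:.6f}, n*Var_N(GR)={n * v_gr:.3f}")
```

On this lattice both variances shrink roughly like \(1/(2n)\), and the MC variance stays below the GR variance (about 0.00534 versus 0.00573 for a 10-by-10 lattice), in line with the efficiency comparison summarized in the abstract.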
Proof of T4
□
About this article
Cite this article
Luo, Q., Griffith, D.A. & Wu, H. Spatial autocorrelation for massive spatial data: verification of efficiency and statistical power asymptotics. J Geogr Syst 21, 237–269 (2019). https://doi.org/10.1007/s10109-019-00293-3
DOI: https://doi.org/10.1007/s10109-019-00293-3