Estimation of spatial autoregressive models with measurement error for large data sets

Suesse, Thomas

doi:10.1007/s00180-017-0774-7

Estimation of spatial autoregressive models with measurement error for large data sets

Original Paper
Published: 08 November 2017

Volume 33, pages 1627–1648, (2018)
Cite this article

Computational Statistics Aims and scope Submit manuscript

Thomas Suesse¹

781 Accesses
8 Citations
Explore all metrics

Abstract

Maximum likelihood (ML) estimation of spatial autoregressive models for large spatial data sets is well established by making use of the commonly sparse nature of the contiguity matrix on which spatial dependence is built. Adding a measurement error that naturally separates the spatial process from the measurement error process are not well established in the literature, however, and ML estimation of such models to large data sets is challenging. Recently a reduced rank approach was suggested which re-expresses and approximates such a model as a spatial random effects model (SRE) in order to achieve fast fitting of large data sets by fitting the corresponding SRE. In this paper we propose a fast and exact method to accomplish ML estimation and restricted ML estimation of complexity of $O(n^{3/2})$ operations when the contiguity matrix is based on a local neighbourhood. The methods are illustrated using the well known data set on house prices in Lucas County in Ohio.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fast estimation of matrix exponential spatial models

Article 12 September 2021

Statistical inference of partially specified spatial autoregressive model

Article 01 January 2015

Bayesian Spatial Statistical Modeling

References

Bates D, Maechler M (2017) Matrix: sparse and dense matrix classes and methods. http://CRAN.R-project.org/package=Matrix. Accessed 6 Nov 2017
Besag J, York J, Molli A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43(1):1–20
Article MathSciNet MATH Google Scholar
Bivand R (2010) Comparing estimation methods for spatial econometrics techniques using R. Report, Department of Economics, Norwegian School of Economics and Business Administration, SAM 26 2010
Bivand R (2017) spdep: spatial dependence: weighting schemes, statistics and models. http://CRAN.R-project.org/package=spdep. Accessed 6 Nov 2017
Bivand R, Piras G (2015) Comparing implementations of estimation methods for spatial econometrics. J Stat Softw 63(18):1–36
Article Google Scholar
Bivand R, Hauke J, Kossowski T (2013) Computing the Jacobian in Gaussian spatial autoregressive models: an illustrated comparison of available methods. Geogr Anal 45(2):150–179
Article Google Scholar
Bivand R, Gómez-Rubio V, Rue H (2015) Spatial data analysis with R-INLA with some extensions. J Stat Softw 63(20):1–31
Article Google Scholar
Burden S, Cressie N, Steel DG (2015) The SAR model for very large datasets: a reduced rank approach. Econometrics 3(2):317–338
Article Google Scholar
Corrado L, Fingleton B (2012) Where is the economics in spatial econometrics? J Reg Sci 52(2):210–239
Article Google Scholar
Cressie N (1993) Statistics for spatial data. Wiley, New York
MATH Google Scholar
Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J R Stat Soc Ser B (Stat Methodol) 70(1):209–226
Article MathSciNet MATH Google Scholar
Davis TA (2006) Direct methods for sparse linear systems, vol 2. SIAM, Philadelphia
Book MATH Google Scholar
Dong G, Harris R, Jones K, Yu J (2015) Multilevel modelling with spatial interaction effects with application to an emerging land market in Beijing, China. PLoS ONE 10(6):1–18
Google Scholar
George A, Liu JW (1981) Computer solution of large sparse positive definite Systems. Prentice-Hall series in computational mathematics. Prentice-Hall, Upper Saddle River
MATH Google Scholar
Gomez-Rubio V, Bivand RS, Rue H (2017) Estimating spatial econometrics models with integrated nested Laplace approximation. arXiv preprint arXiv:1703.01273
Goulard M, Laurent T, Thomas-Agnan C (2017) About predictions in spatial autoregressive models: optimal and almost optimal strategies. Spat Econ Anal 2–3:304–325
Article Google Scholar
Halleck Vega S, Elhorst JP (2015) The SLX model. J Reg Sci 55(3):339–363
Article Google Scholar
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5:81–102
Article MATH Google Scholar
Huque MH, Bondell HD, Ryan L (2014) On the impact of covariate measurement error on spatial regression modelling. Environmetrics 25(8):560–570
Article MathSciNet Google Scholar
Kang EL, Liu D, Cressie N (2009) Statistical analysis of small-area data based on independence, spatial, non-hierarchical, and hierarchical models. Comput Stat Data Anal 53(8):3016–3032
Article MathSciNet MATH Google Scholar
Lehoucq R, Maschhoff K, Sorensen D, Yang C (2015) ARPACK library for calculating eigenvectors of sparse matrices. http://www.caam.rice.edu/software/ARPACK/. Accessed 6 Nov 2017
Lesage J, Pace R (2009) Introduction to spatial econometrics. Chapman & Hall/CRC, Boca Raton
Book MATH Google Scholar
LeSage JP, Vance C, Chih YY (2017) A Bayesian heterogeneous coefficients spatial autoregressive panel data model of retail fuel duopoly pricing. Reg Sci Urb Econ 62:46–55
Article Google Scholar
Li H, Calder CA, Cressie N (2012) One-step estimation of spatial dependence parameters: properties and extensions of the APLE statistic. J Multivar Anal 105(1):68–84
Article MathSciNet MATH Google Scholar
Lindstrom MJ, Bates DM (1988) Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. J Am Stat Assoc 83(404):1014–1022
MathSciNet MATH Google Scholar
Mukherjee C, Kasibhatla PS, West M (2014) Spatially varying SAR models and Bayesian inference for high-resolution lattice data. Ann Inst Stat Math 66(3):473–494
Article MathSciNet MATH Google Scholar
Ord K (1975) Estimation methods for models of spatial interaction. J Am Stat Assoc 70(349):120–126
Article MathSciNet MATH Google Scholar
Pace R, Barry R (1997) Fast CARs. J Stat Comput Simul 59(2):123–147
Article MATH Google Scholar
Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. CRC Press, Boca Raton
Book MATH Google Scholar
Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B (Stat Methodol) 71(2):319–392
Article MathSciNet MATH Google Scholar
Samart K, Chambers R (2014) Linear regression with nested errors using probability-linked data. Aust N Z J Stat 56(1):27–46
Article MathSciNet MATH Google Scholar
Verbyla AP (1990) A conditional derivation of residual maximum likelihood. Aust J Stat 32(2):227–230
Article Google Scholar
Villard G, Thome E (2002) Computation of the inverse and determinant of a matrix. In: Algorithms seminar 2001–2002, pp 29–32. http://algo.inria.fr/seminars/. Accessed 6 Nov 2017

Download references

Acknowledgements

We would like to thank Dr. Sandy Burden and Distinguished Prof. Noel Cressie for posing the computational problem. We also appreciate the help from Clint Shumack for helping with and NIASRA for using the high performance cluster for the empirical studies in this paper. The author wishes to gratefully acknowledge the help of Dr. Madeleine Strong Cincotta in the final language editing of this paper. We would also like to thank the referees for their helpful comments that greatly improved the manuscript.

Author information

Authors and Affiliations

National Institute for Applied Statistics Research Australia, University of Wollongong, Wollongong, NSW, 2522, Australia
Thomas Suesse

Authors

Thomas Suesse
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Thomas Suesse.

Appendices

A. Simulation results for the SEM-M

See Tables 7, 8, 9, and 10.

Table 7 Empirical mean of $\rho $, $\sigma ^2_y$, $\sigma ^2_{\epsilon }$ and $\beta $ estimates over 10,000 simulations for the SEM-M model with $n=506$, $\sigma ^2_y=1$, $\sigma ^2_{\epsilon }=2$ and $\beta _0=1$ and $\beta _1=5$

Full size table

Table 8 Empirical mean of $\rho $, $\sigma ^2_y$, $\sigma ^2_{\epsilon }$ and $\beta $ estimates over 10,000 simulations excluding those for which $\hat{\sigma }^2_{\epsilon }=0$ for the SEM-M model with $n=506$, $\sigma _y=1$, $\sigma ^2_{\epsilon }=2$ and $\beta _0=1$ and $\beta _1=5$

Full size table

Table 9 Coverage of Wald-type CI for $\rho $, $\sigma ^2_y$, $\sigma ^2_{\epsilon }$ and $\beta $ estimates over 10,000 simulations for the SEM-M model with $n=506$, $\sigma ^2_y=1$, $\sigma ^2_{\epsilon }=2$ and $\beta _0=1$ and $\beta _1=5$

Full size table

Table 10 Coverage of Wald-type CI for $\rho $, $\sigma ^2_y$, $\sigma ^2_{\epsilon }$ and $\beta $ estimates over 10,000 simulations excluding those for which $\hat{\sigma }^2_{\epsilon }=0$ for the SEM-M model with $n=506$, $\sigma ^2_y=1$, $\sigma ^2_{\epsilon }=2$ and $\beta _0=1$ and $\beta _1=5$

Full size table

B. Reduced rank approach

To fit the SEM-M, Burden et al. (2015) considered a reduced rank approach that relies on a spectral decomposition. Once this has been accomplished the computational complexity is reduced to that of a SRE model. The spectral decomposition of the matrix

$$\begin{aligned} \mathbf {B}= \frac{ \mathbf {P}\left( \mathbf {W} + \mathbf {W}^{\top }\right) \mathbf {P}}{2}= \mathbf {U}\mathbf {\Omega }\mathbf {U}, \end{aligned}$$

(12)

see formula (22) in Burden et al. (2015), is computationally intensive. The matrix $\mathbf {P}$ is the projection matrix defined in Sect. 3.2. The matrix $\mathbf {U}$ contains the eigenvectors of $\mathbf {B}$ and $\mathbf {\Omega }$ is a diagonal matrix with the eigenvalues of $\mathbf {B}$ on its diagonal. Notice that the matrix $\mathbf {B}$ is a first order approximation of the (co)variance matrix that is obtained from applying REML, ignoring second order terms. The decomposition can be written as

$$\begin{aligned} \mathbf {B}=\mathbf {U}_1\mathbf {\Omega }_1\mathbf {U}_1 + \mathbf {U}_2\mathbf {\Omega }_2\mathbf {U}_2, \end{aligned}$$

where now $\mathbf {\Omega }_1$ contains the largest r absolute eigenvalues and $\mathbf {\Omega }_2$ the remaining $n-r$, and where $\mathbf {U}_j$ is the corresponding matrix containing the eigenvectors. The authors suggested then to use $\mathbf {U}_1\mathbf {\Omega }_1\mathbf {U}_1$ as an approximation to $\mathbf {B}$ and used this approximation to base estimation of the SEM-M on estimation of SREs.

We use the house data set with $n=25{,}357$, see Sect. 5, to illustrate the preprocessing times of the reduced rank approach. Instead of calculating all or a large number of eigenvalues we used Matlab and the R interface of the ARPACK software (Lehoucq et al. 2015) to calculate only the largest absolute $r=25$ eigenvalues and corresponding eigenvectors. We would expect this process to take only a fraction of the time compared to calculating all $n=25{,}357$ eigenvalues, presented in Fig. 4, which took 1 h and 36 minutes (min) using Matlab. However this process of calculating 25 eigenvalues took 54 min 40 s (s) with Matlab (using command ‘eigs’ and option ‘largest magnitude’) and 54 min 32 s with ARPACK (also option ‘largest magnitude’), approximately two thirds of the time even though only $\approx 1/1000$ (25 / 25, 357) of the eigenvalues and vectors were calculated. Also the matrix $\mathbf {B}$ requires storage. In fact 4.9 GB of memory is required with R indicating that the calculation of $\mathbf {B}$ is limited by the available hardware. The objects of the proposed approach of this paper do not require much memory, for example $\mathbf {W}$ or the Cholesky decomposition of $\mathbf {A}\mathbf {A}^{\top }$ only require a couple of (up to 6) Megabytes.

The 54 min serves as a very optimistic lower bound for the time needed to fit this data set with the reduced rank approach. The SRE also has to be fitted, meaning that the total calculation time will clearly exceed this value. Figure 4 in Burden et al. (2015) demonstrates the effect of choosing different values of $r\le 100$ on $\hat{\rho }$, showing the approximate nature of the method for various values of r.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Suesse, T. Estimation of spatial autoregressive models with measurement error for large data sets. Comput Stat 33, 1627–1648 (2018). https://doi.org/10.1007/s00180-017-0774-7

Download citation

Received: 07 October 2016
Accepted: 29 October 2017
Published: 08 November 2017
Issue Date: December 2018
DOI: https://doi.org/10.1007/s00180-017-0774-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Estimation of spatial autoregressive models with measurement error for large data sets

Abstract

Access this article

Similar content being viewed by others

Fast estimation of matrix exponential spatial models

Statistical inference of partially specified spatial autoregressive model

Bayesian Spatial Statistical Modeling

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendices

A. Simulation results for the SEM-M

B. Reduced rank approach

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Estimation of spatial autoregressive models with measurement error for large data sets

Abstract

Access this article

Similar content being viewed by others

Fast estimation of matrix exponential spatial models

Statistical inference of partially specified spatial autoregressive model

Bayesian Spatial Statistical Modeling

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendices

A. Simulation results for the SEM-M

B. Reduced rank approach

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation