Abstract
Bayesian hierarchical modeling with Gaussian process random effects is a popular approach for analyzing point-referenced spatial data. For large spatial data sets, however, generic posterior sampling is infeasible because of the extremely high computational cost of decomposing the spatial correlation matrix. In this paper, we propose an efficient algorithm, the adaptive griddy Gibbs (AGG) algorithm, to address these computational issues. The proposed algorithm dramatically reduces the computational complexity. We show theoretically that the proposed method approximates the true posterior distribution accurately, and we derive the number of grid points sufficient for a required accuracy. We compare the performance of AGG with that of state-of-the-art methods in simulation studies. Finally, we apply AGG to spatially indexed data on building energy consumption.
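As its name suggests, AGG builds on the griddy-Gibbs update: evaluate the unnormalized full conditional of the range parameter \(\phi\) on a finite grid, turn those values into an approximate cdf, and invert it. The sketch below shows only this basic step on a fixed grid. It is not the authors' adaptive grid construction, and `log_post`, the grid bounds, and the Gaussian-shaped test target are illustrative assumptions. In a real sampler, `log_post` would depend on the current values of the other parameters.

```python
import numpy as np


def griddy_gibbs_draw(log_post, grid, rng):
    """Draw phi from a grid approximation of its (unnormalized) full conditional.

    log_post is a hypothetical callable returning the unnormalized log full
    conditional of phi; grid is an increasing 1-D array covering its support.
    """
    grid = np.asarray(grid, dtype=float)
    logw = np.array([log_post(p) for p in grid])
    w = np.exp(logw - logw.max())  # subtract the max to avoid overflow
    # Cumulative trapezoid sums give an approximate cdf on the grid.
    cell_mass = 0.5 * (w[:-1] + w[1:]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(cell_mass)))
    cdf /= cdf[-1]
    # Inverse-cdf step: invert the approximate cdf by linear interpolation.
    u = rng.uniform()
    return float(np.interp(u, cdf, grid))


def target(p):
    """Hypothetical log target: Gaussian shape centred at 0.3 with sd 0.05."""
    return -0.5 * ((p - 0.3) / 0.05) ** 2


rng = np.random.default_rng(0)
grid = np.linspace(0.01, 1.0, 200)
draws = np.array([griddy_gibbs_draw(target, grid, rng) for _ in range(5000)])
print(round(draws.mean(), 2), round(draws.std(), 2))
```

Because the grid is fixed, anything that depends only on \(\phi\) at a grid point could in principle be computed once and reused across iterations; the sketch does not attempt this.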
Acknowledgements
We thank Dr. Avishek Chakraborty for his very useful discussions and suggestions. This work was partially supported by Award Number R01ES017240 from the National Institute of Environmental Health Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Environmental Health Sciences or the National Institutes of Health.
Appendix: Proofs
Proof of Theorem 1
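We first show that \(F_C(\phi)\) is well defined. With \(\theta = (\sigma^2, \tau^2)\) in the likelihood function \(L(\phi, \theta)\) in Equation (3), the marginal posterior of \(\phi\) becomes
\[
\pi(\phi \mid \text{data}) \propto \pi(\phi) \int_{\Theta} L(\phi, \theta)\, \pi(\theta)\, d\theta,
\]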
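where \(\Theta\) is the support of the posterior distribution of \(\theta\). Denote \(R(\phi) = \int_{\Theta} L(\phi, \theta)\, \pi(\theta)\, d\theta\). We next verify that \(R(\phi)\) is continuous for any \(\phi \in [a, b]\). Since \(|\sigma^2 H(\phi) + \tau^2 I_n| > \tau^{2n}\) and \(\pi(\theta)\) is a product of proper priors, it follows that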
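(i) The joint posterior for \((\phi, \theta)\) is proper, so \(R(\phi)\) is well defined.
(ii) For any \(\theta \in \Theta\) and \(\phi \in [a, b]\), \(L(\phi, \theta) \le g(\theta)\) for some function \(g\) satisfying \(\int_{\Theta} g(\theta)\, \pi(\theta)\, d\theta < \infty\).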
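The determinant bound is what makes a dominating function like the one in (ii) available. The check below illustrates it numerically under two assumptions not taken from Equation (3): that \(L\) is the Gaussian density of \(N(0, \sigma^2 H(\phi) + \tau^2 I_n)\), and that \(H(\phi)\) is an exponential correlation matrix \(\exp(-d/\phi)\) on simulated locations. Under these assumptions every eigenvalue of the covariance is at least \(\tau^2\) and the quadratic form is nonnegative, so \(g(\theta) = (2\pi\tau^2)^{-n/2}\) bounds \(L(\phi, \theta)\) for every \(\phi\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
coords = rng.uniform(0.0, 1.0, size=(n, 2))  # hypothetical locations
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def exp_corr(phi):
    """Exponential correlation H(phi) = exp(-d / phi); an assumed choice."""
    return np.exp(-dists / phi)


def gaussian_loglik(y, phi, sigma2, tau2):
    """Log density of y ~ N(0, sigma2 * H(phi) + tau2 * I_n) (assumed likelihood)."""
    cov = sigma2 * exp_corr(phi) + tau2 * np.eye(n)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad), logdet


y = rng.normal(size=n)
checks = []
for phi in (0.05, 0.2, 1.0):
    for sigma2, tau2 in ((1.0, 0.1), (2.0, 0.5)):
        ll, logdet = gaussian_loglik(y, phi, sigma2, tau2)
        # |sigma2 H + tau2 I| >= tau2**n because every eigenvalue is >= tau2.
        det_ok = logdet >= n * np.log(tau2)
        # quad >= 0, so the likelihood never exceeds g(theta) = (2 pi tau2)^(-n/2).
        bound_ok = ll <= -0.5 * n * np.log(2 * np.pi * tau2)
        checks.append(det_ok and bound_ok)
print(all(checks))  # prints True
```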
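For any sequence \(\phi_n \to \phi\), continuity of \(L(\cdot, \theta)\) implies \(L(\phi_n, \theta) \to L(\phi, \theta)\) pointwise for every \(\theta \in \Theta\). Together with (ii), the Dominated Convergence Theorem then gives \(R(\phi_n) \to R(\phi)\), so \(R(\phi)\) is continuous. Under the continuous uniform prior \(\pi(\phi)\) on \([\phi_{\min}, \phi_{\max}]\), the constant prior density cancels and the marginal posterior cdf of \(\phi\) is
\[
F_C(\phi) = \frac{1}{c_1} \int_{\phi_{\min}}^{\phi} R(s)\, ds, \qquad \phi \in [\phi_{\min}, \phi_{\max}],
\]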
where \(c_{1} = \int_{\phi_{\min}}^{\phi_{\max}} R(\phi)\,d\phi\). Hence \(F_C(\phi) < \infty\); that is, \(F_C\) is well defined.
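With a uniform prior on \(\phi\), the posterior cdf is the running integral of \(R\) normalized by \(c_1\). A minimal numerical sketch, assuming \(R(\phi)\) has already been evaluated on a fine grid (the function `posterior_cdf_on_grid` and the bump-shaped `R_vals` are hypothetical), approximates both integrals with the trapezoid rule:

```python
import numpy as np


def posterior_cdf_on_grid(R_vals, grid):
    """Approximate F_C on a grid from values of R(phi) at the grid points.

    Under a uniform prior on [phi_min, phi_max] the constant prior density
    cancels, so F_C is the running integral of R divided by c_1. Both integrals
    are approximated with the trapezoid rule.
    """
    grid = np.asarray(grid, dtype=float)
    R_vals = np.asarray(R_vals, dtype=float)
    increments = 0.5 * (R_vals[:-1] + R_vals[1:]) * np.diff(grid)
    running = np.concatenate(([0.0], np.cumsum(increments)))
    c1 = running[-1]  # trapezoid estimate of c_1
    return running / c1


# Hypothetical R(phi): any continuous positive function serves for illustration.
phi_min, phi_max = 0.1, 2.0
grid = np.linspace(phi_min, phi_max, 401)
R_vals = np.exp(-((grid - 0.7) ** 2) / 0.1)
F = posterior_cdf_on_grid(R_vals, grid)
print(F[0], F[-1])  # prints 0.0 1.0
```

A constant \(R\) recovers the uniform cdf exactly, which is a quick sanity check on the normalization.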
For any \(\epsilon > 0\), let \(k > (\phi_{\max} - \phi_{\min})/\epsilon\) and define
We further define \(F_{D, k}(\phi) = \frac{j - 1}{k}\) for \(\phi \in E_j\). Then, for any \(\phi \in (\phi_{\min}, \phi_{\max})\), we have
Consequently, we obtain the following to complete the proof:
□
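The discretized cdf \(F_{D,k}\) can be illustrated numerically. The sketch assumes, for illustration only, that the sets are the level sets \(E_j = \{\phi : (j-1)/k \le F_C(\phi) < j/k\}\), so that their boundaries are the \(j/k\) quantiles of \(F_C\); the stand-in cdf and the helper `discretized_cdf` are hypothetical. Under this assumption \(F_C - F_{D,k}\) lies in \([0, 1/k]\), so the largest discrepancy is at most \(1/k\).

```python
import numpy as np


def discretized_cdf(F_vals, k):
    """Return F_{D,k} = (j - 1) / k on E_j.

    Assumes (for illustration) E_j = {phi : (j - 1)/k <= F_C(phi) < j/k},
    i.e. the sets are bounded by the j/k quantiles of F_C.
    """
    F_vals = np.asarray(F_vals, dtype=float)
    j = np.floor(F_vals * k) + 1  # index of the set E_j that contains phi
    j = np.minimum(j, k)  # phi_max itself (F_C = 1) is assigned to E_k
    return (j - 1) / k


# A smooth, strictly increasing cdf on [0, 1] standing in for F_C (hypothetical).
u = np.linspace(0.0, 1.0, 2001)
F_C = (1.0 - np.exp(-3.0 * u)) / (1.0 - np.exp(-3.0))
for k in (5, 20, 100):
    err = np.max(np.abs(F_C - discretized_cdf(F_C, k)))
    print(k, err, err <= 1.0 / k + 1e-12)
```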
Cite this article
Yang, H., Liu, F., Ji, C. et al. Adaptive sampling for Bayesian geospatial models. Stat Comput 24, 1101–1110 (2014). https://doi.org/10.1007/s11222-013-9422-4
DOI: https://doi.org/10.1007/s11222-013-9422-4