
Spatial model fitting for large datasets with applications to climate and microarray problems

Statistics and Computing

Abstract

Many problems in the environmental and biological sciences involve the analysis of large quantities of data. Further, the data in these problems are often subject to various types of structure and, in particular, spatial dependence. Traditional model fitting often fails because of the size of the datasets, since it is difficult not only to specify but also to compute with the full covariance matrix describing the spatial dependence. We propose a very general type of mixed model that has a random spatial component. Recognizing that spatial covariance matrices often exhibit a large number of zero or near-zero entries, we use covariance tapering to force the near-zero entries to zero. Then, taking advantage of the sparse nature of such tapered covariance matrices, we use backfitting to estimate the fixed and random model parameters. The novelty of the paper is the combination of the two techniques, tapering and backfitting, to model and analyze spatial datasets several orders of magnitude larger than those typically analyzed with conventional approaches. The methods are demonstrated on two datasets. The first consists of regional climate model output from an experiment with two regional models and two driver models arranged in a two-by-two layout. The second is microarray data used to build a profile of differentially expressed genes related to cerebral vascular malformations, an important cause of hemorrhagic stroke and seizures.
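The two ingredients named in the abstract, covariance tapering and backfitting, can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: a simple model y = Xβ + w + ε with w ~ N(0, C) and ε ~ N(0, τ²I) rather than the paper's more general mixed model; an exponential covariance tapered by a Wendland-type compactly supported function; covariance parameters treated as known; and SciPy's sparse LU (`splu`) standing in for the sparse Cholesky factorization a dedicated package would use. The function names `tapered_covariance` and `backfit` are hypothetical. Tapering keeps only pairs closer than the taper range θ, and backfitting alternates a least-squares step for β with a spatial smoothing step w = C(C + τ²I)⁻¹(y − Xβ), reusing a single sparse factorization.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu
from scipy.spatial import cKDTree

# Hypothetical helper names; a sketch under the assumptions stated above.


def exponential_cov(d, sill, range_):
    """Exponential covariance (a Matern special case), an illustrative choice."""
    return sill * np.exp(-d / range_)


def wendland_taper(d, theta):
    """Wendland-type taper (1 - r)_+^4 (1 + 4r) with r = d/theta; zero for d >= theta."""
    r = d / theta
    return np.where(r < 1.0, (1.0 - r) ** 4 * (1.0 + 4.0 * r), 0.0)


def tapered_covariance(locs, sill, range_, theta):
    """Sparse tapered covariance: only pairs closer than theta are stored."""
    n = locs.shape[0]
    # All pairs (i, j) with i < j and distance <= theta, found with a k-d tree.
    pairs = cKDTree(locs).query_pairs(r=theta, output_type="ndarray")
    i, j = pairs[:, 0], pairs[:, 1]
    d = np.linalg.norm(locs[i] - locs[j], axis=1)
    # Elementwise (Schur) product of two positive definite functions stays positive definite.
    v = exponential_cov(d, sill, range_) * wendland_taper(d, theta)
    diag = np.arange(n)
    rows = np.concatenate([i, j, diag])
    cols = np.concatenate([j, i, diag])
    vals = np.concatenate([v, v, np.full(n, sill)])  # taper equals 1 at distance 0
    return sparse.csc_matrix((vals, (rows, cols)), shape=(n, n))


def backfit(y, X, C, tau2, tol=1e-10, max_iter=1000):
    """Backfit y = X beta + w + noise, with w ~ N(0, C) and noise ~ N(0, tau2 I)."""
    n = len(y)
    # Factor the sparse matrix C + tau2 I once; every iteration reuses it.
    lu = splu(sparse.csc_matrix(C + tau2 * sparse.identity(n, format="csc")))
    beta = np.zeros(X.shape[1])
    w = np.zeros(n)
    for _ in range(max_iter):
        # Fixed-effect step: ordinary least squares on the spatially adjusted response.
        beta_new = np.linalg.lstsq(X, y - w, rcond=None)[0]
        # Spatial step: smooth the partial residuals with C (C + tau2 I)^{-1}.
        w_new = C @ lu.solve(y - X @ beta_new)
        change = max(np.max(np.abs(beta_new - beta)), np.max(np.abs(w_new - w)))
        beta, w = beta_new, w_new
        if change < tol:
            break
    return beta, w
```

At convergence, the fixed point of these two steps solves Henderson's mixed-model equations, so β matches the generalized least squares estimate and w the kriging (BLUP) predictor under the tapered covariance; the benefit is that only sparse solves with C + τ²I are ever needed, never a dense inverse.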



Author information

Correspondence to Reinhard Furrer.


About this article

Cite this article

Furrer, R., Sain, S.R. Spatial model fitting for large datasets with applications to climate and microarray problems. Stat Comput 19, 113–128 (2009). https://doi.org/10.1007/s11222-008-9075-x


  • DOI: https://doi.org/10.1007/s11222-008-9075-x
