Abstract
This paper proposes a geostatistical hedonic price model in which the effects of location on house values are explicitly modeled. The proposed geostatistical approach, namely area-to-point Kriging with External Drift (A2PKED), can take into account spatial dependence and spatial heteroskedasticity, if they exist. Furthermore, this approach has significant implications for situations where exhaustive area-averaged housing price data are available in addition to a subset of individual housing price data. In the case study, we demonstrate that A2PKED substantially improves the quality of predictions, using records of apartment sale transactions that took place in Seoul, South Korea, during 2003. The improvement is illustrated via a comparative analysis, where predicted values obtained from different models, including two traditional regression-based hedonic models and a point-support geostatistical model, are compared to those obtained from the A2PKED model.






References
Anselin L (2002) Under the hood: issues in the specification and interpretation of spatial regression models. Agric Econ 27:247–267
Anselin L (1998) Spatial econometrics: methods and models. Kluwer, Dordrecht
Basu S, Thibodeau TG (1998) Analysis of spatial autocorrelation in house prices. J Real Estate Financ Econ 17:61–85
Black SE (1999) Do better schools matter? Parental valuation of elementary education. Quart J Econ 114(2):577–599
Can A (1990) The measurement of neighborhood dynamics in urban house prices. Econ Geogr 66(3):254–272
Can A, Megbolugbe I (1997) Spatial dependence and house price index construction. J Real Estate Financ Econ 14:203–222
Chica-Olmo J (2007) Prediction of housing location price by a multivariate spatial method: cokriging. J Real Estate Res 29(2):233–254
Chica-Olmo J (1995) Spatial estimation of housing prices and locational rent. Urban Stud 32(8):1331–1344
Chilès JP, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
Cressie N (1993) Statistics for spatial data. Wiley, New York
Dubin RA (1988) Estimation of regression coefficients in the presence of spatially autocorrelated error terms. Rev Econ Stat 70:466–474
Dubin RA (1992) Spatial autocorrelation and neighborhood quality. Region Sci Urban Econ 22:433–452
Dubin RA (1998) Predicting house prices using multiple listings data. J Real Estate Financ Econ 17:35–59
Dubin RA, Pace RK, Thibodeau TG (1999) Spatial autoregression techniques for real estate data. J Real Estate Lit 7:79–95
Gelfand AE, Ecker MD, Knight JR, Sirmans CF (2004) The dynamics of location in home price. J Real Estate Financ Econ 29(2):149–166
Goodman AC, Thibodeau TG (1998) Housing market segmentation. J Hous Econ 7:121–143
Goovaerts P (2005) Geostatistical analysis of disease data: estimation of cancer mortality risk from empirical frequencies using Poisson kriging. Int J Health Geogr 4(31)
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97(458):632–648
Jones K, Bullen N (1994) Contextual models of urban house prices: a comparison of fixed- and random-coefficient models developed by expansion. Econ Geogr 70(3):252–272
Journel AG, Huijbregts ChJ (1978) Mining geostatistics. Academic Press, New York
Kim CW, Phipps TT, Anselin L (2003) Measuring the benefits of air quality improvement: a spatial hedonic approach. J Environ Econ Manage 45:24–39
Kutner MH, Nachtsheim CJ, Neter J, Li W (1974) Applied linear statistical models, 5th edn. McGraw-Hill, London
Kyriakidis PC (2004) A geostatistical framework for the area-to-point spatial interpolation. Geogr Anal 36(3):41–50
Kyriakidis PC, Goodchild MF (2006) On the prediction error variance of three common spatial interpolation schemes. Int J Geogr Inform Sci 20(8):823–855
LeSage JP, Pace RK (2004a) Models for spatially dependent missing data. J Real Estate Financ Econ 29(2):233–254
LeSage JP, Pace RK (eds) (2004b) Spatial and spatiotemporal econometrics. Elsevier, Oxford
Montgomery DC, Peck EA, Vining GG (2001) Introduction to linear regression analysis. Wiley, New York
Neuman SP, Jacobson EA (1984) Analysis of nonintrinsic spatial variability by residual kriging with application to regional ground water levels. Math Geol 16(5):499–521
Orford S (2000) Modelling spatial structures in local housing market dynamics: a multilevel perspective. Urban Stud 37(9):1643–1671
Pace RK, Gilley OW (1998) Generalizing the OLS and grid estimators. Real Estate Econ 26:331–347
Páez A, Uchida T, Miyamoto K (2001) Spatial association and heterogeneity issues in land price models. Urban Stud 38(9):1493–1508
Páez A, Long F, Farber S (2008) Moving window approaches for hedonic price estimation: An empirical comparison of modeling techniques. Urban Stud 45(8):1565–1581
Ripley BD (1981) Spatial statistics. Wiley, New York
Rosen S (1974) Hedonic prices and implicit markets: product differentiation in pure competition. J Polit Econ 82(1):34–55
Yoo E-H, Kyriakidis PC (2008) Area-to-point predictions under boundary conditions. Geogr Anal 40(4):355–379
Acknowledgments
P. C. Kyriakidis acknowledges funding provided by the National Geospatial-Intelligence Agency (NGA) under award HM1582-07-2020.
Appendices
Appendix 1: A2PKED as a form of generalized linear regression
Given both n point-support data and K area-averaged data, the A2PKED prediction of the unknown value at a location \({{\mathbf{u}}}_p\) (see Eq. 2 for the original form) can be rewritten in the form of a linear regression model with correlated residuals, i.e., as a point prediction of the mean response plus a prediction of the residual (Chilès and Delfiner 1999):
where \(\tilde{{\varvec{\upeta}}}_p\) and \(\tilde{{\varvec{\lambda}}}_p\) denote, respectively, the area-to-point Simple Kriging (A2PSK) weights for the n point data residuals and the K areal data residuals.
The GLS estimator of regression coefficients \({\varvec{\upbeta}}\) can be obtained as a function of the spatial correlation model of the underlying residuals \({\varvec{\Upsigma}}_R\) and the design matrix F as (Cressie 1993, p 167):
with
Note that when \({\varvec{\Upsigma}}_R\) is diagonal, with constant entries along its main diagonal, the drift prediction obtained from Eq. 1 coincides with that of OLS. Again, a unique solution for the estimate \({\hat{\varvec{\upbeta}}}\) requires the ((n + K) × (n + K)) matrix of variogram values of residuals \({\varvec{\Upsigma}}_R\) to be non-singular and the ((n + K) × (M + 1)) design matrix F to be of full column rank.
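The remark on OLS can be checked numerically. The sketch below assumes the standard GLS form \({\hat{\varvec{\upbeta}}} = (F^T {\varvec{\Upsigma}}_R^{-1} F)^{-1} F^T {\varvec{\Upsigma}}_R^{-1} {\mathbf{z}}\), where \({\mathbf{z}}\) is our assumed name for the stacked data vector. The helper names (`solve`, `matmul`, `gls`), the toy data, and the exponential covariance are hypothetical and are not taken from the case study. A small pure-Python solver is used only to keep the example self-contained; a numerical library would normally be preferred.

```python
import math

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    A is n x n and B is n x m, both given as lists of rows; returns X (n x m).
    """
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def gls(F, Sigma, z):
    """Return (F' Sigma^-1 F)^-1 F' Sigma^-1 z as a flat list of coefficients."""
    Ft = [list(col) for col in zip(*F)]
    Si_F = solve(Sigma, F)                  # Sigma^-1 F
    Si_z = solve(Sigma, [[v] for v in z])   # Sigma^-1 z
    beta = solve(matmul(Ft, Si_F), matmul(Ft, Si_z))
    return [b[0] for b in beta]

# Hypothetical toy data: five 1-D locations, an intercept and one covariate.
xs = [0.0, 1.0, 2.5, 4.0, 5.0]
F = [[1.0, x] for x in xs]
z = [1.2, 2.9, 6.1, 8.8, 11.3]

identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
scaled = [[3.0 * v for v in row] for row in identity]   # constant diagonal
# Assumed exponential covariance with unit sill and range parameter 2.
expo = [[math.exp(-abs(a - b) / 2.0) for b in xs] for a in xs]

beta_ols = gls(F, identity, z)    # identity covariance reduces GLS to OLS
beta_diag = gls(F, scaled, z)     # same coefficients as OLS
beta_gls = gls(F, expo, z)        # generally differs once residuals correlate
print(beta_ols, beta_diag, beta_gls)
```

Rescaling a diagonal covariance leaves the coefficients unchanged because the scale factor cancels between the two inverse terms, which is why only correlated or unequal-variance residuals move the GLS estimate away from OLS.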
Once the drift component is predicted as \({\hat \mu}({{\mathbf{u}}}_p) = {\hat {\varvec{\upbeta}}}^T{{\mathbf{f}}}_p^{{\mathbf{u}}}\) using the GLS estimator of regression coefficients \({\hat{\varvec{\upbeta}}}\), we can predict the unknown residual component \({\hat r}({{\mathbf{u}}}_p)\) at the prediction location \({{\mathbf{u}}}_p\) using A2PSK. The A2PSK prediction at \({{\mathbf{u}}}_p\) is a weighted linear combination of the n “point data” residuals \({{\mathbf{r}}}_{{\mathbf{u}}} = [r({{\mathbf{u}}}_i) = z({{\mathbf{u}}}_i) - \sum_{m=0}^M f_m({{\mathbf{u}}}_i){\hat \upbeta}_m,\;i = 1, \ldots, n]^T\) and the K “areal data” residuals \({{\mathbf{r}}}_s = [r(s_k) = z(s_k) - \sum_{m=0}^M f_m(s_k){\hat{\upbeta}}_m, k = 1, \ldots, K]^T\) as:
Here, the A2PSK weights \(\tilde{{\varvec{\upeta}}}_p = [{\tilde{\eta}}_p({{\mathbf{u}}}_i), i=1, \ldots, n]^T\) for the n point data and the weights \(\tilde{\varvec{\lambda}}_p = [\tilde \lambda_p(s_k), k = 1, \ldots, K]^T\) for the areal data are determined by solving an A2PSK system similar to that of Eq. 4 as:
where the covariance model of residuals involved in the above A2PSK system is assumed to be identical to that used in Eq. 4.
In summary, A2PKED is equivalent to optimum drift estimation followed by area-to-point simple Kriging of the residuals from this drift estimate, as if the mean were known. The A2PKED prediction error variance accounts for the fact that the drift is actually unknown but estimated.
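The two-step reading summarized above can be written compactly as follows. This is a sketch in the notation of this appendix, not a reproduction of the paper's numbered equations. The left-hand symbol \(\hat{z}\), the block names \(\Sigma_{uu}\), \(\Sigma_{us}\), \(\Sigma_{ss}\) for the point-to-point, point-to-area, and area-to-area residual covariances, and the vectors \(\boldsymbol{\sigma}_p^{u}\), \(\boldsymbol{\sigma}_p^{s}\) for the covariances between the data and the prediction location are assumed names. Standard `\boldsymbol` is used in place of the document's font macros.

```latex
\begin{align*}
\hat{z}(\mathbf{u}_p)
  &= \hat{\mu}(\mathbf{u}_p) + \hat{r}(\mathbf{u}_p)
   = \hat{\boldsymbol{\beta}}^{T}\mathbf{f}_p^{\mathbf{u}}
   + \tilde{\boldsymbol{\eta}}_p^{T}\mathbf{r}_{\mathbf{u}}
   + \tilde{\boldsymbol{\lambda}}_p^{T}\mathbf{r}_{s}, \\[4pt]
\begin{bmatrix}
  \Sigma_{uu} & \Sigma_{us} \\
  \Sigma_{us}^{T} & \Sigma_{ss}
\end{bmatrix}
\begin{bmatrix}
  \tilde{\boldsymbol{\eta}}_p \\ \tilde{\boldsymbol{\lambda}}_p
\end{bmatrix}
  &=
\begin{bmatrix}
  \boldsymbol{\sigma}_p^{u} \\ \boldsymbol{\sigma}_p^{s}
\end{bmatrix}.
\end{align*}
```

The first line is the drift prediction plus the simple-kriging prediction of the residual. The second is the usual simple-kriging system, in which the block matrix on the left plays the role of \({\varvec{\Upsigma}}_R\).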
Appendix 2: Estimation of local drift coefficients in A2PKED
Consider the task of predicting the unknown value at location \({{\mathbf{u}}}_p\) using n point data and a subset of the areal data instead of all K such data. This modification of the original A2PKED system in Eq. 4 may be necessary when dealing with large data sets of point support as well as areal support, or when statistical modeling of spatial variation, i.e., spatially varying regression model coefficients, is adopted.
In our case study, we include only the one areal datum whose region contains the prediction location \({{\mathbf{u}}}_p\), which may affect the efficiency of the GLS estimator as well as the residual component. This change of the original system may cause problems in the design matrix, particularly when dummy variables associated with areal data are included. However, this change does not destroy the desirable properties of Kriging prediction, such as unbiasedness and minimum prediction error variance. In what follows, we present the A2PKED prediction at location \({{\mathbf{u}}}_p\) in the form of a linear regression with correlated residuals based on n point data and a single areal datum:
where \({\tilde \eta}_p({{\mathbf{u}}}_i)\) and \({\tilde \lambda}_p(s_k)\) denote, respectively, the A2PSK weight assigned to the ith point support residual \(r({{\mathbf{u}}}_i) = z({{\mathbf{u}}}_i)-{\hat \mu}({{\mathbf{u}}}_i)\) and the kth areal residual \(r(s_k) = z(s_k) - {\hat \mu}(s_k)\) at the region \(s_k\) in which the prediction location \({{\mathbf{u}}}_p\) falls. Note that the A2PSK weights for point data and the single areal datum in Eq. 14 need to be updated at each prediction location, as the areal datum associated with each prediction location is subject to change. This amounts to estimating a spatially varying (local) linear drift component whose GLS regression coefficients are constant within each neighborhood, but vary from one area to another. For example, the GLS regression coefficients \({\hat{\upbeta}}_m({{\mathbf{u}}}_p)\) with m = 0,…, M at location \({{\mathbf{u}}}_p \in s_k\) are different from those \({\hat \upbeta}_m({{\mathbf{u}}}_{p'})\) at location \({{\mathbf{u}}}_{p'}\) if \({{\mathbf{u}}}_{p'} \in s_{k'}\) for \(k \neq k'\).
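The single-areal-datum predictor described above can be sketched as follows. The expression is assembled from the definitions in this appendix; the left-hand symbol \(\hat{z}\) is an assumed name, and the paper's own numbered equation may be laid out differently.

```latex
\[
\hat{z}(\mathbf{u}_p)
  = \sum_{m=0}^{M} \hat{\beta}_m(\mathbf{u}_p)\, f_m(\mathbf{u}_p)
  + \sum_{i=1}^{n} \tilde{\eta}_p(\mathbf{u}_i)\, r(\mathbf{u}_i)
  + \tilde{\lambda}_p(s_k)\, r(s_k),
  \qquad \mathbf{u}_p \in s_k .
\]
```

Because both the coefficients \(\hat{\beta}_m(\mathbf{u}_p)\) and the weights depend on which region \(s_k\) contains \(\mathbf{u}_p\), the GLS fit and the A2PSK system are re-solved whenever the prediction location moves into a different areal unit.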
Typically, the application of GLS to hedonic price models assumes that the relationship between house price and covariates is fixed over the study area. The proposed approach takes into account heteroskedasticity present in house prices so that the implicit price of housing attributes varies spatially by submarkets defined by areal units (Orford 2000).
Yoo, EH., Kyriakidis, P.C. Area-to-point Kriging in spatial hedonic pricing models. J Geogr Syst 11, 381–406 (2009). https://doi.org/10.1007/s10109-009-0090-z