Marginal maximum likelihood estimation of SAR models with missing data

doi:10.1016/j.csda.2017.11.004

Computational Statistics & Data Analysis

Volume 120, April 2018, Pages 98-110

https://doi.org/10.1016/j.csda.2017.11.004

Abstract

Maximum likelihood (ML) estimation of simultaneous autoregressive (SAR) models is well known. In the presence of missing data, estimation is not straightforward because the model implies dependence among all units. The EM algorithm is the standard approach to ML estimation in this case. An alternative approach is considered here: maximising the marginal likelihood directly. At first glance this method is computationally demanding because it involves inverting large matrices of the same size as the complete data, but these inversions can be avoided, leading to an algorithm that is usually much faster than the EM algorithm and free of the typical EM convergence issues. An approximate method is also proposed as an alternative, for example when the contiguity matrix is dense. The methods are illustrated using a well-known data set on house prices with 25,357 units.

Introduction

Simultaneous autoregressive (SAR) models are popular linear regression models for spatially distributed data that take into account the dependence of the response variable among neighbouring units. Maximum likelihood (ML) estimation for SAR models is well established (Ord, 1975). The dependence between neighbouring units is represented by the contiguity matrix $W$, which has a non-zero entry $W_{ij}$ if unit $i$ is close to unit $j$. Various choices of how to define $W$ are available. The $n \times n$ matrix $W$ refers to the $n$ units of interest. If complete data are observed for a response variable $Y$ of interest, then ML estimation can be carried out relatively efficiently following the methods proposed by Ord (1975). However, when missing data are present, i.e. the response variable $Y$ is only observed for a subset of $n_{s} < n$ units, estimation becomes more difficult because the contiguity matrix refers to $n$ units while the observed $Y$ refers to only $n_{s}$ units. ML estimation for this situation has been investigated by Lesage and Pace (2004) using the EM algorithm.
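To make the role of $W$ concrete, the sketch below builds one common choice, a row-standardised $k$-nearest-neighbour matrix, from point coordinates. The $k$-nearest-neighbour rule, the row standardisation and the function name `knn_contiguity` are illustrative assumptions, not the definition of $W$ used in the paper. $W$ is stored as a SciPy sparse matrix because almost all of its $n^{2}$ entries are zero.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial import cKDTree


def knn_contiguity(coords: np.ndarray, k: int = 4) -> sp.csr_matrix:
    """Hypothetical helper: row-standardised k-nearest-neighbour W (n x n, sparse).

    Assumes distinct coordinates, so that each unit's nearest point is itself.
    """
    n = coords.shape[0]
    tree = cKDTree(coords)
    # Query k + 1 neighbours because the closest point to each unit is the unit itself.
    _, idx = tree.query(coords, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()               # drop the self-match in column 0
    vals = np.full(n * k, 1.0 / k)          # equal weights, so each row sums to one
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
```

For example, `knn_contiguity(np.random.default_rng(0).uniform(size=(100, 2)), k=4)` returns a $100 \times 100$ matrix with 400 non-zero entries and a zero diagonal.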

Kato (2013) considered estimation and prediction for several spatial covariance models, including SAR models, using the EM algorithm, and also a quasi-likelihood method for estimation. Goulard et al. (2017) used the EM algorithm to investigate the performance of various in-sample and out-of-sample predictors, but also used an ML algorithm. Computational improvements of the EM algorithm of Lesage and Pace (2004) have been considered by Suesse and Zammit-Mangion (2017).

Other estimation methods for SAR models in the presence of missing data have been studied extensively, for example the generalised method of moments and least squares (Wang and Lee, 2013), and approximate Bayesian methods based on the integrated nested Laplace approximation (INLA) (Bivand et al., 2014; Gomez-Rubio et al., 2015, 2017).

In this paper I follow a different approach to ML estimation: maximising the marginal likelihood directly instead of indirectly via the EM algorithm. With some computational tricks, this leads to a generally much faster algorithm. The proposed method is a true marginal ML method and requires out-of-sample information, which distinguishes it from the ML methods used by Kato (2013) and Goulard et al. (2017) that only use in-sample information. The proposed algorithm also does not suffer from the convergence problems that are common and well known for the EM algorithm (McLachlan and Krishnan, 2008), in particular when the sample size is very small.

Both the EM algorithm and the marginal ML approach generally lead to the same ML estimates, provided convergence to the global maximum has been achieved. Analogously, Laird and Ware (1982) considered ML estimation via the EM algorithm for linear mixed models, whereas Lindstrom and Bates (1988) considered maximising the marginal likelihood directly. Both approaches are also common for restricted ML (REML) estimation, a method that reduces the bias of ML variance estimates.

Section 2 introduces the spatial models under consideration. Section 3 reviews standard ML estimation with complete data. In Section 4 I introduce an exact method of maximising the marginal likelihood and an alternative approximate method. Section 5 presents a simulation study to investigate the performance of the various methods, and Section 6 applies the methods to a well-known data set on house prices, also illustrating typical computation times and the convergence issues of the EM algorithm for small sampling ratios. The article concludes with a discussion.

Section snippets

SAR models

There are two main SAR models, using the convention of Lesage and Pace (2004): one in which the spatial dependence is incorporated directly into the equation for the response variable, referred to in the following as the spatial autoregressive model (SAM), and one in which the spatial dependence is incorporated into the error term, the spatial error model (SEM).
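A minimal simulation sketch of the two models, assuming the standard forms $y = \rho W y + X\beta + \varepsilon$ (SAM) and $y = X\beta + u$, $u = \rho W u + \varepsilon$ (SEM) with $\varepsilon \sim N(0, \sigma^{2} I_{n})$. The snippet is cut off before the paper's own model equations, so these forms and the helper `simulate_sar` are assumptions for illustration only.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def simulate_sar(W, X, beta, rho, sigma2, model="SEM", rng=None):
    """Hypothetical helper: draw y from an assumed SAM or SEM with Gaussian errors.

    SAM: y = rho W y + X beta + eps        =>  (I - rho W) y = X beta + eps
    SEM: y = X beta + u, u = rho W u + eps =>  (I - rho W) u = eps
    with eps ~ N(0, sigma2 I). W is an n x n SciPy sparse matrix.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    A = (sp.identity(n, format="csc") - rho * W).tocsc()
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    if model == "SAM":
        # Solve the sparse system instead of forming (I - rho W)^{-1}.
        return spsolve(A, X @ beta + eps)
    if model == "SEM":
        return X @ beta + spsolve(A, eps)
    raise ValueError("model must be 'SAM' or 'SEM'")
```

With the same random seed both models share $\varepsilon$, so $(I_{n} - \rho W)y - X\beta$ under the SAM equals $(I_{n} - \rho W)(y - X\beta)$ under the SEM, and at $\rho = 0$ both reduce to ordinary linear regression.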

Let $y = (y_{1}, \dots, y_{n})^{\top}$ be the $n$-vector of the response variable, $X$ be the $n \times p$ design matrix containing the explanatory variables and $W$ be

Maximum likelihood estimation with complete data

In the following it is assumed that the complete response vector $y \equiv (y_{s}^{\top}, y_{u}^{\top})^{\top}$ is not fully observed: $y_{u}$ is unobserved and assumed to be missing at random (Little and Rubin, 2002), and only $y_{s}$ is available.

The marginal likelihood of $y_{s}$ in this situation is obtained by integrating the unobserved data out of the complete-data density, $f(y_{s}; \theta) = \int f(y; \theta)\, dy_{u}$, where $f(y; \theta)$ is the density of the complete data and $\theta = (\beta^{\top}, \rho, \sigma^{2})^{\top}$ contains the unknown parameters. Lesage and Pace (2004) circumvented dealing with the

Marginal log-likelihood

While the integration in Eq. (4) may appear difficult, the distribution of $y$ is multivariate normal, so $y_{s}$ is also multivariate normal with mean $\mu_{s}$ and variance $\omega V_{ss}$. This follows, for example, from the well-known general result that if $y \sim N(\mu, \Sigma)$, then $Ay \sim N(A\mu, A \Sigma A^{\top})$ for any matrix $A$ of full row rank. When $A$ is the $n_{s} \times n$ matrix of ones and zeros that selects the observed units, so that $y_{s} = Ay$, the result $y_{s} \sim N(\mu_{s}, \omega V_{ss})$ follows.
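A small numerical check of this result. Part (i) uses a hypothetical set of observed units and an arbitrary positive definite covariance (not a SAR covariance) to show that the selection matrix $A$ simply extracts $\mu_{s}$ and the block $\Sigma_{ss}$. Part (ii) evaluates the integral in Eq. (4) numerically for a toy bivariate case with one observed and one missing unit and compares it with the closed-form normal marginal. All numbers are illustrative.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)

# (i) Selection matrix: y_s = A y picks the observed entries of y.
n = 6
observed = np.array([0, 2, 3, 5])          # hypothetical observed units
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)            # arbitrary positive definite covariance
mu = rng.normal(size=n)
A = np.eye(n)[observed]                    # n_s x n matrix of ones and zeros
mu_s = A @ mu                              # equals mu[observed]
Sigma_ss = A @ Sigma @ A.T                 # equals Sigma[observed][:, observed]

# (ii) Integrating one missing unit out of a bivariate normal density.
mu2 = np.array([1.0, -0.5])                # (mu_s, mu_u), arbitrary values
Sigma2 = np.array([[2.0, 0.8],
                   [0.8, 1.5]])
joint = stats.multivariate_normal(mean=mu2, cov=Sigma2)


def marginal_by_integration(ys):
    """f(y_s) = integral of f(y_s, y_u) over y_u, computed with quad."""
    value, _ = integrate.quad(lambda yu: joint.pdf([ys, yu]), -np.inf, np.inf)
    return value


def marginal_closed_form(ys):
    """N(mu_s, Sigma_ss) density implied by the selection result."""
    return stats.norm(loc=mu2[0], scale=np.sqrt(Sigma2[0, 0])).pdf(ys)
```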

This result immediately gives the marginal log-likelihood, obtained from the complete-data log-likelihood by replacing $V$ by $V_{ss}$, $\mu$ by $\mu_{s}$, $y$ by $y_{s}$ and $n$ by $n_{s}$
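Written out, and assuming the complete data follow $y \sim N(\mu, \omega V)$ as stated above, these substitutions give the Gaussian marginal log-likelihood

$$\ell(\theta; y_{s}) = -\frac{n_{s}}{2}\log(2\pi\omega) - \frac{1}{2}\log|V_{ss}| - \frac{1}{2\omega}(y_{s} - \mu_{s})^{\top} V_{ss}^{-1} (y_{s} - \mu_{s}).$$

Forming $V_{ss}$ naively requires the $n \times n$ inverse $V$. The sketch below shows one way to avoid it for the SEM, under assumptions that are ours rather than taken from this snippet: $\mu = X\beta$, $V = M^{-1}$ with $M = (I_{n} - \rho W)^{\top}(I_{n} - \rho W)$, and $\omega = \sigma^{2}$. Standard partitioned-inverse identities then give $V_{ss}^{-1} = M_{ss} - M_{su} M_{uu}^{-1} M_{us}$ and $\log|V_{ss}| = \log|M_{uu}| - \log|M|$, so only the $n_{u} \times n_{u}$ block $M_{uu}$ needs a Cholesky factorisation, which appears consistent with the conclusion's requirement that a Cholesky factorisation of $M_{uu}$ be fast. The function name `sem_marginal_loglik` is hypothetical, $\det(I_{n} - \rho W)$ is taken from a sparse LU factorisation, and $M_{uu}$ is factorised densely for simplicity.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import splu


def sem_marginal_loglik(beta, rho, omega, y_s, X, W, obs):
    """Hypothetical helper: marginal log-likelihood of y_s under an assumed SEM.

    Assumes y ~ N(X beta, omega * V) with V = M^{-1}, M = (I - rho W)'(I - rho W),
    W an n x n SciPy sparse matrix, and at least one unobserved unit.
    The n x n inverse V is never formed.
    """
    n = W.shape[0]
    obs = np.asarray(obs)
    unobs = np.setdiff1d(np.arange(n), obs)
    A = (sp.identity(n, format="csc") - rho * W).tocsc()
    M = (A.T @ A).tocsc()
    M_ss = M[obs][:, obs]
    M_us = M[unobs][:, obs]
    M_uu = M[unobs][:, unobs].toarray()      # dense n_u x n_u block
    chol = cho_factor(M_uu, lower=True)

    # log|V_ss| = log|M_uu| - log|M|, with log|M| = 2 log|det(I - rho W)|.
    # SuperLU's L factor has a unit diagonal, so |det A| = prod |diag(U)|.
    logdet_M = 2.0 * np.sum(np.log(np.abs(splu(A).U.diagonal())))
    logdet_Muu = 2.0 * np.sum(np.log(np.diag(chol[0])))
    logdet_Vss = logdet_Muu - logdet_M

    # r' V_ss^{-1} r via the Schur complement M_ss - M_su M_uu^{-1} M_us.
    r = y_s - X[obs] @ beta                  # SEM: mu_s = X_s beta
    t = M_us @ r
    quad = r @ (M_ss @ r) - t @ cho_solve(chol, t)

    n_s = len(obs)
    return (-0.5 * n_s * np.log(2.0 * np.pi * omega)
            - 0.5 * logdet_Vss - 0.5 * quad / omega)
```

On a small problem the result can be checked against a dense evaluation that inverts $M$ explicitly. For large $n_{u}$ a sparse Cholesky factorisation of $M_{uu}$ would replace the dense `cho_factor` call.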

Simulation study

In theory the EM algorithm and the marginal ML method produce the same estimates. However, non-convergence of the EM algorithm can often yield different estimates. When the optimisation problem has multiple local maxima, local optimisers such as Newton–Raphson may also fail to find the global maximum of the marginal likelihood. A simulation study is conducted in order (i) to demonstrate that the EM algorithm and the proposed marginal ML method are equivalent by

Example

The Lucas County (Ohio, USA) housing data consist of $n = 25{,}357$ observations of single-family homes sold in the period 1993–1998. The data set is part of the R package spdep (Bivand, 2017) and is fully described in the Spatial Econometrics toolbox for Matlab, see http://www.spatial-econometrics.com/html/jplv7.zip. The data have been used by Bivand (2010) to compare several software packages for fitting spatial regression models. Bivand (2010) used log house price ($\log(\text{price})$) as the response

Conclusion

I presented an alternative method for obtaining ML estimates of SAR models in the presence of missing data, which maximises the marginal likelihood directly instead of indirectly via the EM algorithm. The method is worth applying when $W$ is sparse, when $n$ is not too large, and as long as a Cholesky factorisation of the $n_{u} \times n_{u}$ matrix $M_{uu}$ can be calculated quickly. The method is attractive because it is widely known that the EM algorithm may converge slowly

Acknowledgements

I would like to thank NIASRA for providing access to the high-performance cluster used to run the simulation and to fit the large data set, and in particular Clint Shumack for guidance in using it. I am also grateful to the referees for their valuable feedback, which greatly improved the paper.

References (26)

Bivand, R.S., et al. Approximate Bayesian inference for spatial econometrics models. Spatial Stat. (2014).
Gomez-Rubio, V., et al. A new latent class to fit spatial econometrics models with integrated nested Laplace approximations. Procedia Environ. Sci. (2015).
Harrison, D., et al. Hedonic housing prices and the demand for clean air. J. Environ. Econom. Manage. (1978).
Li, H., et al. One-step estimation of spatial dependence parameters: Properties and extensions of the APLE statistic. J. Multivariate Anal. (2012).
Bates, D., et al. Matrix: Sparse and Dense Matrix Classes and Methods (2017).
Bivand, R. Comparing estimation methods for spatial econometrics techniques using R.
Bivand, R. spdep: Spatial Dependence: Weighting Schemes, Statistics and Models. R package version 0.5-82 (2017)...
Brent, R. Algorithms for Minimization without Derivatives (1973).
Gomez-Rubio, V., Bivand, R.S., Rue, H. Estimating spatial econometrics models with integrated nested Laplace... (2017).
Goulard, M., et al. About predictions in spatial autoregressive models: Optimal and almost optimal strategies. Spatial Econom. Anal. (2017).
Harville, D. Matrix Algebra From a Statistician's Perspective (1997).
Kato, T. Usefulness of the information contained in the prediction sample for the spatial error model. J. Real Estate Finance Econom. (2013).
Laird, N.M., et al. Random-effects models for longitudinal data. Biometrics (1982).
