SAS macro programs for geographically weighted generalized linear modeling with spatial point data: Applications to health research

https://doi.org/10.1016/j.cmpb.2011.10.006

Abstract

An increasing interest in exploring spatial non-stationarity has generated several specialized analytic software programs; however, few of these programs can be integrated natively into a well-developed statistical environment such as SAS. We not only developed a set of SAS macro programs to fill this gap, but also expanded the geographically weighted generalized linear modeling (GWGLM) framework by integrating the strengths of SAS into it. Three features distinguish our work. First, our macro programs provide more kernel weighting functions than existing programs. Second, our code gives users finer control over the bandwidth selection process than existing programs do. Third, the macro programs are fully embedded in the SAS environment, which gives them great potential for future exploration of more complicated spatially varying coefficient models in other disciplines. We provided three empirical examples to illustrate the use of the SAS macro programs and to demonstrate the advantages described above.

Introduction

Past decades have witnessed an increasing interest in integrating a spatial analytical perspective into health research [1], [2], [3]. The main task has often been to explore the associations between regional characteristics and health outcomes observed at different geographic locations. Conventional statistical methods approach this task with a global spatial model, which estimates a single regression equation with one fixed set of parameter estimates for the whole study area. The underlying assumption of this approach is that the associations between the response and predictor variables are homogeneous (stationary) over space [4]. Although these “global” modeling procedures have been developed to handle spatial dependence in the data and to yield less biased estimates, little attention has been paid to spatial heterogeneity, or non-stationarity, in which the relationships among variables are not stable across space; that is, to “local” spatial associations.
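The limitation of a single global equation can be seen on a small simulated example. The Python below is a hypothetical illustration using simulated data, not data from this paper. Along a transect of 21 locations, the true slope of y on x rises linearly from 0 to 2. A global regression recovers only the average slope. By contrast, kernel-weighted local regressions at the two ends of the transect, the basic idea behind GWR, recover slopes close to the local values. The Gaussian kernel and the bandwidth of 2 are illustrative choices.

```python
import math
from typing import List, Sequence


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Slope of the weighted least-squares line y ~ 1 + x with observation weights w."""
    sw = sum(w)
    mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
    my = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
    return sxy / sxx


def gaussian_weights(u: Sequence[float], center: float, b: float) -> List[float]:
    """Gaussian kernel weights of all points for a local model centered at `center`."""
    return [math.exp(-0.5 * ((uj - center) / b) ** 2) for uj in u]


# Hypothetical transect of 21 locations; the true slope rises linearly from 0 to 2.
u = [float(j) for j in range(21)]
x = [1.0 if j % 2 == 0 else -1.0 for j in range(21)]
true_slope = [uj / 10.0 for uj in u]
y = [bj * xj for bj, xj in zip(true_slope, x)]

global_slope = weighted_slope(x, y, [1.0] * len(u))               # one slope everywhere
west_slope = weighted_slope(x, y, gaussian_weights(u, 0.0, 2.0))   # local model at u = 0
east_slope = weighted_slope(x, y, gaussian_weights(u, 20.0, 2.0))  # local model at u = 20
print(f"global {global_slope:.3f}, west {west_slope:.3f}, east {east_slope:.3f}")
```

The global slope is 1, which is the average of the true slopes. The two local slopes sit much closer to the true values at the ends of the transect, roughly 0.1 in the west and 1.9 in the east. Kernel smoothing shrinks them somewhat toward the middle.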

Fotheringham and colleagues developed geographically varying coefficient models, based on local regression and smoothing techniques, to explore spatial non-stationarity [4], [5], [6], [7]. Their approach is designed for observations with coordinates, particularly spatial point data. They first introduced geographically weighted regression (GWR) and then extended the framework to generalized linear modeling (GLM). GWR assumes that the response variable is continuous and that the error terms follow a normal distribution. The extension, referred to as geographically weighted generalized linear modeling (GWGLM) [7], broadens the GWR concept by allowing users to fit local regression models in which the response variable need not be continuous and the error terms need not follow a normal distribution. Geographically weighted Poisson regression (GWPR) [7], [8] and GW logistic regression [7] are examples of GWGLM for count and binary data, respectively. Just as the classical linear model is the Gaussian case of GLM, GWR can be regarded as the Gaussian version of GWGLM. Implementing GWGLM requires first selecting a kernel weighting function and a bandwidth, and then estimating a local model at each data point using the observations captured within the bandwidth of the kernel [7] (see Section 2.2 and Fig. 1 for details). Different kernel and bandwidth combinations may yield different results [9]. GWGLM produces mappable statistics that can be used to visualize the spatial patterns of the “local” relationships in the model. This advantage has led to broad application of GWGLM in many disciplines [8], [10], [11], [12]. More importantly, this approach to spatial non-stationarity has been identified as one of the geostatistical methods that should be promoted in health studies, given the local nature of many health issues, e.g., premature mortality patterns [8] and cardiovascular mortality [10], [13].
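The kernel and bandwidth choices described above can be made concrete. The sketch below is in Python rather than SAS. It implements two kernels commonly used in GWR, the Gaussian and the bi-square. It then computes the weight each observation receives in the local model calibrated at one location. The bandwidth can be fixed (a distance) or adaptive (a number of nearest neighbors). This is a minimal illustration, not the authors' macro code. The function names are hypothetical, and so is the convention that the adaptive neighbor count includes the location itself.

```python
import math
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[float, float], float]


def gaussian_kernel(d: float, b: float) -> float:
    """Gaussian kernel: weights decay smoothly with distance and never reach zero."""
    return math.exp(-0.5 * (d / b) ** 2)


def bisquare_kernel(d: float, b: float) -> float:
    """Bi-square kernel: weights decay to exactly zero at distance b and beyond."""
    return (1.0 - (d / b) ** 2) ** 2 if d < b else 0.0


def kernel_weights(
    coords: Sequence[Tuple[float, float]],
    i: int,
    bandwidth: float,
    kernel: Kernel = bisquare_kernel,
    adaptive: bool = False,
) -> List[float]:
    """Weights of all observations in the local model calibrated at location i.

    Fixed bandwidth: `bandwidth` is a distance b shared by every location.
    Adaptive bandwidth: `bandwidth` is a neighbor count k, and b is the distance
    from location i to its k-th nearest point, counting i itself (an assumption).
    """
    ui, vi = coords[i]
    dists = [math.hypot(u - ui, v - vi) for u, v in coords]
    if adaptive:
        k = int(bandwidth)
        b = sorted(dists)[k - 1]  # index 0 is location i itself (distance 0)
    else:
        b = float(bandwidth)
    return [kernel(d, b) for d in dists]
```

Take points at (0, 0), (1, 0) and (3, 0). For these, `kernel_weights(coords, 0, 2.0)` returns `[1.0, 0.5625, 0.0]`. The bi-square kernel drops the point beyond the 2-unit bandwidth entirely, whereas `gaussian_kernel` would still give it a small positive weight.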

A small number of software programs can perform GWR and/or GWGLM. A commonly used one is the GWR software developed by Fotheringham and colleagues. In its present form (GWR 3.0), this program allows users to calibrate local constant models with continuous (normal distribution), binary (binomial distribution), and count (Poisson distribution) response variables. However, GWR 3.0 is a specialized stand-alone package that cannot be fully integrated into existing statistical software such as Statistical Analysis System (SAS). A similar effort can be found in the Spatial Analysis in Macroecology (SAM) program [14]. ArcGIS can conduct GWR modeling, but only with a normally distributed response variable; in particular, GWGLM cannot be implemented in ArcGIS. Bivand and Yu [15] provided a free “spgwr” package in R that contains two functions, “gwr” and “ggwr”. The former implements the original GWR for normally distributed outcomes and largely replicates the functions of GWR 3.0. The latter conducts GWGLM but provides fewer functionalities and diagnostic statistics than GWR 3.0. In general, few statistical software packages can conduct GWGLM with much flexibility in the modeling process, such as specifying the kernel function or the bandwidth selection range.
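Control over the bandwidth selection range, noted above as a gap in existing packages, can be sketched briefly. The Python below selects a fixed Gaussian bandwidth for a one-covariate GWR. It does so by leave-one-out cross-validation (CV) over a user-supplied range [lower, upper]. Several pieces are illustrative assumptions: the grid search, the CV criterion, the function names, and the restriction to one covariate. GWR 3.0 and the authors' SAS macros may use a different search strategy (for example, golden-section search) or a different criterion (for example, AICc).

```python
import math
from typing import Sequence, Tuple


def gaussian_kernel(d: float, b: float) -> float:
    """Gaussian kernel weight for distance d and bandwidth b."""
    return math.exp(-0.5 * (d / b) ** 2)


def loo_fit_at(i: int, coords: Sequence[Tuple[float, float]], x: Sequence[float],
               y: Sequence[float], b: float) -> float:
    """Fitted value at location i from a kernel-weighted regression y ~ 1 + x
    calibrated at location i with observation i left out (leave-one-out)."""
    ui, vi = coords[i]
    sw = swx = swy = swxx = swxy = 0.0
    for j, ((u, v), xj, yj) in enumerate(zip(coords, x, y)):
        if j == i:
            continue
        w = gaussian_kernel(math.hypot(u - ui, v - vi), b)
        sw += w
        swx += w * xj
        swy += w * yj
        swxx += w * xj * xj
        swxy += w * xj * yj
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x[i]


def cv_score(coords: Sequence[Tuple[float, float]], x: Sequence[float],
             y: Sequence[float], b: float) -> float:
    """Leave-one-out cross-validation score: sum of squared prediction errors."""
    return sum((y[i] - loo_fit_at(i, coords, x, y, b)) ** 2 for i in range(len(y)))


def select_bandwidth(coords: Sequence[Tuple[float, float]], x: Sequence[float],
                     y: Sequence[float], lower: float, upper: float,
                     steps: int = 20) -> float:
    """Grid search for the bandwidth in [lower, upper] that minimizes the CV score."""
    grid = [lower + k * (upper - lower) / steps for k in range(steps + 1)]
    return min(grid, key=lambda b: cv_score(coords, x, y, b))
```

Narrowing `lower` and `upper` restricts the search to bandwidths the analyst considers plausible for the study area. It also avoids calibrating local models on too few effective observations.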

SAS is widely used for statistical analysis. To our knowledge, no SAS procedure is readily available to conduct GWGLM. In this paper, we develop and introduce a set of SAS macros that fills this gap. The macros give users more options for calibrating GWGLM models than either GWR 3.0 or the R package. Section 2 elaborates on the theoretical framework of GWGLM, and Section 3 illustrates how to apply the macros to empirical data. We compare the results obtained from the SAS macros with those from GWR 3.0 and R.

GWGLM framework

GWGLM extends GWR on the basis of GLM theory [16]. Let $Y_i$, $i = 1, \ldots, n$, be the response observed at location $i$ in space. The corresponding geospatial covariate vector is $\mathbf{X}_i = (1, X_{i1}, X_{i2}, \ldots, X_{ip})^{\top}$, of dimension $p + 1$, where the constant 1 corresponds to the intercept. Suppose that the distribution of $Y_i$ is given by a probability density function (continuous case) or a probability mass function (discrete case) that depends on the location parameter $\theta_i$ and the scale parameter $\phi_i$. The

Computing SAS programs

In this section, we demonstrate how to conduct GWGLM using the SAS programs, which we developed in SAS version 9.2. Our programs should work well with version 9.2 or higher. Users are advised to check their SAS version prior to the use of the programs.

Applications

We applied our SAS macro programs to one educational attainment data set and two health data sets, whose response variables could be modeled with GWR, GWPR, and GW logistic regression. These are the GWGLM procedures for continuous, count, and binary response variables, respectively. The analytic results from our SAS programs were also compared with those from GWR 3.0 and/or R. The data sets for the following examples are available at http://help.pop.psu.edu/gia-resources/gwr-paper-materials and //sas-for-gwglm.blogspot.com/
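The three model types named above share one estimation scheme and differ only in the response distribution and link function. In the GWGLM framework, each location $i$ receives its own coefficient vector. It is estimated by maximizing a log-likelihood in which each observation $j$ contributes with its kernel weight $w_{ij}$. For canonical links, this can be done by iteratively reweighted least squares (IRLS), with the kernel weights folded into the IRLS weights. The Python below is a minimal sketch of that scheme, not a translation of the authors' SAS macros. Its simplifications are assumptions of this sketch:

- the function names are hypothetical;
- the kernel is a fixed-bandwidth Gaussian;
- the starting values are zeros;
- no diagnostics (standard errors, AICc, local tests) are computed.

```python
import math
from typing import List, Sequence, Tuple


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol


def local_glm(X: Sequence[Sequence[float]], y: Sequence[float], w: Sequence[float],
              family: str = "poisson", max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Kernel-weighted IRLS fit of one local GLM.

    X: rows (1, x_1, ..., x_p); y: responses; w: kernel weights for this location.
    Canonical links only: identity (gaussian), log (poisson), logit (binomial, y in {0, 1}).
    """
    p = len(X[0])
    beta = [0.0] * p                       # naive starting values (an assumption)
    for _ in range(max_iter):
        A = [[0.0] * p for _ in range(p)]  # accumulates X' W X
        rhs = [0.0] * p                    # accumulates X' W z
        for xj, yj, wj in zip(X, y, w):
            eta = sum(b * v for b, v in zip(beta, xj))
            if family == "gaussian":
                mu, dmu = eta, 1.0
            elif family == "poisson":
                mu = math.exp(eta)
                dmu = mu
            elif family == "binomial":
                mu = 1.0 / (1.0 + math.exp(-eta))
                dmu = mu * (1.0 - mu)
            else:
                raise ValueError(f"unsupported family: {family}")
            z = eta + (yj - mu) / dmu      # IRLS working response
            a = wj * dmu                   # kernel weight times IRLS weight (dmu for canonical links)
            for r in range(p):
                rhs[r] += a * xj[r] * z
                for c in range(p):
                    A[r][c] += a * xj[r] * xj[c]
        new_beta = solve(A, rhs)
        done = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta


def gwglm(coords: Sequence[Tuple[float, float]], X: Sequence[Sequence[float]],
          y: Sequence[float], bandwidth: float, family: str = "poisson") -> List[List[float]]:
    """One local GLM per location, using fixed-bandwidth Gaussian kernel weights."""
    estimates = []
    for ui, vi in coords:
        w = [math.exp(-0.5 * (math.hypot(u - ui, v - vi) / bandwidth) ** 2) for u, v in coords]
        estimates.append(local_glm(X, y, w, family))
    return estimates
```

Passing `family="gaussian"`, `"poisson"` or `"binomial"` gives local GWR, GWPR or GW logistic estimates. Each row of the result from `gwglm` can be joined back to its coordinates and mapped.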

Discussion

This paper illustrates how SAS users can explore and analyze spatial point data using macro programs for the geographically weighted generalized linear models introduced by Fotheringham et al. [7]. Several features make our SAS macro programs easier to use, and more flexible in model specification, than the standard software GWR 3.0. Next, we summarize the three most significant differences between our SAS macro programs and GWR 3.0.

First, in non-parametric statistics theory, several kernel

References (29)

  • A.S. Fotheringham et al., Geographically Weighted Regression: The Analysis of Spatially Varying Relationships (2002).
  • T. Nakaya et al., Geographically weighted Poisson regression for disease association mapping, Statistics in Medicine (2005).
  • L. Guo et al., Comparison of bandwidth selection in application of geographically weighted regression: a case study, Canadian Journal of Forest Research (2008).
  • M.D. Partridge et al., The geographic diversity of US nonmetropolitan growth dynamics: a geographically weighted regression approach, Land Economics (2008).