Generalized estimating equations and regression diagnostics for longitudinal controlled clinical trials: A case study
Introduction
Twenty-five years ago the generalized estimating equations (GEE) for analyzing correlated non-normal data were introduced by Liang and Zeger in a series of papers (see, e.g., Liang and Zeger, 1986; Zeger and Liang, 1986). The strength of this semiparametric approach is that regression coefficients can be consistently estimated in regression models with clustered, non-normally distributed dependent variables even if the distribution is partly misspecified. Specifically, only the correct specification of the mean structure is required for consistent estimation; variances and within-cluster correlations may be misspecified. However, the efficiency of the estimation approach generally depends on the degree of misspecification of the covariance matrix.
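The consistency property can be made concrete with the estimating equations themselves. The display below is a sketch in commonly used GEE notation; the symbols are assumptions introduced here for illustration: $y_i$ and $X_i$ are the response vector and design matrix of cluster $i$, $\mu_i(\beta)$ is its mean vector, $D_i = \partial \mu_i / \partial \beta'$, and $V_i$ is the working covariance matrix.

```latex
\begin{align}
  \sum_{i=1}^{n} D_i' V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) &= 0, \\
  E\bigl[ D_i' V_i^{-1} \bigl( y_i - \mu_i(\beta_0) \bigr) \mid X_i \bigr]
    &= D_i' V_i^{-1} \, E\bigl[ y_i - \mu_i(\beta_0) \mid X_i \bigr] = 0 .
\end{align}
```

The second line holds at the true parameter $\beta_0$ whenever the mean structure is correct, whatever $V_i$ is chosen, provided $V_i$ depends only on $X_i$ and fixed parameters. This is why a misspecified working covariance costs efficiency but not consistency.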
The GEE have been extended in several ways. The extensions include approaches for dealing with missing data (for an overview, see, e.g., Ziegler et al., 2003), approaches for sample size calculations (reviewed in Dahmen and Ziegler, 2004), and regression diagnostics (Preisser and Qaqish, 1996; Ziegler et al., 1995). However, these extensions have rarely been used in applications, partly because of the lack of appropriate software.
The aim of this paper is therefore two-fold. First, we want to illustrate that the application of GEE to a repeated measurement intervention study can be an interesting alternative, or at least a supplement, to the standard analysis, which only involves the last follow-up and, possibly, adjustments for baseline measurements. Second, we aim to demonstrate that regression diagnostics should supplement the GEE analysis to serve as a sensitivity analysis. For illustration, we re-analyze data from a double-blind placebo-controlled randomized multicenter trial, in which the oedema-protective effect of a vasoactive drug was investigated in patients suffering from chronic venous insufficiency after decongestion of the legs. The primary analysis was a baseline-adjusted analysis of covariance (ANCOVA) between the two treatment groups (Vanscheidt et al., 2002). A secondary analysis using GEE, which aimed at detecting a difference in the slopes, is presented in this paper.
The paper is organized as follows. First, we describe the SB-LOT data (Vanscheidt et al., 2002) which are re-analyzed below. Second, we give a short introduction to GEE, and we briefly discuss approaches for selecting the most plausible correlation structure. Next, we review regression diagnostic methods for GEE, which are primarily based on deletion diagnostics. Results from the re-analysis of the SB-LOT data are presented, and findings from regression diagnostics are displayed. We specifically show for this data set that the removal of outliers does not alter the overall conclusion of the study. However, the goodness of fit as assessed by half-normal plots and simulated envelopes improves.
The SB-LOT data
For illustration we use data from a trial with a parallel group design and repeated measurements. In this double-blind placebo-controlled randomized multicenter trial, the oedema-protective effect of a vasoactive drug was investigated in patients suffering from chronic venous insufficiency after decongestion of the legs (Vanscheidt et al., 2002). At baseline, 226 patients were randomized to medical compression stockings plus SB-LOT (90 mg Coumarin and 540 mg Troxerutin per day) or medical compression stockings
Generalized estimating equations
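Let $n$ be the number of independent clusters $i$, and, for simplicity, assume that there are $T$ observations per cluster, $t = 1, \ldots, T$. For each dependent variable $y_{it}$ a $p$-dimensional vector of independent variables $x_{it}$ is available. Data are collected in column vectors $y_i = (y_{i1}, \ldots, y_{iT})'$ and $T \times p$ dimensional matrices $X_i = (x_{i1}, \ldots, x_{iT})'$.
The mean structure is assumed to be given by $E(y_{it} \mid x_{it}) = \mu_{it} = g(x_{it}'\beta)$, where $g$ is the non-linear response function, and $g^{-1}$ is the link function. As in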
Choosing a reasonable working correlation structure
In applications we cannot expect the working correlation structure to be correctly specified. If it is correctly specified, however, the estimator is best asymptotically normal (BAN). Furthermore, the closer the working correlation structure is to the true correlation structure, the more efficient the estimator is (Chaganty and Joe, 2004). If possible, the investigator should choose a specific working correlation structure for both statistical and biological reasons (Ziegler and Vens, 2010).
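To make the notion of a working correlation structure concrete, the following minimal sketch builds three commonly used candidates for a cluster with T measurement occasions. The function name `working_correlation`, the structure labels, and the parameter `alpha` are illustrative assumptions and are not taken from the paper or from any particular software package.

```python
def working_correlation(structure, T, alpha=0.0):
    """Return a T x T working correlation matrix as a list of rows.

    structure: "independence", "exchangeable", or "ar1" (illustrative labels).
    alpha:     the single correlation parameter; ignored for independence.
    """
    if structure == "independence":
        def rho(s, t):
            return 0.0              # no within-cluster correlation
    elif structure == "exchangeable":
        def rho(s, t):
            return alpha            # same correlation for every pair
    elif structure == "ar1":
        def rho(s, t):
            return alpha ** abs(s - t)   # decays with the time lag
    else:
        raise ValueError("unknown structure: " + structure)
    return [[1.0 if s == t else rho(s, t) for t in range(T)]
            for s in range(T)]


# Example: AR(1) structure for three visits with lag-one correlation 0.5.
R_ar1 = working_correlation("ar1", 3, alpha=0.5)
for row in R_ar1:
    print(row)
```

Under AR(1) the correlation between two visits shrinks as they lie further apart in time, which is often plausible for equally spaced follow-ups; the exchangeable structure instead treats all pairs of visits alike.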
While it is probably intuitive
Regression diagnostics
Unusual data may substantially alter the fit of the regression model, and regression diagnostics identify subjects that might influence the regression relation substantially. Observations that are unusual in the dependent variable are termed outliers, while observations that are unusual with respect to the independent variables are termed leverage points. The effect of these is best studied by investigating the alteration of parameter estimates when an observation is omitted from the analysis. Corresponding statistics are termed
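The idea of omitting data and tracking the change in the estimates can be sketched in a few lines. The example below is a deliberately simplified illustration, not the procedure used in the paper: it assumes a linear model with identity link and an independence working correlation, in which case the GEE point estimate coincides with ordinary least squares, and it refits the model after deleting each cluster in turn rather than using a one-step approximation. All function names and the toy data are hypothetical.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cluster_deletion_dfbeta(clusters):
    """Change in (intercept, slope) when each cluster is left out.

    clusters: dict mapping a cluster id to a list of (x, y) pairs.
    Each change is the full-data estimate minus the leave-one-cluster-out
    estimate, obtained here by a complete refit.
    """
    pooled = [pt for pts in clusters.values() for pt in pts]
    b0, b1 = ols_fit([p[0] for p in pooled], [p[1] for p in pooled])
    changes = {}
    for cid in clusters:
        rest = [pt for other, pts in clusters.items()
                if other != cid for pt in pts]
        d0, d1 = ols_fit([p[0] for p in rest], [p[1] for p in rest])
        changes[cid] = (b0 - d0, b1 - d1)
    return changes


# Hypothetical toy data: (week, outcome) for four subjects.
toy = {
    "A": [(0, 1.0), (1, 2.0), (2, 3.0)],
    "B": [(0, 1.0), (1, 2.0), (2, 3.0)],
    "C": [(0, 1.0), (1, 2.0), (2, 3.0)],
    "D": [(0, 1.0), (1, 2.0), (2, 9.0)],  # outlying last measurement
}
dfbeta = cluster_deletion_dfbeta(toy)
most_influential = max(dfbeta, key=lambda c: abs(dfbeta[c][1]))
print(most_influential, dfbeta[most_influential])
```

In this toy example subject D, whose last measurement is far from the common trend, produces the largest absolute change in the slope. With a non-trivial working correlation or a non-linear link, each refit would require an iterative GEE fit, which is one reason approximate deletion statistics are attractive in practice.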
Standard GEE1 analysis
In the first step, the model for the SB-LOT data was estimated using the AR(1) working correlation structure (Table 3), as recommended by Wang and Carey (2003). Analogously to the ANCOVA model of Vanscheidt et al. (2002), the intention-to-treat (ITT) analysis using GEE showed an advantage for SB-LOT over placebo because the slope parameter was significant at the 5% test level (). The difference in the lower leg volume between SB-LOT and placebo increased by 2.64 ml per week (95% confidence
Discussion
The standard analysis in a longitudinal parallel group clinical trial usually involves only the last time point, so that the primary analysis is often a standard two-group comparison. For continuous outcome variables, either the t-test or the U-test is often the method of choice. If adjustments for baseline measurements are performed, an analysis of covariance (ANCOVA) is commonly chosen. The latter approach only involves the last follow-up and the baseline measurement, while other follow-ups
Acknowledgments
The authors are grateful to Dr. Hans-Heinrich Henneicke-von Zepelin for making the SB-LOT data available. We thank Silke Szymczak, Christina Loley, and Janja Nahrstaedt for discussions on the topic of the article. We also thank two anonymous referees for valuable suggestions that helped to improve our work.
References (30)
- et al. A SAS/IML software program for GEE and regression diagnostics. Comput. Statist. Data Anal. (2006)
- et al. Model diagnostic plots for repeated measures data using the generalized estimating equations approach. Comput. Statist. Data Anal. (2008)
- et al. Alternative computational formulae for generalized linear model diagnostics: identifying influential observations with SAS software. Comput. Statist. Data Anal. (2005)
- et al. Local influence in estimating equations. Comput. Statist. Data Anal. (2011)
- et al. The mean-shift outlier model in general weighted regression and its applications. Comput. Statist. Data Anal. (1999)
- Two graphical displays for outlying and influential observations in regression. Biometrika (1981)
- et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (1980)
- et al. Efficiency of generalized estimating equations for binary responses. J. R. Stat. Soc. Ser. B Stat. Methodol. (2004)
- et al. Residuals and Influence in Regression (1982)
- et al. Generalized estimating equations in controlled clinical trials: hypotheses testing. Biom. J. (2004)