Generalized estimating equations and regression diagnostics for longitudinal controlled clinical trials: A case study

https://doi.org/10.1016/j.csda.2011.04.010

Abstract

Generalized estimating equations (GEE) were proposed for the analysis of correlated data. They are popular because regression parameters can be consistently estimated even if only the mean structure is correctly specified. GEE have been extended in several ways, including regression diagnostics for outlier detection. However, GEE have rarely been used for analyzing controlled clinical trials. The SB-LOT trial, a double-blind placebo-controlled randomized multicenter trial in which the oedema-protective effect of a vasoactive drug was investigated in patients suffering from chronic venous insufficiency, was re-analyzed using the GEE approach. It is demonstrated that the autoregressive structure is the most plausible working correlation structure in this study. The effect of the vasoactive drug is a difference in lower leg volume of 2.64 ml per week (p=0.0288, 95% confidence interval 0.27–4.99 ml per week), making a difference of 30 ml at the end of the study. Deletion diagnostics are used to identify outliers and influential patients. Excluding the most influential patients from the analysis does not alter the overall conclusion of the study. At the same time, the goodness of fit as assessed by half-normal plots improves substantially. In summary, the use of GEE in a longitudinal clinical trial is an alternative to the standard analysis, which usually involves only the last follow-up. Regression diagnostic techniques should accompany the GEE analysis to serve as sensitivity analysis.

Introduction

Twenty-five years ago, the generalized estimating equations (GEE) for analyzing correlated non-normal data were introduced by Liang and Zeger in a series of papers (see, e.g., Liang and Zeger, 1986; Zeger and Liang, 1986). The strength of this semiparametric approach is that regression coefficients can be consistently estimated in regression models with clustered non-normal dependent variables even if the distribution is partly misspecified. Specifically, only the correct specification of the mean structure is required for consistent estimation. Variances and within-cluster correlations may be misspecified. However, the efficiency of the estimation approach generally depends on the degree of misspecification of the covariance matrix.
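The role of the covariance specification can be illustrated in the simplest possible setting. The following minimal Python sketch is not taken from the paper: it assumes an intercept-only model with identity link and an independence working correlation, and the function name and toy data are hypothetical. In this special case the GEE point estimate is the overall mean, and the robust (sandwich) variance can be compared with the model-based variance that ignores clustering.

```python
def gee_mean_independence(clusters):
    """Intercept-only GEE sketch (identity link, independence working correlation).

    `clusters` is a list of lists; each inner list holds the repeated
    measurements of one independent cluster (e.g. one patient).
    Returns (estimate, model-based variance, robust variance).
    """
    n_obs = sum(len(c) for c in clusters)

    # Solving sum_i 1'(y_i - beta * 1) = 0 gives the overall mean.
    beta = sum(sum(c) for c in clusters) / n_obs

    # Model-based ("naive") variance: treats all observations as independent.
    resid_sq = sum((y - beta) ** 2 for c in clusters for y in c)
    naive_var = resid_sq / (n_obs - 1) / n_obs

    # Robust ("sandwich") variance: residuals are summed within each
    # cluster before squaring, so within-cluster correlation is respected.
    robust_var = sum(sum(y - beta for y in c) ** 2 for c in clusters) / n_obs ** 2

    return beta, naive_var, robust_var


# Hypothetical toy data: three clusters with strongly correlated measurements.
toy_clusters = [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [2.0, 2.2, 1.9]]
beta_hat, naive_var, robust_var = gee_mean_independence(toy_clusters)
print(beta_hat, naive_var, robust_var)
```

The point estimate is the same whichever variance is used; only the variance estimate changes. For positively correlated toy data such as these, the model-based variance understates the uncertainty, whereas the robust variance does not rely on the working correlation being correct.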

The GEE have been extended in several ways. The extensions include approaches for dealing with missing data (for an overview, see, e.g., Ziegler et al., 2003), approaches for sample size calculations (reviewed in Dahmen and Ziegler, 2004), and regression diagnostics (Preisser and Qaqish, 1996; Ziegler et al., 1995). However, these extensions have rarely been used in applications, partly because of the lack of appropriate software.

The aim of this paper is therefore two-fold. First, we want to illustrate that the application of GEE to a repeated measurement intervention study can be an interesting alternative, or at least a supplement, to the standard analysis, which involves only the last follow-up and, possibly, adjustments for baseline measurements. Second, we aim to demonstrate that regression diagnostics should supplement the GEE analysis to serve as sensitivity analysis. For illustration, we re-analyze data from a double-blind placebo-controlled randomized multicenter trial, in which the oedema-protective effect of a vasoactive drug was investigated in patients suffering from chronic venous insufficiency after decongestion of the legs. The primary analysis was a baseline-adjusted analysis of covariance (ANCOVA) between the two treatment groups (Vanscheidt et al., 2002). A secondary analysis using GEE, which aimed at detecting a difference in the slopes, will be presented in this paper.

The paper is organized as follows. First, we describe the SB-LOT data (Vanscheidt et al., 2002) which are re-analyzed below. Second, we give a short introduction to GEE, and we briefly discuss approaches for selecting the most plausible correlation structure. Next, we review regression diagnostic methods for GEE, which are primarily based on deletion diagnostics. Results from the re-analysis of the SB-LOT data are presented, and findings from regression diagnostics are displayed. We specifically show for this data set that the removal of outliers does not alter the overall conclusion of the study. However, the goodness of fit as assessed by half-normal plots and simulated envelopes improves.

The SB-LOT data

For illustration, we use a parallel group design with repeated measurements. In this double-blind placebo-controlled randomized multicenter trial, the oedema-protective effect of a vasoactive drug was investigated in patients suffering from chronic venous insufficiency after decongestion of the legs (Vanscheidt et al., 2002). At baseline, 226 patients were randomized to medical compression stockings plus SB-LOT (90 mg Coumarin and 540 mg Troxerutin per day) or medical compression stockings

Generalized estimating equations

Let $n$ be the number of independent clusters $i = 1, \ldots, n$, and, for simplicity, assume that there are $T$ observations per cluster ($t = 1, \ldots, T$). For each dependent variable $y_{it}$, a $p$-dimensional vector of independent variables $x_{it}$ is available. Data are collected in column vectors $y_i = (y_{i1}, \ldots, y_{iT})'$ and $T \times p$ dimensional matrices $X_i = (x_{i1}, \ldots, x_{iT})'$.

The mean structure is assumed to be given by $\mu_{it} = E(y_{it} \mid X_i) = E(y_{it} \mid x_{it}) = g(x_{it}'\beta)$, where $g$ is the non-linear response function, and $g^{-1}$ is the link function. As in

Choosing a reasonable working correlation structure

In applications, we cannot expect the working correlation structure to be correctly specified. However, if it is correctly specified, the estimator is best asymptotically normal (BAN). Furthermore, the closer the working correlation structure is to the true correlation structure, the more efficient the estimator is (Chaganty and Joe, 2004). If possible, the investigator should choose a specific working correlation structure for both statistical and biological reasons (Ziegler and Vens, 2010).
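To make the notion of a working correlation structure concrete, the following short Python sketch builds two commonly used candidates for $T$ repeated measurements: the autoregressive AR(1) structure, in which the correlation decays with the time lag, and the exchangeable structure, in which all pairs of measurements share one correlation. The function names and the value of the correlation parameter are illustrative assumptions and are not estimates from the SB-LOT data.

```python
def ar1_corr(T, alpha):
    """AR(1) working correlation: corr(y_s, y_t) = alpha ** |s - t|."""
    return [[alpha ** abs(s - t) for t in range(T)] for s in range(T)]


def exchangeable_corr(T, alpha):
    """Exchangeable working correlation: corr(y_s, y_t) = alpha for s != t."""
    return [[1.0 if s == t else alpha for t in range(T)] for s in range(T)]


# Hypothetical example: four measurement occasions, correlation parameter 0.5.
for row in ar1_corr(4, 0.5):
    print(row)
for row in exchangeable_corr(4, 0.5):
    print(row)
```

Under AR(1), measurements taken close together in time are assumed to be more strongly correlated than measurements far apart, which is often a natural assumption for equally spaced follow-ups. The exchangeable structure makes no use of the time ordering.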

While it is probably intuitive

Regression diagnostics

Unusual data may substantially alter the fit of the regression model, and regression diagnostics identify subjects that might influence the regression relation substantially. Unusual values of the dependent variable are termed outliers, while unusual values of the independent variables are termed leverage points. The effect of these is best studied by investigating the alteration of parameter estimates when an observation is omitted from the analysis. Corresponding statistics are termed

Standard GEE1 analysis

In the first step, the model for the SB-LOT data was estimated using the AR(1) working correlation structure (Table 3), as recommended by Wang and Carey (2003). Analogously to the ANCOVA model of Vanscheidt et al. (2002), the intention-to-treat (ITT) analysis using the GEE showed an advantage for SB-LOT over the placebo because the slope parameter $\beta_4$ was significant at the 5% test level (p=0.0288). The difference in the lower leg volume between SB-LOT and placebo increased by 2.64 ml per week (95% confidence

Discussion

The standard analysis in a longitudinal parallel group clinical trial usually involves only the last time point, so that the primary analysis often is a standard two-group comparison. For continuous outcome variables, either the t-test or the U-test is often the method of choice. If adjustments for baseline measurements are performed, an analysis of covariance (ANCOVA) is commonly chosen. The latter approach only involves the last follow-up and the baseline measurement, while other follow-ups

Acknowledgments

The authors are grateful to Dr. Hans-Heinrich Henneicke-von Zepelin for making the SB-LOT data available. We thank Silke Szymczak, Christina Loley, and Janja Nahrstaedt for discussions on the topic of the article. We also thank two anonymous referees for valuable suggestions that helped to improve our work.

References

  • S. Evans et al., A comparison of goodness of fit tests for the logistic GEE model, Stat. Med. (2005)
  • K.-M. Jung, Local influence in generalized estimating equations, Scand. J. Statist. (2008)
  • K.-Y. Liang et al., Longitudinal data analysis using generalized linear models, Biometrika (1986)
  • L.A. Mancl et al., Efficiency of regression estimates for clustered data, Biometrics (1996)
  • W. Pan, Akaike's information criterion in generalized estimating equations, Biometrics (2001)