Analysis of longitudinal data with covariate measurement error and missing responses: An improved unbiased estimating equation

https://doi.org/10.1016/j.csda.2017.11.010

Abstract

Because of the data collection process, measurement error and missing responses are common in longitudinal data, and correctly addressing them is one of the main challenges in longitudinal data analysis. First, an unbiased estimating equation is proposed to improve the efficiency of parameter estimation for the marginal mean model for longitudinal data with covariate measurement error. The proposed unbiased estimating equation is asymptotically more efficient than the method in Qin et al. (2016a). Second, the proposed method can be extended to handle more complicated scenarios. Specifically, robust estimation for partially linear models with missing responses and covariate measurement error is considered. The proposed robust estimation does not require specifying the distribution of the covariate or the measurement error and is computationally easy to implement. Simulation studies are conducted to evaluate the improvement of the proposed method over existing methods (Qin et al., 2016b), and a sketch of the proof of its asymptotic property is provided. Finally, the proposed method is applied to the data from the Lifestyle Education for Activity and Nutrition (LEAN) study.

Introduction

Longitudinal studies are commonly conducted in biomedical and epidemiological research, and mis-measured covariates are typical in longitudinal data for various reasons, such as the accuracy of the equipment or the skill of the testing personnel. For example, the Lifestyle Education for Activity and Nutrition (LEAN) study (Barry et al., 2011), conducted at the University of South Carolina, Columbia, SC, USA, was designed to determine the effectiveness of an intervention to enhance weight loss over a 9-month period in sedentary overweight or obese adults. Body weight, systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured at baseline, month 4 and month 9. The p-values from the Wald tests on replicated measurements of SBP and DBP are both less than 0.001, which indicates that the variances of the measurement errors are not zero and that measurement errors exist in SBP and DBP. However, most popular methods for longitudinal data analysis assume that data are measured without error, which in turn leads to naive approaches for handling mis-measured covariates in practice, such as ignoring covariate measurement error or treating observed values of the covariates as if they were the true values. These naive approaches generally lead to biased inferences, even in the case of simple linear regression (Buonaccorsi, 2010).

The literature has discussed how to treat covariate measurement error appropriately. For example, in a linear regression, one can correct the bias induced by covariate measurement error through correction for attenuation. In the context of semiparametric partially linear models, Liang et al. (1999) proposed a bias-corrected estimator that is a semiparametric version of the usual parametric correction for attenuation. For longitudinal data, many existing works adopt a structural approach for parameter estimation, which specifies the distribution of the error-prone covariates. Carroll et al. (2012) considered mixed models with covariate measurement error and reviewed several maximum likelihood and pseudo-maximum likelihood estimation methods. See also Rummel et al. (2010) and Lin and Carroll (2006).
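As an illustration of the correction for attenuation mentioned above, the following minimal sketch simulates a simple linear regression with one error-prone covariate and two replicate measurements. The function name `attenuation_demo`, the parameter values and the simulated data are all assumptions made for this example; none of them come from the paper or from the LEAN data. The naive slope regresses the response on the mean of the replicates, and the corrected slope divides it by an estimated reliability ratio.

```python
import numpy as np

def attenuation_demo(n=20000, beta=2.0, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Return (naive slope, attenuation-corrected slope) on simulated data.

    Hypothetical setup: y = beta * x + noise, with x unobserved and two
    replicates w1 = x + u1, w2 = x + u2 carrying independent errors.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)              # true covariate (unobserved)
    y = beta * x + rng.normal(0.0, 0.5, n)       # response
    w1 = x + rng.normal(0.0, sigma_u, n)         # first replicate
    w2 = x + rng.normal(0.0, sigma_u, n)         # second replicate
    w_bar = (w1 + w2) / 2.0

    # Naive slope: treat the replicate mean as if it were the true covariate.
    var_w_bar = np.var(w_bar, ddof=1)
    naive = np.cov(w_bar, y)[0, 1] / var_w_bar

    # Var(w1 - w2) = 2 * sigma_u^2, so halve it to estimate sigma_u^2.
    sigma_u2_hat = np.var(w1 - w2, ddof=1) / 2.0

    # The replicate mean has error variance sigma_u^2 / 2; the reliability
    # ratio is Var(x) / Var(w_bar), estimated by subtracting that variance.
    reliability = (var_w_bar - sigma_u2_hat / 2.0) / var_w_bar
    corrected = naive / reliability
    return naive, corrected

if __name__ == "__main__":
    naive, corrected = attenuation_demo()
    print("naive slope:", naive)
    print("corrected slope:", corrected)
```

Under these assumed settings the naive slope is shrunk toward zero by roughly the reliability ratio, while the corrected slope is close to the true value. The correction relies on the error variance being estimable, here from the replicates.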

Furthermore, in the LEAN study, 21% of body mass index (BMI) measures, calculated from measured height and body weight, are missing, which poses further challenges for statistical estimation. Note that most of the methods mentioned above may not be easily tailored to such complex data. For example, the bias term induced by mis-measured covariates may not be easily calculated when there are outliers and dropouts at the same time. For the structural approach, it is a challenge to correctly specify the distribution of the error-prone covariates. Therefore, an estimation method that does not require specifying the distribution of the mis-measured covariates is preferred. Assuming replicate measurements are available, Qin et al. (2016b) proposed a robust estimation method for longitudinal data with dropouts and measurement error, which can be easily implemented with popular software.
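To illustrate why replicate measurements can remove the need to specify the distribution of the error-prone covariates, the following minimal sketch pairs one replicate with the residual computed from the other. This is a generic cross-replicate estimating equation under working independence, offered as an assumption-laden illustration; it is not the estimator proposed in this paper or in Qin et al. (2016a, 2016b), and it ignores missing responses and outliers. The function names, the simulated design and all parameter values are hypothetical.

```python
import numpy as np

def cross_replicate_estimate(w1, w2, y):
    """Solve sum W1^T (y - W2 b) + W2^T (y - W1 b) = 0 for b.

    w1, w2: (N, p) replicate measurements of the same covariates.
    y: (N,) responses. Because the two replicate errors are independent,
    the cross products W1^T W2 are not inflated by the error variance.
    """
    a = w1.T @ w2 + w2.T @ w1
    b = (w1 + w2).T @ y
    return np.linalg.solve(a, b)

def naive_estimate(w1, w2, y):
    """Least squares on the replicate mean, ignoring measurement error."""
    w_bar = (w1 + w2) / 2.0
    return np.linalg.lstsq(w_bar, y, rcond=None)[0]

def simulate(n=2000, m=3, beta0=(1.0, -0.5), sigma_u=0.7, seed=1):
    """Hypothetical longitudinal data: n subjects, m visits, stacked rows."""
    rng = np.random.default_rng(seed)
    beta0 = np.asarray(beta0)
    p = beta0.size
    x = rng.normal(size=(n * m, p))                       # true covariates
    # A shared subject effect induces within-subject correlation.
    subject_effect = np.repeat(rng.normal(scale=0.5, size=n), m)
    eps = subject_effect + rng.normal(scale=0.5, size=n * m)
    y = x @ beta0 + eps
    w1 = x + rng.normal(scale=sigma_u, size=(n * m, p))   # replicate 1
    w2 = x + rng.normal(scale=sigma_u, size=(n * m, p))   # replicate 2
    return w1, w2, y, beta0

if __name__ == "__main__":
    w1, w2, y, beta0 = simulate()
    print("true:", beta0)
    print("naive:", naive_estimate(w1, w2, y))
    print("cross-replicate:", cross_replicate_estimate(w1, w2, y))
```

The sketch uses only first and second moments of the replicates, so no distribution for the true covariate or for the measurement error is specified. It treats all rows as independent when forming the estimating equation, which affects efficiency and standard errors but not the consistency of the point estimate in this simulated setting.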

The main purpose of this paper is to develop a method with improved efficiency that can be easily implemented with standard algorithms developed for generalized estimating equations (GEE). Thus, it can be used widely in practice to handle complicated longitudinal data. Assuming that replicate measurements are available, we first propose an unbiased estimating equation and show that it is asymptotically more efficient than the estimator proposed in Qin et al. (2016a). Then we move forward to a more complicated scenario where missing responses, outliers and covariate measurement error exist simultaneously. The proposed method can be easily adapted to handle such scenarios, and we show that it is more efficient than the method in Qin et al. (2016b) through simulation studies.

The rest of the paper is organized as follows. Section 2 introduces models and methods. Section 2.1 presents the linear mean model and the proposed unbiased estimating equation, which is asymptotically more efficient than the method in Qin et al. (2016a). In Section 2.2, the idea behind the proposed method is extended to a more complicated case with simultaneous outliers, missing responses and covariate measurement error for partially linear marginal mean models. We illustrate through simulation studies that our method improves estimation efficiency compared to the method in Qin et al. (2016b) while retaining the consistency of the estimators when dealing with such complicated data. Simulation studies are conducted in Section 3. Real data analysis is given in Section 4, and we end with concluding remarks in Section 5. The details of regularity conditions and proofs are given in the online supplementary material.

Linear model and measurement error process

Consider a study consisting of $n$ subjects with $m$ observations over time for each subject. Let $Y_{ij}$ be the response and $X_{ij}$ be the $p$-dimensional covariate vector for the $i$th subject at the $j$th observation. For simplicity, let $Y_i=(Y_{i1},\ldots,Y_{im})^T$ be the response vector and $X_i=(X_{i1},\ldots,X_{im})^T$ be the covariate matrix. We now consider the following linear marginal model:
$$Y_{ij}=X_{ij}^T\beta_0+\epsilon_{ij},\quad i=1,\ldots,n,\ j=1,\ldots,m,$$
where $\beta_0$ is a $p$-dimensional vector of unknown regression parameters, and $\epsilon_i=(\epsilon_{i1},\ldots,\epsilon_{im})^T$ is the vector of

Simulation studies

We now conduct a simulation study to assess the performance of the proposed method when response dropouts, mis-measured covariates and outliers are present simultaneously. For comparison, we also present the estimation results of the complete-case GEE (GEE-C) method and the method of Qin et al. (2016b) (QIN). Note that the GEE-C method ignores missing responses, covariate measurement error and outliers. Our main focus is to assess the relative efficiency improvement of the proposed method compared to

Real data analysis

We use data from the Lifestyle Education for Activity and Nutrition (LEAN) study (Barry et al., 2011), which tested the effectiveness of an intervention to enhance weight loss over a 9-month period in overweight or obese adults. A total of 197 men and women aged 18 to 64 years who were not physically active, were overweight or obese (BMI ≥ 25) and had access to the Internet were randomly assigned to either the intervention group or the standard care (control) group. Subjects’ age, race, gender and education level

Discussion

We have developed a method that is asymptotically more efficient than the method proposed by Qin et al. (2016a). We first developed the method for the linear mean model with measurement error in covariates and then extended it to more complicated scenarios. Specifically, we considered a setting where outliers, missing responses and covariate measurement error exist simultaneously. Moreover, the marginal mean followed a partially linear model, i.e., the relationship

Acknowledgments

We greatly appreciate Drs. Xuemei Sui and Steven N. Blair of the University of South Carolina for providing the LEAN study dataset and for their contributions to data interpretation.

This work was partially supported by the National Natural Science Foundation of China (11371100, 11731011, 11671096), the China Medical Board Collaborating Program in Health Technology Assessment (Grant 16-251) and the Shanghai Leading Academic Discipline Project (Project number: B118).
