Analysis of longitudinal data with covariate measurement error and missing responses: An improved unbiased estimating equation
Introduction
Longitudinal studies are common in biomedical and epidemiological research, and covariates in longitudinal data are often mis-measured for various reasons, such as limited instrument accuracy or the skill of the testing personnel. For example, the Lifestyle Education for Activity and Nutrition (LEAN) study (Barry et al., 2011), conducted at the University of South Carolina, Columbia, SC, USA, was designed to determine the effectiveness of an intervention to enhance weight loss over a 9-month period in sedentary overweight or obese adults. Body weight, systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured at baseline, month 4 and month 9. For both SBP and DBP, Wald tests based on the replicated measurements give p-values below 0.001, indicating that the measurement-error variances are nonzero, i.e., SBP and DBP are measured with error. However, most popular methods for longitudinal data analysis assume that covariates are measured without error, which in practice leads to naive approaches for handling mis-measured covariates, such as ignoring the measurement error or treating the observed values of the covariates as if they were the true values. These naive approaches generally lead to biased inference, even in simple linear regression (Buonaccorsi, 2010).
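The source does not describe how the Wald tests were set up. As a minimal sketch of why replicates make the measurement-error variance identifiable without specifying the distribution of the true covariate, assume the classical additive model W_r = X + U_r (r = 1, 2), where the errors U_1 and U_2 are independent, have mean zero and common variance σ_u², and are independent of X. Then Var(W_1 − W_2) = 2σ_u², which yields the moment estimate below. The function name and the simulated numbers are hypothetical and are not LEAN data.

```python
import numpy as np

def replicate_error_variance(w1, w2):
    """Moment estimate of the measurement-error variance from two replicates.

    Assumes W_r = X + U_r (r = 1, 2) with U_1, U_2 independent, mean zero,
    common variance sigma_u^2, and independent of X. Then
    Var(W_1 - W_2) = 2 * sigma_u^2 regardless of the distribution of X.
    """
    d = np.asarray(w1, dtype=float) - np.asarray(w2, dtype=float)
    return np.var(d, ddof=1) / 2.0

# Hypothetical simulated example: SBP-like true values plus replicate noise (sd 6).
rng = np.random.default_rng(0)
x = rng.normal(130.0, 15.0, size=5000)          # unobserved true covariate
w1 = x + rng.normal(0.0, 6.0, size=5000)        # first replicate
w2 = x + rng.normal(0.0, 6.0, size=5000)        # second replicate
sigma_u2_hat = replicate_error_variance(w1, w2) # should be close to 6.0**2 = 36
```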
Several ways of treating covariate measurement error appropriately have been discussed in the literature. For example, in linear regression, the bias induced by covariate measurement error can be corrected through correction for attenuation. For semiparametric partially linear models, Liang et al. (1999) proposed a bias-corrected estimator that is a semiparametric version of the usual parametric correction for attenuation. For longitudinal data, many existing works adopt a structural approach to parameter estimation, which specifies the distribution of the error-prone covariates. Carroll et al. (2012) considered mixed models with covariate measurement error and reviewed several maximum likelihood and pseudo-maximum likelihood estimation methods. See also Rummel et al. (2010) and Lin and Carroll (2006).
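A minimal sketch of correction for attenuation in simple linear regression, assuming the classical additive error model with k ≥ 2 replicates per subject. With W̄ the replicate mean, the naive least-squares slope converges to λβ_1, where λ = σ_x²/(σ_x² + σ_u²/k) is the reliability ratio; dividing the naive slope by an estimate of λ, with σ_u² estimated from the within-subject replicate variance, removes the bias. The function name and the simulated data are illustrative assumptions.

```python
import numpy as np

def attenuation_corrected_slope(y, w_reps):
    """Return (naive, corrected) slopes for y = b0 + b1 * x + e when x is observed
    only through k >= 2 replicates W_r = x + U_r (classical additive error, assumed)."""
    y = np.asarray(y, dtype=float)
    w_reps = np.asarray(w_reps, dtype=float)
    _, k = w_reps.shape
    w_bar = w_reps.mean(axis=1)                    # replicate mean used as the surrogate
    sigma_u2 = w_reps.var(axis=1, ddof=1).mean()   # pooled within-subject variance estimates sigma_u^2
    s_ww = np.var(w_bar, ddof=1)                   # estimates sigma_x^2 + sigma_u^2 / k
    s_wy = np.cov(w_bar, y)[0, 1]                  # sample covariance of W_bar and y
    naive = s_wy / s_ww                            # attenuated toward zero by lambda
    reliability = (s_ww - sigma_u2 / k) / s_ww     # estimate of lambda
    return naive, naive / reliability

# Hypothetical simulated data: sigma_x^2 = sigma_u^2 = 1, k = 2, true slope 2.
rng = np.random.default_rng(2)
n, k = 20000, 2
x = rng.normal(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)
w_reps = x[:, None] + rng.normal(0.0, 1.0, size=(n, k))
naive, corrected = attenuation_corrected_slope(y, w_reps)
```

With σ_x² = σ_u² = 1 and k = 2, λ = 2/3, so the naive slope should be near 4/3 and the corrected slope near the true value 2.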
Furthermore, in the LEAN study, 21% of the body mass index (BMI) values, calculated from measured height and body weight, are missing, which poses further challenges for statistical estimation. Note that most of the methods mentioned above may not be easily adapted to such complex data. For example, the bias term induced by mis-measured covariates may not be easy to calculate when outliers and dropouts are present at the same time. For the structural approach, correctly specifying the distribution of the error-prone covariates is challenging. Therefore, an estimation method that does not require specifying the distribution of the mis-measured covariates is preferable. Assuming that replicate measurements are available, Qin et al. (2016b) proposed a robust estimation method for longitudinal data with dropouts and measurement error, which can be easily implemented with popular software.
The main purpose of this paper is to develop a method with improved efficiency that can be easily implemented with standard algorithms for generalized estimating equations (GEE), so that it can be widely used in practice to handle complicated longitudinal data. Assuming that replicate measurements are available, we first propose an unbiased estimating equation and show that the resulting estimator is asymptotically more efficient than the estimator proposed in Qin et al. (2016a). We then turn to a more complicated scenario in which missing responses, outliers and covariate measurement error exist simultaneously. The proposed method can be easily adapted to such scenarios, and simulation studies show that it is more efficient than the method in Qin et al. (2016b).
The rest of the paper is organized as follows. Section 2 introduces the models and methods. Section 2.1 presents the linear mean model and the proposed unbiased estimating equation, which is asymptotically more efficient than the method in Qin et al. (2016a). In Section 2.2, the proposed idea is extended to partially linear marginal mean models with simultaneous outliers, missing responses and covariate measurement error. Simulation studies in Section 3 show that, for such complicated data, our method improves estimation efficiency compared with the method in Qin et al. (2016b) while retaining the consistency of the estimators. Real data analysis is given in Section 4, and concluding remarks are given in Section 5. Details of the regularity conditions and proofs are given in the online supplementary material.
Linear model and measurement error process
Consider a study consisting of $n$ subjects with $m_i$ observations over time for the $i$th subject. Let $y_{ij}$ be the response and $X_{ij}$ be the $p$-dimensional covariate vector for the $i$th subject at the $j$th observation. For simplicity, let $Y_i = (y_{i1}, \ldots, y_{im_i})^{T}$ be the response vector and $X_i = (X_{i1}, \ldots, X_{im_i})^{T}$ be the $m_i \times p$ covariate matrix. We now consider the following linear marginal model: $Y_i = X_i\beta + \varepsilon_i$, where $\beta$ is a $p$-dimensional vector of unknown regression parameters, and $\varepsilon_i$ is the vector of
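The snippet stops before the paper's own estimating equation, so the following is not the authors' estimator. It is a minimal sketch of a standard moment-corrected estimating equation for the linear marginal model E(Y_i | X_i) = X_i β under an independence working correlation, assuming classical additive errors W_ijr = X_ij + U_ijr (r = 1, ..., k) that have mean zero and covariance Σ_u and are independent across replicates and of the responses. Because E(W̄_ij W̄_ij^T) = X_ij X_ij^T + Σ_u/k, subtracting N Σ_u/k (N is the total number of observations) from the naive Gram matrix removes the bias of the naive GEE. The function name, the stacked-array layout and the simulated data are illustrative assumptions; sandwich standard errors, missing responses and outliers are not handled.

```python
import numpy as np

def corrected_ee_estimate(y, w_reps):
    """Naive and bias-corrected solutions of sum_ij W_ij (y_ij - W_ij' beta) = 0.

    y      : (N,) responses stacked over all subjects and time points.
    w_reps : (N, k, p) replicate measurements W_r = X + U_r of each covariate row;
             error-free columns (e.g. intercept, time) are repeated unchanged,
             so their estimated error variance is zero.
    """
    y = np.asarray(y, dtype=float)
    w_reps = np.asarray(w_reps, dtype=float)
    N, k, p = w_reps.shape
    w_bar = w_reps.mean(axis=1)                    # (N, p) replicate-averaged covariates
    dev = w_reps - w_bar[:, None, :]               # within-observation deviations
    # Pooled within-replicate covariance: unbiased for Sigma_u.
    sigma_u = np.einsum("nkp,nkq->pq", dev, dev) / (N * (k - 1))
    gram = w_bar.T @ w_bar                         # E[gram] = X'X + N * Sigma_u / k
    rhs = w_bar.T @ y
    beta_naive = np.linalg.solve(gram, rhs)        # ignores measurement error
    beta_corrected = np.linalg.solve(gram - N * sigma_u / k, rhs)
    return beta_naive, beta_corrected

# Hypothetical simulation: 2000 subjects, 3 visits, random intercept, k = 2 replicates.
rng = np.random.default_rng(1)
n, m, k = 2000, 3, 2
beta = np.array([1.0, 0.5, 2.0])
t = np.tile(np.arange(m, dtype=float), n)          # error-free time covariate
x = rng.normal(size=n * m)                         # true error-prone covariate
X = np.column_stack([np.ones(n * m), t, x])
b = np.repeat(rng.normal(size=n), m)               # induces within-subject correlation
y = X @ beta + b + rng.normal(scale=0.5, size=n * m)
w_reps = np.repeat(X[:, None, :], k, axis=1)       # (N, k, p) copies of X
w_reps[:, :, 2] += rng.normal(size=(n * m, k))     # only x is measured with error
naive, corrected = corrected_ee_estimate(y, w_reps)
```

Here the naive coefficient of x should be attenuated toward about 2/1.5 ≈ 1.33, while the corrected one should be close to 2.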
Simulation studies
We now conduct a simulation study to assess the performance of the proposed method when response dropouts, mis-measured covariates and outliers are present simultaneously. For comparison, we also present the estimation results of the complete-case GEE (GEE-C) method and the method of Qin et al. (2016b) (QIN). Note that the GEE-C method ignores missing responses, covariate measurement error and outliers. Our main focus is to assess the relative efficiency improvement of the proposed method compared to
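The snippet is cut off before relative efficiency is defined. A common convention, assumed here rather than taken from the paper, is the ratio of Monte Carlo mean squared errors, MSE(comparator)/MSE(proposed), so values above 1 favour the proposed estimator. The helper names below are hypothetical; each takes the estimates of one scalar coefficient collected over simulation replications.

```python
import numpy as np

def summarize(estimates, truth):
    """Bias, empirical SD and MSE of Monte Carlo estimates of a scalar parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    sd = est.std(ddof=1)
    mse = np.mean((est - truth) ** 2)
    return {"bias": bias, "sd": sd, "mse": mse}

def relative_efficiency(est_reference, est_proposed, truth):
    """MSE ratio (assumed definition); values above 1 favour the proposed estimator."""
    return summarize(est_reference, truth)["mse"] / summarize(est_proposed, truth)["mse"]
```

For instance, reference estimates [0, 4] and proposed estimates [1, 3] of a parameter equal to 2 have MSEs 4 and 1, giving a relative efficiency of 4.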
Real data analysis
We use data from the Lifestyle Education for Activity and Nutrition (LEAN) study (Barry et al., 2011), which tested the effectiveness of an intervention to enhance weight loss over a 9-month period in overweight or obese adults. A total of 197 men and women aged 18 to 64 years who were not physically active, were overweight or obese (BMI ) and had access to the Internet were randomly assigned to either the intervention group or the standard care (control) group. Subjects' age, race, gender and education level
Discussion
We have developed a method that is asymptotically more efficient than the method proposed by Qin et al. (2016a). We first developed the method for the linear mean model with covariate measurement error and then extended it to more complicated scenarios. Specifically, we considered a setting where outliers, missing responses and covariate measurement error exist simultaneously. Moreover, the marginal mean follows a partially linear model, i.e., the relationship
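For readers unfamiliar with partially linear models, a minimal sketch of the usual regression-spline idea: write the mean as X^T β + g(t) and approximate the unknown smooth function g by a spline basis, so that β and the spline coefficients are estimated jointly by least squares. The reference list includes B-spline work; for brevity this sketch swaps in a cubic truncated power basis, which spans the same space of cubic splines for the same knots. It ignores measurement error, missing responses and within-subject correlation, and the knot placement and function names are illustrative assumptions.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=3):
    """Columns 1, t, ..., t^degree and (t - kappa)_+^degree for each interior knot."""
    t = np.asarray(t, dtype=float)
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.maximum(t - kappa, 0.0) ** degree for kappa in knots]
    return np.column_stack(cols)

def fit_partially_linear(y, x, t, knots):
    """Least-squares fit of y = x @ beta + g(t) + e, with g approximated by a spline.

    x must not contain an intercept column (the spline basis already has one).
    Returns beta_hat and a function evaluating the fitted g at an array of t values.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    design = np.column_stack([x, truncated_power_basis(t, knots)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    p = x.shape[1]
    beta_hat, gamma = coef[:p], coef[p:]
    return beta_hat, lambda s: truncated_power_basis(s, knots) @ gamma

# Hypothetical data with g(t) = sin(2 * pi * t) and beta = 1.5.
rng = np.random.default_rng(3)
n = 3000
t = rng.uniform(0.0, 1.0, size=n)
x = rng.normal(size=n)
y = 1.5 * x + np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=n)
beta_hat, g_hat = fit_partially_linear(y, x, t, knots=np.linspace(0.1, 0.9, 9))
```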
Acknowledgments
We greatly appreciate Drs. Xuemei Sui and Steven N. Blair of the University of South Carolina for providing the LEAN study dataset and for their contributions in data interpretation.
This work was partially supported by the National Natural Science Foundation of China (11371100, 11731011, 11671096), China Medical Board Collaborating Program in Health Technology Assessment (Grant 16-251) and Shanghai Leading Academic Discipline Project, Project number: B118.
References (15)
- et al. (1996). Bivariate tensor-product B-splines in a partly linear model. J. Multivariate Anal.
- et al. (2016). Simultaneous mean and covariance estimation of partially linear models for longitudinal data with missing responses and covariate measurement error. Comput. Statist. Data Anal.
- et al. (2002). An energy-minimization framework for monotonic cubic spline interpolation. J. Comput. Appl. Math.
- Barry et al. (2011). Using a technology-based intervention to promote weight loss in sedentary overweight or obese adults: a randomized controlled trial study design. Diabetes Metab. Syndr. Obes.: Targets Ther.
- Buonaccorsi (2010). Measurement Error: Models, Methods, and Applications.
- Carroll et al. (2012). Measurement error in nonlinear models. Technometrics.
- et al. (2012). A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error. Biometrika.