Joint estimation of mean-covariance model for longitudinal data with basis function approximations
Introduction
The estimation of the covariance matrix is important in a longitudinal study. A good estimator of the covariance can improve the efficiency of the estimated regression coefficients. Furthermore, the covariance estimation itself is also of interest (Diggle and Verbyla, 1998). A number of authors have studied the problem of estimating the covariance matrix. Pourahmadi (1999, 2000) considered generalized linear models for the components of the modified Cholesky decomposition of the covariance matrix. Fan et al. (2007) and Fan and Wu (2008) proposed using a semiparametric model for the covariance function. However, the mean and covariance estimators could have considerable bias when the specified parametric or semiparametric model for the covariance structure is far from the truth (Huang et al., 2007).
To balance the variability and bias of the covariance estimator, several nonparametric estimators of the covariance structure have been proposed. Diggle and Verbyla (1998) provided a nonparametric estimator for the covariance structure without assuming stationarity, but their estimator is not guaranteed to be positive definite. To handle the positive-definiteness constraint, Wu and Pourahmadi (2003) proposed a nonparametric smoothing to regularize the estimation of a large covariance matrix based on the modified Cholesky decomposition, but their first-step raw estimate is too noisy, and thus an inefficient estimate may result. Huang et al. (2007) proposed applying a smoothing-based regularization after using the modified Cholesky decomposition of the covariance matrix and found that their estimator could be more efficient than that of Wu and Pourahmadi. However, they only considered balanced data, which are uncommon in practice. Thus we present an extension of their method to unbalanced data. In addition, all these works focus on the estimation of the covariance matrix and pay little attention to the mean structure. As shown in Pan and Mackenzie (2003), a misspecified mean structure may well lead to a poor estimator of the covariance structure. Thus, we suggest using a flexible estimator of the mean component to avoid this drawback.
In this article, we propose a partially linear model for the mean, which keeps the flexibility of the nonparametric model while maintaining the explanatory power of the parametric model. We first decompose the covariance matrix using the modified Cholesky decomposition and then model the nonparametric term, the within-subject correlation and the variation by spline functions. The joint mean-covariance model is then constructed. Finally, we estimate the associated parameters using the maximum likelihood approach. The main focus is on the efficiency gain in estimating the regression coefficients that results from incorporating the covariance matrix.
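The modified Cholesky decomposition referred to above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `modified_cholesky`, the observation times and the AR(1)-type covariance are all hypothetical choices made for demonstration. The decomposition finds a unit lower triangular matrix `T` and a diagonal matrix `D` such that `T @ sigma @ T.T` equals `D`; the below-diagonal entries of `T` are the negatives of the coefficients from regressing each measurement on its predecessors, and the diagonal of `D` holds the corresponding prediction error variances.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, D) with T unit lower triangular and D diagonal
    such that T @ sigma @ T.T == D (up to rounding)."""
    sigma = np.asarray(sigma, dtype=float)
    m = sigma.shape[0]
    T = np.eye(m)
    d = np.empty(m)
    d[0] = sigma[0, 0]
    for j in range(1, m):
        # Coefficients of the linear regression of measurement j
        # on measurements 0, ..., j-1.
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        T[j, :j] = -phi
        # Variance of the prediction error (innovation variance).
        d[j] = sigma[j, j] - sigma[:j, j] @ phi
    return T, np.diag(d)

# Hypothetical example: five unequally spaced observation times with
# a correlation that depends only on the elapsed time.
times = np.array([0.0, 0.5, 1.5, 2.0, 3.5])
sigma = 2.0 * 0.6 ** np.abs(times[:, None] - times[None, :])
T, D = modified_cholesky(sigma)
```

Because the entries of `T` are unconstrained and the diagonal of `D` only needs to be positive, modeling them (for example, the logarithm of the innovation variances) always yields a positive definite covariance estimate.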
The proposed estimation procedure is more general than that given by Huang et al. (2007), whose procedure is confined to the analysis of balanced longitudinal data. Although we can deal with the variation using a method similar to that in Huang et al. (2007), this is not the case for the within-subject correlation. Unlike in Huang et al. (2007), the within-subject correlation in this paper is assumed to depend only on the elapsed time.
The remainder of the paper is organized as follows. The proposed spline method is given in Section 2. Section 3 develops the estimation procedure for the regression coefficients and the covariance function. A simulation study and a real data analysis are given in Sections 4 and 5, respectively.
Joint mean-covariance model
Assume that we have a sample of subjects. For the th subject, , the response variable and the covariate vector are collected at time points , where is the total number of observations for the th subject. The following partially linear model is considered, where is a -dimensional unknown parameter vector, is an unknown smooth function, . Let and . Denote .
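As a sketch in generic notation (every symbol below is an assumption made for illustration, not notation taken from the text), a partially linear model of the kind described above, together with a spline approximation of its smooth component, is commonly written as follows. Here $n$ is the number of subjects, $m_i$ the number of observations on subject $i$, $\beta$ the parametric coefficient vector, $f$ the smooth function of time, and $B_1,\dots,B_K$ a set of spline basis functions with coefficients $\alpha_1,\dots,\alpha_K$.

```latex
\begin{align*}
  y_{ij} &= x_{ij}^{\top}\beta + f(t_{ij}) + \varepsilon_{ij},
  \qquad i = 1,\dots,n,\quad j = 1,\dots,m_i, \\
  f(t) &\approx \sum_{k=1}^{K} B_k(t)\,\alpha_k .
\end{align*}
```

Under this approximation the mean becomes linear in $(\beta, \alpha_1, \dots, \alpha_K)$, so the nonparametric term can be estimated jointly with the regression coefficients.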
Estimation of and
In the preceding section, we modeled the nonparametric term, the main diagonal of and the subdiagonals of by spline functions. We now employ the likelihood approach, as described below.
Let , , and . Then, the logarithm of the likelihood function of can be written as up to a constant that can be neglected, where . With being replaced by
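To illustrate how a Gaussian log-likelihood can be evaluated through the modified Cholesky factors, the following sketch computes minus twice the log-likelihood (additive constant dropped) for unbalanced data. It relies on two standard identities: the inverse covariance equals `T.T @ inv(D) @ T`, and the log-determinant of the covariance equals the sum of the logs of the diagonal of `D`. The function name `neg2_loglik`, the residual vectors and the covariance matrices are hypothetical; this is not the objective function of the paper, whose exact form is not reproduced here.

```python
import numpy as np

def neg2_loglik(residuals, covariances):
    """Minus twice the Gaussian log-likelihood, constant dropped,
    summed over subjects with possibly different numbers of visits."""
    total = 0.0
    for r, sigma in zip(residuals, covariances):
        L = np.linalg.cholesky(sigma)   # sigma = L @ L.T
        s = np.diag(L)                  # square roots of innovation variances
        # L / s rescales each column so the diagonal is 1; its inverse
        # is the unit lower triangular factor T, hence e = T @ r.
        e = np.linalg.solve(L / s, r)
        total += np.sum(np.log(s ** 2)) + np.sum(e ** 2 / s ** 2)
    return total

# Hypothetical unbalanced data: two subjects with 3 and 5 observations.
rng = np.random.default_rng(0)
times = [np.array([0.0, 1.0, 2.5]), np.array([0.0, 0.5, 1.0, 2.0, 4.0])]
covs = [0.7 ** np.abs(t[:, None] - t[None, :]) for t in times]
res = [rng.standard_normal(len(t)) for t in times]

value = neg2_loglik(res, covs)
# Direct evaluation, for comparison only.
direct = sum(np.linalg.slogdet(S)[1] + r @ np.linalg.solve(S, r)
             for r, S in zip(res, covs))
```

The two evaluations agree up to rounding; the factored form is convenient because each subject contributes a sum over innovations, whatever its number of observations.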
Simulation study
In this section, we investigate the performance of the proposed method by Monte Carlo simulation. For comparison, we estimate and using a working independence covariance structure (WI) and the true covariance structure (True). We also include the sample covariance estimator in the comparison. Moreover, we demonstrate the flexibility and efficiency of model (2.1) by comparing it with the linear model (4.3) and investigate the effect of a misspecification of the mean structure on the
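The comparison between a working independence structure and the true covariance structure can be illustrated with a small Monte Carlo sketch. It is a deliberately simplified, assumed setup rather than the simulation design of the paper: the mean is purely linear with one scalar covariate, the data are balanced, the errors follow an AR(1) correlation, and all settings (`n`, `m`, `beta`, `rho`, 500 replicates) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, beta, rho = 50, 6, 1.0, 0.8   # hypothetical settings

# AR(1) within-subject correlation matrix and its factors.
lags = np.abs(np.arange(m)[:, None] - np.arange(m)[None, :])
sigma = rho ** lags
chol = np.linalg.cholesky(sigma)
sigma_inv = np.linalg.inv(sigma)

def one_replicate():
    x = rng.standard_normal((n, m))            # time-varying covariate
    eps = rng.standard_normal((n, m)) @ chol.T  # rows have covariance sigma
    y = beta * x + eps
    # Working independence (WI): ordinary least squares.
    b_wi = np.sum(x * y) / np.sum(x * x)
    # True covariance: generalized least squares.
    xs = x @ sigma_inv                          # row i is x_i' sigma^{-1}
    b_true = np.sum(xs * y) / np.sum(xs * x)
    return b_wi, b_true

est = np.array([one_replicate() for _ in range(500)])
mse_wi, mse_true = np.mean((est - beta) ** 2, axis=0)
print(f"MSE (WI):   {mse_wi:.5f}")
print(f"MSE (True): {mse_true:.5f}")
```

In this setup the estimator that uses the true covariance should show a clearly smaller mean squared error than the working independence estimator, which is the kind of efficiency gain the comparison is meant to reveal.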
Application to real data
Here we apply the proposed method to real longitudinal data. The data come from a longitudinal hormone study on progesterone (Zhang et al., 1998), in which urine samples were collected from 34 healthy women over a menstrual cycle and urinary progesterone was measured on alternate days. A total of 492 observations were obtained from the 34 participants, with each contributing between 11 and 28 observations. The menstrual cycle lengths of these women ranged from 23 to 56 days, with an average of 29.6 days. Biologically,
Conclusion
In this article, we proposed a partially linear model which keeps the flexibility of the nonparametric model while maintaining the explanatory power of the parametric model. We model the nonparametric term, the within-subject correlation and the variation by spline functions after decomposing the covariance matrix using the modified Cholesky decomposition. Here we consider unbalanced data, whereas Huang et al. (2007) considered balanced data with equal time intervals. In the simulation study,
Acknowledgements
The authors would like to thank the Editor and the referees for their constructive comments and helpful suggestions, which greatly improved the presentation of the paper. The research is supported by the Natural Science Foundation of China Grant 10931002, 1091120386.
References
- et al. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics.
- et al. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function. Journal of the American Statistical Association.
- et al. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association.
- et al. (2008). Semiparametric estimation of covariance matrices for longitudinal data. Journal of the American Statistical Association.
- et al. (2005). Robust estimation in generalized partial linear models for clustered data. Journal of the American Statistical Association.
- et al. (2002). Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika.