Joint estimation of mean-covariance model for longitudinal data with basis function approximations

https://doi.org/10.1016/j.csda.2010.08.003

Abstract

When the selected parametric model for the covariance structure is far from the true one, the corresponding covariance estimator could have considerable bias. To balance the variability and bias of the covariance estimator, we employ a nonparametric method. In addition, as different mean structures may lead to different estimators of the covariance matrix, we choose a semiparametric model for the mean so as to provide a stable estimate of the covariance matrix. Based on the modified Cholesky decomposition of the covariance matrix, we construct the joint mean-covariance model by modeling the smooth functions using the spline method and estimate the associated parameters using the maximum likelihood approach. A simulation study and a real data analysis are conducted to illustrate the proposed approach and demonstrate the flexibility of the suggested model.

Introduction

The estimation of the covariance matrix is important in a longitudinal study. A good estimator of the covariance can improve the efficiency of the estimated regression coefficients. Furthermore, the covariance estimation itself is also of interest (Diggle and Verbyla, 1998). A number of authors have studied the problem of estimating the covariance matrix. Pourahmadi (1999, 2000) considered generalized linear models for the components of the modified Cholesky decomposition of the covariance matrix. Fan et al. (2007) and Fan and Wu (2008) proposed using a semiparametric model for the covariance function. However, the mean and covariance estimators could have considerable bias when the specified parametric or semiparametric model for the covariance structure is far from the truth (Huang et al., 2007).

To balance the variability and bias of the covariance estimator, nonparametric estimators of the covariance structure have been proposed. Diggle and Verbyla (1998) provided a nonparametric estimator for the covariance structure without assuming stationarity, but their estimator is not guaranteed to be positive definite. To overcome the positive-definiteness constraint, Wu and Pourahmadi (2003) proposed a nonparametric smoothing method to regularize the estimation of a large covariance matrix based on the modified Cholesky decomposition, but their first-step raw estimate is too noisy and may therefore yield an inefficient estimate. Huang et al. (2007) proposed applying a smoothing-based regularization after the modified Cholesky decomposition of the covariance matrix and found that their estimator could be more efficient than that of Wu and Pourahmadi. However, they considered only balanced data, which are uncommon in practice. We therefore present an extension of their method to unbalanced data. In addition, all these works focus on the estimation of the covariance matrix and pay little attention to the mean structure. As shown in Pan and Mackenzie (2003), a misspecified mean structure may well lead to a poor estimator of the covariance structure. We therefore suggest using a flexible estimator of the mean component to avoid this possible drawback.

In this article, we propose to consider a partially linear model which keeps the flexibility of the nonparametric model while maintaining the explanatory power of the parametric model for the mean. We first model the nonparametric term, the within-subject correlation and variation by spline functions after decomposing the covariance matrix based on the modified Cholesky decomposition. The joint mean-covariance model is then constructed. Finally, we estimate the associated parameters using the maximum likelihood approach. The main focus is on the estimation efficiency gain in the regression coefficients by incorporating the covariance matrix.

The proposed estimation procedure is more general than that given by Huang et al. (2007), whose procedure is confined to the analysis of balanced longitudinal data. Although we can deal with the variation using a method similar to that in Huang et al. (2007), this is not the case for the within-subject correlation. Unlike in Huang et al. (2007), the within-subject correlation in this paper is assumed to depend only on the elapsed time.

The remainder of the paper is organized as follows. The proposed spline method is given in Section 2. Section 3 develops the estimation procedure for the regression coefficients and the covariance function. A simulation study and a real data analysis are given in Sections 4 and 5, respectively.

Section snippets

Joint mean-covariance model

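Assume that we have a sample of $n$ subjects. For the $i$th subject, $i=1,\dots,n$, the response variable $y_i(t_{ij})$ and the covariate vector $x_i(t_{ij})$ are collected at time points $t=t_{ij}$, $j=1,\dots,n_i$, where $n_i$ is the total number of observations for the $i$th subject. The following partially linear model is considered: $y_{ij}=x_{ij}^{\top}\beta+\alpha(t_{ij})+\varepsilon_{ij}$, $j=1,\dots,n_i$, $i=1,\dots,n$, where $\beta$ is a $p$-dimensional unknown parameter vector, $\alpha(\cdot)$ is an unknown smooth function, and $E(\varepsilon_{ij})=0$. Let $\varepsilon_i=(\varepsilon_{i1},\dots,\varepsilon_{in_i})^{\top}$ and $\operatorname{Cov}(\varepsilon_i)=\Sigma_i$. Denote $\mu_{ij}=x_{ij}^{\top}\beta+\alpha(t_{ij})$.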
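As a rough illustration of the partially linear model above, the following Python sketch simulates unbalanced data and approximates the smooth function by a spline basis. The truncated power basis, the knot locations, the sine curve used for the smooth function, the independent errors, and the ordinary least squares fit (that is, a working independence fit) are all assumptions made for this example; they are not taken from the paper, which estimates the parameters by maximum likelihood jointly with the covariance.

```python
import numpy as np

def truncated_power_basis(t, knots, degree=3):
    """Columns: 1, t, ..., t^degree, then (t - k)_+^degree for each knot k."""
    t = np.asarray(t, dtype=float)
    poly = np.vander(t, degree + 1, increasing=True)
    trunc = np.clip(t[:, None] - np.asarray(knots)[None, :], 0.0, None) ** degree
    return np.hstack([poly, trunc])

rng = np.random.default_rng(0)

# Unbalanced design: subject i contributes n_i observations (hypothetical sizes).
n_obs = [int(m) for m in rng.integers(5, 10, size=30)]
beta_true = np.array([1.0, -0.5])          # hypothetical regression coefficients

def alpha(t):
    """Hypothetical smooth function alpha(t)."""
    return np.sin(2 * np.pi * t)

# Stack all subjects' observation times and covariates.
t = np.concatenate([np.sort(rng.uniform(0, 1, m)) for m in n_obs])
X = rng.normal(size=(t.size, 2))
# Independent errors for simplicity; real longitudinal errors are correlated.
y = X @ beta_true + alpha(t) + 0.1 * rng.normal(size=t.size)

# Approximate alpha(t) by a spline basis and fit everything by least squares.
B = truncated_power_basis(t, knots=np.linspace(0.2, 0.8, 4))
design = np.hstack([X, B])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = coef[:2]            # estimate of beta
alpha_hat = B @ coef[2:]       # fitted alpha at the observed times
```

Once the smooth function is replaced by a basis expansion, the model is linear in all unknown coefficients, which is what makes a likelihood-based fit of the mean tractable.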

Estimation of θ, γ and λ

In the above section, we modeled the nonparametric term, the main diagonal of $D_i$ and the subdiagonals of $T_i$ by spline functions. We now employ the likelihood approach, as follows.
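For readers unfamiliar with the modified Cholesky decomposition, the sketch below shows the standard construction for a single covariance matrix: a unit lower triangular matrix T and a diagonal matrix D with T Σ Tᵀ = D. The function names and the AR(1) example are assumptions chosen for illustration; the code does not reproduce the paper's spline models for the entries of T and D.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, D) with T unit lower triangular, D diagonal, T @ sigma @ T.T == D."""
    C = np.linalg.cholesky(sigma)     # sigma = C C^T, C lower triangular
    d = np.diag(C)
    L = C / d                         # scale column j by 1/d[j]: unit lower triangular
    T = np.linalg.inv(L)              # sigma = L diag(d^2) L^T, so T = L^{-1}
    D = np.diag(d ** 2)               # innovation variances
    return T, D

def rebuild_covariance(phi, log_innov_var):
    """Build a covariance matrix from unconstrained parameters (hypothetical helper).

    phi: (m, m) array; only entries below the diagonal are used, as the
         autoregressive coefficients (the negatives of the subdiagonal of T).
    log_innov_var: length-m array of log innovation variances.
    """
    m = len(log_innov_var)
    T = np.eye(m) - np.tril(phi, k=-1)
    D = np.diag(np.exp(log_innov_var))
    T_inv = np.linalg.inv(T)
    return T_inv @ D @ T_inv.T        # positive definite by construction

# Example: AR(1) correlation matrix with rho = 0.5.
rho, m = 0.5, 4
idx = np.arange(m)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])
T, D = modified_cholesky(sigma)
sigma_back = rebuild_covariance(-T, np.log(np.diag(D)))
```

In this AR(1) example the first subdiagonal of T equals -0.5 and the innovation variances are 1, 0.75, 0.75, 0.75. Because the subdiagonal entries of T and the logarithms of the diagonal of D are unconstrained, any values for them yield a positive definite covariance matrix.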

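Let $y_i=(y_{i1},\dots,y_{in_i})^{\top}$, $X_i=(x_{i1},\dots,x_{in_i})^{\top}$, $\mu_i=(\mu_{i1},\dots,\mu_{in_i})^{\top}$ and $\varepsilon_i=(\varepsilon_{i1},\dots,\varepsilon_{in_i})^{\top}$. Then, the logarithm of the likelihood function of $y_1,\dots,y_n$ can be written as $L=-\frac{1}{2}\sum_{i=1}^{n}\log|\Sigma_i|-\frac{1}{2}\sum_{i=1}^{n}\operatorname{tr}(\Sigma_i^{-1}S_i)$ up to a constant that can be neglected, where $S_i=(y_i-\mu_i)(y_i-\mu_i)^{\top}$. With $\mu_i$ being replaced by $\tilde{\mu}_i=(\tilde{\mu}$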
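The log-likelihood above can be evaluated subject by subject, which handles unbalanced data naturally because each subject has its own covariance matrix of its own size. The following is a minimal sketch under the assumption that the residuals and the covariance matrices are already available; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def log_likelihood(residuals, sigmas):
    """Gaussian log-likelihood up to an additive constant.

    residuals: list of 1-D arrays, y_i - mu_i for each subject i.
    sigmas:    list of (n_i, n_i) covariance matrices Sigma_i.
    """
    total = 0.0
    for r, sigma in zip(residuals, sigmas):
        _, logdet = np.linalg.slogdet(sigma)      # log|Sigma_i|
        # r^T Sigma_i^{-1} r equals tr(Sigma_i^{-1} S_i) with S_i = r r^T.
        quad = r @ np.linalg.solve(sigma, r)
        total += -0.5 * logdet - 0.5 * quad
    return total

# Toy example with two subjects of different sizes (unbalanced data).
residuals = [np.array([1.0, 2.0, 3.0]), np.array([0.0, 0.0])]
sigmas = [np.eye(3), 2.0 * np.eye(2)]
value = log_likelihood(residuals, sigmas)   # equals -7 - log(2)
```

Solving a linear system rather than forming the inverse explicitly is a common numerical choice; it gives the same quadratic form as the trace expression.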

Simulation study

In this section, we investigate the performance of the proposed method by Monte Carlo simulation. For comparison, we estimate β and α(·) using a working independence covariance structure (WI) and the true covariance structure (True). We also include the sample covariance estimator in the comparison. Moreover, we demonstrate the flexibility and efficiency of model (2.1) by comparing it with the linear model (4.3) and investigate the effect of a misspecification of the mean structure on the

Application to real data

Here we apply the proposed method to real longitudinal data. The data come from the longitudinal hormone study on progesterone (Zhang et al., 1998), which collected urine samples from 34 healthy women over a menstrual cycle and measured urinary progesterone on alternate days. A total of 492 observations were obtained from the 34 participants, with each contributing from 11 to 28 observations. The menstrual cycle lengths of these women ranged from 23 to 56 days, with an average of 29.6 days. Biologically,

Conclusion

In this article, we proposed a partially linear model which keeps the flexibility of the nonparametric model while maintaining the explanatory power of the parametric model. We model the nonparametric term, the within-subject correlation and the variation by spline functions after decomposing the covariance matrix based on the modified Cholesky decomposition. We consider unbalanced data, whereas Huang et al. (2007) considered balanced data with equal time intervals. In the simulation study,

Acknowledgements

The authors would like to thank the Editor and the referees for their constructive comments and helpful suggestions, which greatly improved the presentation of the paper. The research is supported by the Natural Science Foundation of China Grants 10931002, 1091120386.

References (13)

  • P.T. Diggle et al. Nonparametric estimation of covariance structure in longitudinal data. Biometrics (1998)
  • J. Fan et al. Analysis of longitudinal data with semiparametric estimation of covariance function. Journal of the American Statistical Association (2007)
  • J. Fan et al. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association (2004)
  • J. Fan et al. Semiparametric estimation of covariance matrices for longitudinal data. Journal of the American Statistical Association (2008)
  • X. He et al. Robust estimation in generalized partial linear models for clustered data. Journal of the American Statistical Association (2005)
  • X. He et al. Estimation in a semiparametric model for longitudinal data with unspecified dependence structure. Biometrika (2002)
There are more references available in the full text version of this article.
