Intra-cluster correlation structure in longitudinal data analysis: Selection criteria and misspecification tests

https://doi.org/10.1016/j.csda.2014.06.013

Abstract

Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotic distribution of the correlation information criterion (CIC) is derived. A new method for selecting a working ICS is then proposed by standardizing the selection criterion into a p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS, and the standardized CIC test also performs satisfactorily for working ICS selection. Simulation studies and applications to two real longitudinal datasets illustrate how these criteria and tests might be useful.

Introduction

The generalized estimating equation (GEE) method, first proposed by Liang and Zeger (1986), is a very popular approach for modeling correlated data from longitudinal or clustered studies. Under mild regularity conditions, the GEE parameter estimates are consistent even when the working intra-cluster correlation structure (ICS) is misspecified. However, many researchers have pointed out that misspecification of the ICS can reduce the asymptotic relative efficiency of parameter estimation without impairing consistency. These include Rotnitzky and Jewell (1990), Fitzmaurice (1995) and Wang and Carey (2003, 2004). The selection of the working ICS in GEE is therefore pertinent to longitudinal and clustered data analysis. An appropriate specification of the correlation structure may improve estimation efficiency and hence lead to more reliable statistical inference. The importance of working ICS selection has also been addressed by Zhou et al. (2012), Wang (2003), Wang and Hin (2010), Zhou and Qu (2012) and Chen and Lazar (2012), among others.

In practice, two problems typically arise when selecting the correlation structure for longitudinal data. The first is to select the best correlation structure from a pool of candidate structures, and various selection criteria have been proposed in the literature for this purpose. Note, however, that the structure selected as best by a given criterion may not be the true one, because the true correlation structure may not be among the candidates. The second problem is to test for misspecification of the working ICS, namely to test whether the selected or given ICS is the true one. That is, if the null hypothesis that the chosen working ICS equals the true ICS is not rejected, the hypothesized working ICS can be regarded as the true ICS in the subsequent analysis.
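To make "a pool of candidate correlation structures" concrete, here is a minimal sketch that builds three working correlation matrices commonly used in the GEE literature: independence, exchangeable and first-order autoregressive (AR(1)). The function name `working_correlation` and the parameter `alpha` are illustrative choices and are not taken from the paper. Estimation of `alpha` is deliberately left out.

```python
import numpy as np


def working_correlation(structure, m, alpha=0.0):
    """Return an m x m working correlation matrix R(alpha) (hypothetical helper).

    'independence' : R = I
    'exchangeable' : R[j, k] = alpha for j != k, 1 on the diagonal
    'ar1'          : R[j, k] = alpha ** |j - k|
    """
    if structure == "independence":
        return np.eye(m)
    if structure == "exchangeable":
        R = np.full((m, m), float(alpha))
        np.fill_diagonal(R, 1.0)
        return R
    if structure == "ar1":
        idx = np.arange(m)
        # |j - k| lag matrix; alpha ** 0 = 1 gives the unit diagonal
        return float(alpha) ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError(f"unknown structure: {structure!r}")
```

A candidate pool for clusters of size 4 could then be built as `{s: working_correlation(s, 4, 0.5) for s in ("independence", "exchangeable", "ar1")}`. In practice `alpha` would be estimated from the data for each structure.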

The literature offers various working ICS selection criteria based on different approximations. Pan (2001) proposed the quasi-likelihood under the independence model criterion (QIC), a modification of the Akaike information criterion (AIC) of Akaike (1973). Hin and Wang (2009) modified the QIC and proposed the correlation information criterion (CIC), which they showed greatly improves on the QIC in selecting working correlation structures. Rotnitzky and Jewell (1990) proposed a two-dimensional criterion (the RJ criterion). Wang and Carey (2004) and Carey and Wang (2011) investigated it and denoted it as the c̄1 and c̄2 criteria, respectively. For misspecification tests of the working ICS, O’Hara Hines and Hines (2007, 2010) considered a standardized form of the c̄2 criterion based on the graphical symmetrizing power-law transformation. More recently, Zhou et al. (2012) proposed an information ratio (IR) test for misspecification of general covariance structures. They also introduced a working ICS selection criterion based on the IR test. They concluded that, compared with the CIC criterion, the IR test-based selection method shows stronger evidence.
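Below is a minimal sketch of the criteria discussed above. It assumes the estimated covariance matrices of the regression coefficients are already available from GEE fits.

CIC is computed as trace(Ω_I V_R) (Hin and Wang, 2009), where:
- Ω_I is the inverse of the model-based covariance under the independence working structure;
- V_R is the robust (sandwich) covariance under the candidate structure R.

QIC (Pan, 2001) is -2 times the quasi-likelihood under independence plus twice this same trace.

For the Rotnitzky and Jewell diagnostics, Q = V_M^{-1} V_R uses the model-based (V_M) and robust (V_R) covariances under the same R. Q should be close to the identity when the working structure is correct, so c1 = tr(Q)/p and c2 = tr(Q²)/p should both be near 1. The combined distance sqrt((1 - c1)² + (1 - c2)²) is one form used in the literature and is an assumption here. This excerpt does not show the paper's exact c̄1, c̄2 and RJ definitions.

```python
import numpy as np


def cic(omega_indep, v_robust):
    """CIC = trace(Omega_I @ V_R).

    omega_indep : inverse of the model-based covariance of beta-hat fitted
                  under the independence working structure (p x p)
    v_robust    : robust (sandwich) covariance of beta-hat fitted under the
                  candidate working structure R (p x p)
    """
    return float(np.trace(omega_indep @ v_robust))


def rj_criteria(v_model, v_robust):
    """Rotnitzky-Jewell style diagnostics for one candidate structure.

    Returns (c1, c2, rj), where Q = V_model^{-1} V_robust, c1 = tr(Q)/p,
    c2 = tr(Q @ Q)/p and rj = sqrt((1 - c1)^2 + (1 - c2)^2).
    The combined rj form is an assumption, not necessarily the paper's.
    """
    p = v_model.shape[0]
    Q = np.linalg.solve(v_model, v_robust)  # V_model^{-1} V_robust without an explicit inverse
    c1 = np.trace(Q) / p
    c2 = np.trace(Q @ Q) / p
    rj = np.sqrt((1.0 - c1) ** 2 + (1.0 - c2) ** 2)
    return float(c1), float(c2), float(rj)
```

Smaller CIC (or rj) values point to a better-fitting working structure. The inputs would come from the GEE fits for each candidate.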

Note that the IR test statistic can be regarded as a standardized form of the c̄1 criterion. In this paper we introduce the CIC test statistic to investigate working ICS misspecification; the CIC test p-value is a natural extension of the corresponding IR test p-value. In particular, we first derive the asymptotic distribution of the CIC test statistic. We then establish, in a similar way, a correlation structure selection procedure based on the CIC test. The main goal of this paper is to compare the CIC test p-value method with the IR test p-value method and with the CIC, c̄1 and RJ criteria with respect to working ICS selection. The rest of the paper is organized as follows. In Section 2, the theoretical background is described and the asymptotic distribution of the CIC test statistic is derived. In Section 3, simulation studies evaluate the behavior of the CIC test and the IR test for working ICS selection, together with the existing CIC, c̄1 and RJ criteria. In Section 4, the different approaches are applied to two real datasets, and some concluding remarks are provided in Section 5.
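The p-value-based selection idea can be sketched as follows. This rests on an assumption that the excerpt does not spell out: the candidate whose misspecification test yields the largest p-value is chosen, because the data least contradict its null hypothesis "working ICS = true ICS". The function `select_by_pvalue` is hypothetical. The p-values would come from the CIC test (or the IR test) applied to each candidate structure.

```python
from typing import Mapping, Tuple


def select_by_pvalue(pvalues: Mapping[str, float], alpha: float = 0.05) -> Tuple[str, bool]:
    """Pick the working structure with the largest misspecification-test p-value.

    Returns (best_structure, not_rejected). not_rejected is False when even the
    best candidate is rejected at level alpha.
    """
    if not pvalues:
        raise ValueError("at least one candidate structure is required")
    best = max(pvalues, key=pvalues.get)  # first maximum wins on ties
    return best, pvalues[best] >= alpha
```

The second return value flags the case where every candidate is rejected. That would be consistent with the possibility, noted earlier, that the true structure is not in the candidate pool.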


The theoretical background and asymptotic distribution of the CIC test statistic

Consider a longitudinal dataset with $n$ clusters or subjects, where subject $i$ is repeatedly observed at times $t = 1, \ldots, l_i$, $i = 1, \ldots, n$. For a particular subject $i$, let $y_i = (y_{i1}, \ldots, y_{il_i})^T$ be the $l_i \times 1$ response vector. Let $X_i = (x_{i1}, \ldots, x_{il_i})^T$ be the $l_i \times p$ covariate matrix whose $j$-th row is $x_{ij}^T$, $j = 1, \ldots, l_i$. Suppose the mean and variance of the response $y_{ij}$ are specified as $E(y_{ij}) = \mu_{ij} = g(x_{ij}^T \beta)$ and $\mathrm{var}(y_{ij}) = \phi\, v(\mu_{ij})$, where $\beta$ is the $p \times 1$ vector of regression parameters, $\phi$ is the
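To make the setup concrete, here is a sketch of a GEE fit for the special case of an identity link with variance function v(μ) = 1. It assumes the standard Liang and Zeger (1986) working covariance V_i = ϕ A_i^{1/2} R_i A_i^{1/2} with A_i = diag{v(μ_ij)}, so V_i = ϕ R_i here. For fixed R_i, the estimating equation Σ_i X_i^T V_i^{-1}(y_i - X_i β) = 0 then has a closed-form root.

The function returns three quantities, with B = Σ_i X_i^T R_i^{-1} X_i and M = Σ_i X_i^T R_i^{-1} e_i e_i^T R_i^{-1} X_i:
- the estimate β̂;
- the model-based covariance ϕ̂ B^{-1};
- the robust sandwich covariance B^{-1} M B^{-1}.

The function name `gee_gaussian` is hypothetical. Two simplifying assumptions of this sketch are the identity link and a fixed, user-supplied R_i with no estimation of the correlation parameter. A general link would require iterating these steps.

```python
import numpy as np


def gee_gaussian(X_list, y_list, R_list):
    """GEE fit for an identity link with v(mu) = 1 and fixed working correlations.

    X_list : list of (l_i x p) covariate matrices X_i
    y_list : list of length-l_i response vectors y_i
    R_list : list of (l_i x l_i) working correlation matrices R_i
    Returns (beta_hat, v_model, v_robust).
    """
    p = X_list[0].shape[1]
    B = np.zeros((p, p))  # sum_i X_i^T R_i^{-1} X_i
    u = np.zeros(p)       # sum_i X_i^T R_i^{-1} y_i
    for X, y, R in zip(X_list, y_list, R_list):
        RinvX = np.linalg.solve(R, X)  # R_i^{-1} X_i
        B += X.T @ RinvX
        u += RinvX.T @ y               # (R^{-1} X)^T y = X^T R^{-1} y for symmetric R
    beta = np.linalg.solve(B, u)       # closed-form root of the estimating equation

    # Moment estimate of the dispersion phi from the residuals (v(mu) = 1).
    resid = [y - X @ beta for X, y in zip(X_list, y_list)]
    n_obs = sum(len(e) for e in resid)
    phi = sum(e @ e for e in resid) / (n_obs - p)

    # "Meat" of the sandwich: sum_i (X_i^T R_i^{-1} e_i)(X_i^T R_i^{-1} e_i)^T.
    meat = np.zeros((p, p))
    for X, R, e in zip(X_list, R_list, resid):
        s = np.linalg.solve(R, X).T @ e  # X_i^T R_i^{-1} e_i
        meat += np.outer(s, s)

    B_inv = np.linalg.inv(B)
    v_model = phi * B_inv              # (sum_i X_i^T V_i^{-1} X_i)^{-1} with V_i = phi R_i
    v_robust = B_inv @ meat @ B_inv    # phi cancels in the sandwich
    return beta, v_model, v_robust
```

Fitting once with R_i = I and once with a candidate R_i gives the inputs for CIC = trace(Ω_I V_R). Here Ω_I is the inverse of the first fit's model-based covariance, and V_R is the second fit's robust covariance.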

Simulation results: working ICS misspecification tests

In the simulation study in this subsection, we assess the performance of the IR and CIC statistics in testing working ICS misspecification. In particular, we focus on two aspects. The first is the finite-sample distributions of the IR and CIC test statistics, for which the empirical sizes of the IR and CIC tests are obtained for small and large sample sizes. The second is the empirical powers of the IR test statistic and CIC test
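The empirical size and power mentioned above are Monte Carlo rejection rates. They are the fraction of simulated datasets for which a test rejects at nominal level α, with data generated under the null hypothesis (size) or under an alternative (power). The following generic sketch assumes user-supplied `simulate` and `p_value` callables (both hypothetical). The excerpt does not give the CIC test's null distribution, so the example plugs in a simple one-sample z-test purely as a stand-in.

```python
import math

import numpy as np


def rejection_rate(simulate, p_value, n_rep=1000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate of a test at nominal level alpha.

    simulate : callable(rng) -> one simulated dataset
    p_value  : callable(dataset) -> p-value of the test on that dataset
    A null-generating `simulate` estimates the empirical size;
    an alternative-generating one estimates the empirical power.
    """
    rng = np.random.default_rng(seed)
    rejections = sum(p_value(simulate(rng)) < alpha for _ in range(n_rep))
    return rejections / n_rep


# Stand-in test (NOT the CIC test): two-sided z-test of H0: mean = 0 with known sd = 1.
def z_test_pvalue(x):
    z = x.mean() * math.sqrt(len(x))
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


size = rejection_rate(lambda rng: rng.normal(0.0, 1.0, 50), z_test_pvalue)
power = rejection_rate(lambda rng: rng.normal(0.5, 1.0, 50), z_test_pvalue)
```

For a well-calibrated test, `size` should be close to the nominal α. Comparing `power` across tests under the same alternatives is the kind of comparison the simulation study describes.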

Two real examples

In this section, we compare the performance of various criteria using the following two datasets: the Madras dataset from the Madras Longitudinal Schizophrenia Study available from Diggle et al. (2002) and the Labor Market Experience dataset from a study of the National Longitudinal Survey of Labor Market Experience available from the Center for Human Resource Research (1989).

Concluding remarks

In this paper, we proposed the CIC test statistic to test misspecification of the working correlation structure for generalized estimating equations in longitudinal data analysis. The asymptotic distributions of the CIC test statistic under the null hypothesis were derived for both known and unknown dispersion parameters. A corresponding working ICS selection method was established based on the p-values of the CIC test. It is noted that the CIC, c̄1 and RJ criteria are introduced

Acknowledgments

Jianwen Xu’s research is mainly funded by the Fundamental Research Funds for the Central Universities (CQDXWL-2013-Z009). It is partly supported by the National Natural Science Foundation of China (11171361, 11201505) and the Natural Science Foundation Project of CQ (cstc2013jcyj A00001). YGW’s research is funded by ARC Discovery Grant DP130100766.

References (21)

  • R.J. O’Hara Hines et al. (2007). Covariance mis-specification and the local influence approach in sensitivity analyses of longitudinal data with drop-outs. Comput. Statist. Data Anal.
  • Y.-G. Wang et al. (2010). Modeling strategies in longitudinal data analysis: covariate, variance function and correlation structure selection. Comput. Statist. Data Anal.
  • H. Akaike (1973). Information theory and an extension of the maximum likelihood principle. In: Proceedings of the 2nd...
  • V.J. Carey et al. (2011). Working covariance model selection for generalized estimating equations. Stat. Med.
  • National Longitudinal Survey of Labor Market Experience, Young Women 14–26 Years of Age in 1968 (1989).
  • J. Chen et al. (2012). Selection of working correlation structure in generalized estimating equations via empirical likelihood. J. Comput. Graph. Statist.
  • P.J. Diggle et al. (2002). Analysis of Longitudinal Data.
  • G.M. Fitzmaurice (1995). A caveat concerning independence estimating equations with multivariate binary data. Biometrics.
  • M. Gosho et al. (2012). Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data. Comm. Statist. Theory Methods.
  • L.Y. Hin et al. (2007). Criteria for working-correlation-structure selection in GEE: assessment via simulation. Amer. Statist.
There are more references available in the full text version of this article.