Influence diagnostics in linear and nonlinear mixed-effects models with censored data

https://doi.org/10.1016/j.csda.2012.06.021

Abstract

HIV RNA viral load measures are often subject to upper and lower detection limits, depending on the quantification assay, and consequently the responses are either left or right censored. Linear and nonlinear mixed-effects models, with modifications to accommodate censoring (LMEC and NLMEC), are routinely used to analyze this type of data. Recently, Vaida and Liu (2009) proposed an exact EM-type algorithm for LMEC/NLMEC, called the SAGE algorithm (Meng and Van Dyk, 1997), that uses closed-form expressions at the E-step, as opposed to Monte Carlo simulations. Motivated by this algorithm, we propose here an exact ECM algorithm (Meng and Rubin, 1993) for LMEC/NLMEC, which enables us to develop local influence analysis for mixed-effects models on the basis of the conditional expectation of the complete-data log-likelihood function. This route is taken because the observed-data log-likelihood function associated with the proposed model is complex, which makes it difficult to apply the approach of Cook (1977, 1986) directly. Some useful perturbation schemes are also discussed. Finally, the results obtained from the analyses of two HIV/AIDS studies on viral loads are presented to illustrate the newly developed methodology.

Introduction

Studies of HIV viral dynamics, often considered a key issue in AIDS research, involve repeated (longitudinal) measures over a period of treatment, which are routinely analyzed using linear and nonlinear mixed-effects models (LME/NLME) to assess rates of change in HIV-1 RNA level, or viral load (Wu, 2005, 2010). Viral load measures the amount of actively replicating virus, and its reduction is frequently used as a primary endpoint in clinical trials of anti-retroviral (ARV) therapy. However, depending on the diagnostic assay used, its measurement may be subject to upper and lower detection limits, below or above which values are not quantifiable (resulting in left or right censoring). The proportion of censored data in these studies may not be small (Hughes, 1999), and so the use of crude or ad hoc methods, such as substituting a threshold value or some arbitrary point like the midpoint between zero and the detection cut-off (Vaida and Liu, 2009), might lead to biased estimates of fixed effects and variance components (Wu, 2010).
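The bias from crude substitution can be seen in a small simulation. The following is only a sketch under assumed settings: the normal distribution for the latent responses, the detection limit, the sample size, and the function name `simulate_substitution_bias` are all hypothetical and are not taken from the datasets analyzed in this paper.

```python
import numpy as np

def simulate_substitution_bias(n=20000, mu=2.0, sigma=1.0, limit=1.5, seed=0):
    """Compare the true mean/SD with estimates after naive substitution.

    All settings are illustrative assumptions (e.g. responses on a log10 scale).
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=n)      # latent (fully observed) responses
    censored = y < limit                   # left-censoring below the detection limit
    y_limit = np.where(censored, limit, y)       # substitute the limit itself
    y_half = np.where(censored, limit / 2.0, y)  # substitute half the limit
    return {
        "true_mean": mu,
        "true_sd": sigma,
        "censored_fraction": censored.mean(),
        "mean_sub_limit": y_limit.mean(),
        "sd_sub_limit": y_limit.std(ddof=1),
        "mean_sub_half": y_half.mean(),
        "sd_sub_half": y_half.std(ddof=1),
    }

if __name__ == "__main__":
    for name, value in simulate_substitution_bias().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, substituting the limit pulls the sample mean upward and shrinks the sample standard deviation, which mirrors the concern about biased fixed effects and variance components raised above.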

Our motivating datasets in this study are on HIV-1 viral loads: (i) after unstructured treatment interruption, or UTI (Saitoh et al., 2008), and (ii) setpoint for acutely infected subjects from the AIEDRP program (Vaida and Liu, 2009). The former has about 7% of observations below the detection limit (left-censored), whereas the latter has about 22% lying above the limit of assay quantification (right-censored). As an alternative to crude imputation methods, Hughes (1999) proposed a likelihood-based Monte Carlo EM algorithm (MCEM) for LME with censored responses (LMEC). Vaida et al. (2007) proposed a hybrid EM using a more efficient version of Hughes’s algorithm, extending it to NLME with censored data (NLMEC). Recently, Vaida and Liu (2009) proposed an exact EM-type algorithm for LMEC/NLMEC, which uses closed-form expressions at the E-step, as opposed to Monte Carlo simulations. Strictly speaking, these algorithms are Space Alternating Generalized EM (SAGE) algorithms (see Vaida et al., 2007). In this paper, to perform diagnostic analysis in LMEC/NLMEC models, we first propose a slight modification to the EM-type algorithm of Vaida and Liu (2009), wherein all the parameters are updated (M-step) by treating the random effects and the censored observations as missing data. Then, the diagnostic measures for assessing local influence in LMEC/NLMEC are developed and presented.

Influence analysis is an important step in data analysis subsequent to parameter estimation; its purpose is to detect influential observations. There are two primary approaches. The first is the case-deletion approach (Cook, 1977), an intuitively appealing method (see also Cook and Weisberg, 1982). Deletion diagnostics such as Cook’s distance or the likelihood distance have been applied to many statistical models. The second, a general statistical technique used to assess the stability of the estimation outputs with respect to the model inputs, is the local influence approach of Cook (1986). Following the pioneering work of Cook (1986), this method has received considerable attention in the statistical literature on mixed-effects models (LME/NLME); see, for example, Lesaffre and Verbeke (1998), Zhu and Lee (2001), Lee and Xu (2004), Osorio et al. (2007) and Russo et al. (2009), among others.
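As a minimal illustration of the case-deletion idea, the sketch below computes Cook's distance for an ordinary linear regression by refitting with each observation removed, and compares it with the usual leverage-based closed form. It is deliberately simpler than the models in this paper: there are no random effects and no censoring, and the data, seed, and function names are hypothetical.

```python
import numpy as np

def cooks_distance_by_deletion(X, y):
    """Cook's distance via explicit refits with observation i removed."""
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)           # residual variance from the full fit
    XtX = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta_i - beta_hat           # change in estimates when case i is deleted
        d[i] = diff @ XtX @ diff / (p * s2)
    return d

def cooks_distance_closed_form(X, y):
    """Same quantity from residuals and leverages, without refitting."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)

# Hypothetical data with one planted aberrant observation (index 7).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.2, size=30)
y[7] += 6.0

d = cooks_distance_by_deletion(X, y)
print("most influential case:", int(np.argmax(d)))
```

For mixed-effects models with censoring, refitting is far more expensive and the observed-data likelihood is awkward to work with, which is part of the motivation for approaches based on the conditional expectation of the complete-data log-likelihood.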

Although several diagnostic studies on LME/NLME have appeared in the literature, to the best of our knowledge no study has addressed influence diagnostics for LMEC/NLMEC, and certainly none has addressed local influence analysis. The main difficulty is that the observed log-likelihood function of the LMEC/NLMEC involves intractable integrals (for instance, the pdfs of truncated multinormal distributions), which makes the direct application of Cook’s approach (Cook, 1986) to this model very difficult, if not impossible, since the measures involve the first and second partial derivatives of this function. Zhu and Lee (2001) developed an approach for performing local influence analysis for general statistical models with missing data, based on the Q-displacement function, which is closely related to the conditional expectation of the complete-data log-likelihood in the E-step of the EM algorithm. This approach produces results very similar to those obtained from Cook’s method. Moreover, case deletion can be studied through the Q-displacement function following the approach of Zhu et al. (2001). We therefore develop here methods to obtain case-deletion measures and local influence measures by using the method of Zhu et al. (2001) (see also Lee and Xu, 2004; Zhu and Lee, 2001) in the context of mixed-effects models with censored data. In our opinion, the results developed here form a necessary supplement to those presented by Vaida and Liu (2009) for the analysis of mixed-effects models with censored response and HIV data.
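For orientation, the quantities referred to above can be written out as follows. This is a sketch in assumed notation, following the general form in Zhu and Lee (2001) and Zhu et al. (2001) rather than the exact expressions derived later in this paper: $y_o$ denotes the observed data, $y_c$ the complete data (observed data together with the random effects and censored responses), $\ell_c$ the complete-data log-likelihood, $\omega$ a perturbation vector with null value $\omega_0$, and $d$ a unit direction.

```latex
% Assumed notation; see the lead-in for the meaning of each symbol.
\begin{align*}
Q(\theta \mid \hat{\theta}) &= E\{\ell_c(\theta \mid y_c) \mid y_o, \hat{\theta}\}, \\
f_Q(\omega) &= 2\,\bigl[\,Q(\hat{\theta} \mid \hat{\theta}) - Q(\hat{\theta}(\omega) \mid \hat{\theta})\,\bigr], \\
QD_i &= 2\,\bigl[\,Q(\hat{\theta} \mid \hat{\theta}) - Q(\hat{\theta}_{[i]} \mid \hat{\theta})\,\bigr], \\
C_{f_Q,d} &= -2\, d^{\top} \ddot{Q}_{\omega_0}\, d,
\qquad
-\ddot{Q}_{\omega_0} = \Delta_{\omega_0}^{\top} \{-\ddot{Q}(\hat{\theta})\}^{-1} \Delta_{\omega_0}.
\end{align*}
```

Here $\hat{\theta}(\omega)$ maximizes the perturbed function $Q(\theta, \omega \mid \hat{\theta})$, $\hat{\theta}_{[i]}$ is the estimate with case $i$ deleted, $\ddot{Q}(\hat{\theta})$ is the Hessian of $Q$ with respect to $\theta$, and $\Delta_{\omega_0} = \partial^2 Q(\theta, \omega \mid \hat{\theta}) / \partial\theta\, \partial\omega^{\top}$ evaluated at $\theta = \hat{\theta}$ and $\omega = \omega_0$. Every term involves only $Q$, so the intractable observed-data log-likelihood never needs to be differentiated.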

The rest of this paper is organized as follows. In Section 2, the LMEC model is defined, and an EM-type algorithm for obtaining the ML estimates is described. In Section 3, we provide a brief sketch of the local influence approach for models with incomplete data, and also develop a methodology pertinent to the LMEC. Four different perturbation schemes are considered. In Section 4, the NLMEC model is defined. The methodology is illustrated in Section 5 with the analysis of two examples involving HIV viral load measures and with an empirical study. Finally, some concluding remarks are made in Section 6.

Section snippets

The linear mixed-effects model with censored response

Ignoring censoring for the moment, the classical normal LME model is specified as follows (Laird and Ware, 1982): $y_i = X_i\beta + Z_i b_i + \epsilon_i$, where $b_i \overset{\text{i.i.d.}}{\sim} N_q(0, D)$ is independent of $\epsilon_i \overset{\text{ind.}}{\sim} N_{n_i}(0, \sigma^2 I_{n_i})$, $i = 1, \ldots, n$; the subscript $i$ is the subject index; $I_p$ denotes the $p \times p$ identity matrix; $y_i = (y_{i1}, \ldots, y_{in_i})^{\top}$ is an $n_i \times 1$ vector of observed continuous responses for subject $i$; $X_i$ is the $n_i \times p$ design matrix corresponding to the fixed effects, $\beta$, of dimension $p \times 1$; $Z_i$ is the $n_i \times q$ design matrix corresponding to the $q \times 1$

Diagnostic analysis

Influence diagnostic techniques are used to identify anomalous observations that affect model fitting or statistical inference for the assumed statistical model. There are primarily two approaches for detecting influential observations. The case-deletion approach (Cook, 1977) is the most popular one for identifying influential observations. To assess the impact of influential observations on parameter estimates, some metrics have been used for measuring the distance between $\hat{\theta}_{[i]}$ and $\hat{\theta}$, such

The nonlinear case

The NLME (Pinheiro and Bates, 2000) is defined as $y_i = \eta(\phi_i, X_i) + \epsilon_i$, $\phi_i = A_i\beta + B_i b_i$, $i = 1, \ldots, n$, where $b_i \overset{\text{i.i.d.}}{\sim} N_q(0, D)$ and $\epsilon_i \overset{\text{ind.}}{\sim} N_{n_i}(0, \sigma^2 I_{n_i})$ are independent; $y_i$ is an $(n_i \times 1)$ vector of observed continuous responses for subject $i$; $\eta$ is a nonlinear function of the individual random parameter $\phi_i$; $A_i$ and $B_i$ are known design matrices of dimensions $r \times p$ and $r \times q$, respectively, possibly depending on some covariate values; $\beta$ is the $(p \times 1)$ vector of fixed effects; and $b_i$ is the $(q \times 1)$ vector of random effects.
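To make the structure of this model concrete, the sketch below simulates data from an NLME of the form above and then applies left-censoring at a lower detection limit. Everything specific in it is an assumption made for illustration: the exponential-decay mean curve `eta`, the parameter values, the identity design matrices, the detection limit, and the function names are hypothetical and are not the model fitted in this paper.

```python
import numpy as np

def eta(phi, t):
    """Hypothetical mean curve: decay from baseline phi[0] at rate exp(phi[1])."""
    return phi[0] * np.exp(-np.exp(phi[1]) * t)

def simulate_left_censored_nlme(n_subjects=50, n_times=6, sigma=0.3,
                                lower_limit=1.7, seed=0):
    """Simulate y_i = eta(phi_i, t) + eps_i with phi_i = A beta + B b_i, then censor."""
    rng = np.random.default_rng(seed)
    times = np.arange(n_times, dtype=float)
    beta = np.array([5.0, -1.0])      # fixed effects (p = 2), assumed values
    D = np.diag([0.25, 0.04])         # random-effects covariance (q = 2), assumed
    A = np.eye(2)                     # A_i (r x p), identity for simplicity
    B = np.eye(2)                     # B_i (r x q), identity for simplicity
    observed = np.empty((n_subjects, n_times))
    censored = np.empty((n_subjects, n_times), dtype=bool)
    for i in range(n_subjects):
        b_i = rng.multivariate_normal(np.zeros(2), D)   # b_i ~ N_q(0, D)
        phi_i = A @ beta + B @ b_i                      # individual parameters
        y_i = eta(phi_i, times) + rng.normal(0.0, sigma, size=n_times)
        censored[i] = y_i < lower_limit                 # below the detection limit
        observed[i] = np.where(censored[i], lower_limit, y_i)  # record the limit
    return observed, censored

observed, censored = simulate_left_censored_nlme()
print("proportion censored:", round(float(censored.mean()), 3))
```

The pair (`observed`, `censored`) is the kind of information available to the analyst: a recorded value together with an indicator of whether that value is an actual measurement or only a bound.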

As

Numerical illustrations

We illustrate the performance of the proposed methods with the analysis of two HIV datasets, previously analyzed by Vaida and Liu (2009), and of a simulated example.

Conclusions

This article provides new insight into the classical diagnostic methods for censored linear and nonlinear mixed-effects models, typically used for analyzing censored HIV viral load outcomes, and also presents an expectation conditional maximization (ECM) algorithm, which enables the development of diagnostic influence measures. Explicit expressions are obtained for the Hessian matrix $\ddot{Q}$ and for the matrix $\Delta$ under different perturbation schemes. For NLMEC, the analysis is mathematically (and

Acknowledgments

The authors thank the Editor, the Associate Editor and two anonymous reviewers, whose constructive comments on an earlier version led to this much improved manuscript. This study was supported by FAPESP and CNPq, Brazil.

References (28)

  • V.H. Lachos et al. Influence diagnostics for the Grubbs’s model. Statistical Papers (2007)
  • V. Lachos et al. Linear and nonlinear mixed-effects models for censored HIV viral loads using normal/independent distributions. Biometrics (2011)
  • N.M. Laird et al. Random effects models for longitudinal data. Biometrics (1982)
  • E. Lesaffre et al. Local influence in linear mixed models. Biometrics (1998)