Influence diagnostics in linear and nonlinear mixed-effects models with censored data
Introduction
Studies of HIV viral dynamics, often considered a key issue in AIDS research, rely on repeated/longitudinal measures taken over a period of treatment, which are routinely analyzed using linear and nonlinear mixed-effects models (LME/NLME) to assess rates of change in HIV-1 RNA level or viral load (Wu, 2005, Wu, 2010). Viral load measures the amount of actively replicating virus, and its reduction is frequently used as a primary endpoint in clinical trials of anti-retroviral (ARV) therapy. However, depending on the diagnostic assay used, its measurement may be subject to lower and upper detection limits, below or above which values are not quantifiable (resulting in left or right censoring, respectively). The proportion of censored data in these studies may not be small (Hughes, 1999), so the use of crude/ad hoc methods, such as substituting a threshold value or some arbitrary point like the mid-point between zero and the detection cut-off (Vaida and Liu, 2009), might lead to biased estimates of fixed effects and variance components (Wu, 2010).
Our motivating datasets in this study concern HIV-1 viral loads (i) after unstructured treatment interruption, or UTI (Saitoh et al., 2008), and (ii) at setpoint for acutely infected subjects from the AIEDRP program (Vaida and Liu, 2009). The former has about 7% of observations below the detection limit (left-censored), whereas the latter has about 22% lying above the limit of assay quantification (right-censored). As an alternative to crude imputation methods, Hughes (1999) proposed a likelihood-based Monte Carlo EM algorithm (MCEM) for LME with censored responses (LMEC). Vaida et al. (2007) proposed a hybrid EM based on a more efficient version of Hughes’s algorithm and extended it to NLME with censored data (NLMEC). Recently, Vaida and Liu (2009) proposed an exact EM-type algorithm for LMEC/NLMEC, which uses closed-form expressions at the E-step instead of Monte Carlo simulations. Strictly speaking, these algorithms are Space Alternating Generalized EM (SAGE) algorithms (see Vaida et al., 2007). In this paper, to perform diagnostic analysis in LMEC/NLMEC models, we first propose a slight modification of the EM-type algorithm of Vaida and Liu (2009), in which all the parameters are updated (M-step) by treating the random effects and the censored observations as missing data. We then develop and present diagnostic measures for assessing local influence in LMEC/NLMEC.
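To illustrate the idea of treating censored observations as missing data with a closed-form E-step, the following minimal Python sketch runs an EM iteration for the mean of a left-censored univariate normal sample with known variance, and compares it with substituting the detection limit for every censored value. This is a deliberately simplified toy problem, not the algorithm of Vaida and Liu (2009) or the modification proposed in this paper; the function names, the simulated data, and the detection limit are all illustrative assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_mean_below(mu, sigma, limit):
    """Closed-form E[Y | Y < limit] for Y ~ N(mu, sigma^2)."""
    alpha = (limit - mu) / sigma
    return mu - sigma * STD_NORMAL.pdf(alpha) / STD_NORMAL.cdf(alpha)


def em_censored_mean(observed, n_censored, limit, sigma, n_iter=200):
    """EM estimate of mu when n_censored values are only known to lie below limit."""
    n = len(observed) + n_censored
    mu = sum(observed) / len(observed)  # start from the mean of the quantified values
    for _ in range(n_iter):
        # E-step: conditional expectation of each censored value given Y < limit.
        filled = truncated_mean_below(mu, sigma, limit)
        # M-step: with sigma known, mu is the mean of the completed data.
        mu = (sum(observed) + n_censored * filled) / n
    return mu


# Hypothetical simulated data: true mean 2.0, detection limit 1.5.
random.seed(1)
true_mu, sigma, limit = 2.0, 1.0, 1.5
sample = [random.gauss(true_mu, sigma) for _ in range(5000)]
observed = [y for y in sample if y >= limit]
n_censored = len(sample) - len(observed)

mu_em = em_censored_mean(observed, n_censored, limit, sigma)
# Crude alternative: replace every censored value by the detection limit itself.
mu_substituted = (sum(observed) + n_censored * limit) / len(sample)

print(f"EM estimate: {mu_em:.3f}, substitution estimate: {mu_substituted:.3f}")
```

In this toy setting the substitution estimate is pulled upward because every censored value is replaced by the largest value it could have taken, whereas the EM update uses the conditional mean of the truncated distribution.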
Influence analysis is a key step in data analysis once the parameters have been estimated; its purpose is to detect influential observations. There are two primary approaches. The first is the intuitively appealing case-deletion approach (Cook, 1977; see also Cook and Weisberg, 1982). Deletion diagnostics such as Cook’s distance or the likelihood distance have been applied to many statistical models. The second is the local influence approach of Cook (1986), a general statistical technique for assessing the stability of the estimation outputs with respect to the model inputs. Following the pioneering work of Cook (1986), this method has received considerable attention in the recent statistical literature on mixed-effects models (LME/NLME); see, for example, Lesaffre and Verbeke (1998), Zhu and Lee (2001), Lee and Xu (2004), Osorio et al. (2007) and Russo et al. (2009), among others.
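As a concrete instance of the case-deletion idea, the following minimal Python sketch computes Cook’s distance for a simple linear regression by deleting each case in turn and refitting. It covers only the classical regression setting of Cook (1977), not the mixed-effects models with censoring studied in this paper; the function names and the toy data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cooks_distances(xs, ys):
    """Cook's distance for each case, obtained by deleting it and refitting."""
    n, p = len(xs), 2  # p = number of regression coefficients
    a, b = fit_line(xs, ys)
    s2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - p)
    distances = []
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # The summed squared change in fitted values over all cases equals
        # (theta - theta_i)' X'X (theta - theta_i).
        shift = sum(((a - a_i) + (b - b_i) * x) ** 2 for x in xs)
        distances.append(shift / (p * s2))
    return distances


# Hypothetical toy data: the last case is an outlier at a high-leverage point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 14.0]
d = cooks_distances(xs, ys)
most_influential = max(range(len(d)), key=d.__getitem__)
print("most influential case index:", most_influential)
```

Refitting once per case is affordable here; in models where each fit is expensive, one-step approximations to the case-deleted estimate are a common alternative.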
Although several diagnostic studies on LME/NLME have appeared in the literature, to the best of our knowledge no study has addressed influence diagnostics for NLMEC/LMEC, and certainly not local influence analysis. The main difficulty is that the observed log-likelihood function of the NLMEC/LMEC involves intractable integrals (for instance, the pdfs of truncated multinormal distributions). Because Cook’s measures (Cook, 1986) involve the first and second partial derivatives of this function, applying his approach directly to this model is very difficult, if not impossible. Zhu and Lee (2001) developed an approach for performing local influence analysis in general statistical models with missing data. It is based on the Q-displacement function, which is closely related to the conditional expectation of the complete-data log-likelihood in the E-step of the EM algorithm, and it produces results very similar to those obtained from Cook’s method. Moreover, case deletion can be studied through the Q-displacement function following the approach of Zhu et al. (2001). We therefore develop methods to obtain case-deletion and local influence measures using the method of Zhu et al. (2001) (see also Lee and Xu, 2004, Zhu and Lee, 2001) in the context of mixed-effects models with censored data. In our opinion, the results developed here form a necessary supplement to those presented by Vaida and Liu (2009) for the analysis of mixed-effects models with censored response and HIV data.
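For orientation, the quantities involved can be written as follows. The notation is an assumption based on common usage in the cited missing-data literature and is not taken from this excerpt: $y_c$ denotes the complete data, $y_o$ the observed data, $\ell_c$ the complete-data log-likelihood, $\hat{\theta}$ the ML estimate, $\hat{\theta}(\omega)$ the maximizer of the perturbed Q-function for a perturbation vector $\omega$, and $\hat{\theta}_{[i]}$ the maximizer of the Q-function with case $i$ deleted.

```latex
% Q-function: conditional expectation of the complete-data log-likelihood (E-step)
Q(\theta \mid \hat{\theta}) = \mathrm{E}\{\ell_c(\theta \mid y_c) \mid y_o, \hat{\theta}\}

% Q-displacement function used for local influence
f_Q(\omega) = 2\{Q(\hat{\theta} \mid \hat{\theta}) - Q(\hat{\theta}(\omega) \mid \hat{\theta})\}

% Case-deletion analogue
QD_i = 2\{Q(\hat{\theta} \mid \hat{\theta}) - Q(\hat{\theta}_{[i]} \mid \hat{\theta})\}
```

Both measures require only the Q-function, which is available in closed form whenever the E-step is, so the intractable observed log-likelihood never has to be differentiated.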
The rest of this paper is organized as follows. In Section 2, the LMEC model is defined and an EM-type algorithm for obtaining the ML estimates is described. In Section 3, we provide a brief sketch of the local influence approach for models with incomplete data and develop a methodology pertinent to the LMEC, considering four different perturbation schemes. In Section 4, the NLMEC model is defined. The methodology is illustrated in Section 5 with the analysis of two examples involving HIV viral load measures and with an empirical study. Finally, some concluding remarks are made in Section 6.
Section snippets
The linear mixed-effects with censored response
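Ignoring censoring for the moment, the classical normal LME model is specified as follows (Laird and Ware, 1982): $y_i = X_i\beta + Z_i b_i + \epsilon_i$, where $b_i \sim N_q(0, D)$ is independent of $\epsilon_i \sim N_{n_i}(0, \sigma^2 I_{n_i})$; the subscript $i$ is the subject index; $I_p$ denotes the $p \times p$ identity matrix; $y_i$ is an $n_i \times 1$ vector of observed continuous responses for subject $i$; $X_i$ is the $n_i \times p$ design matrix corresponding to the fixed effects, $\beta$, of dimension $p \times 1$; $Z_i$ is the $n_i \times q$ design matrix corresponding to the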
Diagnostic analysis
Influence diagnostic techniques are used to identify anomalous observations that affect model fitting or statistical inference for the assumed statistical model. There are primarily two approaches for detecting influential observations. The case-deletion approach (Cook, 1977) is the most popular one for identifying influential observations. To assess the impact of influential observations on parameter estimates, some metrics have been used for measuring the distance between the full-data estimate $\hat{\theta}$ and the case-deleted estimate $\hat{\theta}_{[i]}$, such
The nonlinear case
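The NLME (Pinheiro and Bates, 2000) is defined as $y_i = \eta(\phi_i, X_i) + \epsilon_i$, $\phi_i = A_i\beta + B_i b_i$, where $b_i \sim N_q(0, D)$ and $\epsilon_i \sim N_{n_i}(0, \sigma^2 I_{n_i})$ are independent; $y_i$ is an ($n_i \times 1$) vector of observed continuous responses for subject $i$; $\eta$ is a nonlinear function of the individual random parameter $\phi_i$; $A_i$ and $B_i$ are known design matrices of dimensions $r \times p$ and $r \times q$, respectively, possibly depending on some covariate values; $\beta$ is the ($p \times 1$) vector of fixed effects; and $b_i$ is the ($q \times 1$) vector of random effects.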
As
Numerical illustrations
We illustrate the performance of the proposed methods with the analysis of two HIV datasets, previously analyzed by Vaida and Liu (2009), and of a simulated example.
Conclusions
This article provides new insight into the classical diagnostic methods for censored linear and nonlinear mixed-effects models, typically used for analyzing censored HIV viral load outcomes, and also presents an expectation conditional maximization (ECM) algorithm, which enables the development of diagnostic influence measures. Explicit expressions are obtained for the Hessian matrix and for the matrix $\Delta$ under different perturbation schemes. For NLMEC, the analysis is mathematically (and
Acknowledgments
The authors thank the Editor, the Associate Editor and two anonymous reviewers, whose constructive comments on an earlier version led to this much-improved manuscript. This study was supported by FAPESP and CNPq, Brazil.
References (28)
- et al. Influence analysis of nonlinear mixed-effects models. Computational Statistics & Data Analysis (2004)
- et al. Assessment of local influence in elliptical linear models with longitudinal structure. Computational Statistics & Data Analysis (2007)
- et al. Influence diagnostics in nonlinear mixed-effects elliptical models. Computational Statistics & Data Analysis (2009)
- et al. Efficient hybrid EM for linear and nonlinear mixed effects models with censored response. Computational Statistics & Data Analysis (2007)
- Detection of influential observation in linear regression. Technometrics (1977)
- Assessment of local influence. Journal of the Royal Statistical Society, Series B (1986)
- et al. Residuals and Influence in Regression (1982)
- Genz, A., Bretz, F., Hothorn, T., Miwa, T., Mi, X., Leisch, F., Scheipl, F., 2008. Mvtnorm: multivariate normal and t...
- Mixed effects models with censored data with application to HIV RNA levels. Biometrics (1999)
- et al. Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics (2000)
- Influence diagnostics for the Grubbs’s model. Statistical Papers
- Linear and nonlinear mixed-effects models for censored HIV viral loads using normal/independent distributions. Biometrics
- Random effects models for longitudinal data. Biometrics
- Local influence in linear mixed models. Biometrics