Influence diagnostics in linear and nonlinear mixed-effects models with censored data

doi:10.1016/j.csda.2012.06.021

Computational Statistics & Data Analysis

Volume 57, Issue 1, January 2013, Pages 450-464

https://doi.org/10.1016/j.csda.2012.06.021

Abstract

HIV RNA viral load measures are often subject to upper and lower detection limits, depending on the quantification assay, and consequently the responses are either left- or right-censored. Linear and nonlinear mixed-effects models, with modifications to accommodate censoring (LMEC and NLMEC), are routinely used to analyze this type of data. Recently, Vaida and Liu (2009) proposed an exact EM-type algorithm for LMEC/NLMEC, called the SAGE algorithm (Meng and Van Dyk, 1997), that uses closed-form expressions at the E-step, as opposed to Monte Carlo simulations. Motivated by this algorithm, we propose here an exact ECM algorithm (Meng and Rubin, 1993) for LMEC/NLMEC, which enables us to develop local influence analysis for mixed-effects models on the basis of the conditional expectation of the complete-data log-likelihood function. This route is taken because the observed-data log-likelihood function of the proposed model is complex, which makes it difficult to apply the approach of Cook (1977, 1986) directly. Some useful perturbation schemes are also discussed. Finally, the results obtained from the analyses of two HIV/AIDS studies on viral loads are presented to illustrate the newly developed methodology.

Introduction

Studies of HIV viral dynamics, often considered a key issue in AIDS research, involve repeated (longitudinal) measures collected over a period of treatment. These data are routinely analyzed using linear and nonlinear mixed-effects models (LME/NLME) to assess rates of change in HIV-1 RNA level, or viral load (Wu, 2005; Wu, 2010). Viral load measures the amount of actively replicating virus, and its reduction is frequently used as a primary endpoint in clinical trials of anti-retroviral (ARV) therapy. However, depending on the diagnostic assay used, its measurement may be subject to upper and lower detection limits, above or below which it is not quantifiable (resulting in right or left censoring, respectively). The proportion of censored data in these studies may not be small (Hughes, 1999), so crude or ad hoc methods, such as substituting the threshold value or some arbitrary point such as the midpoint between zero and the detection cut-off (Vaida and Liu, 2009), might lead to biased estimates of fixed effects and variance components (Wu, 2010).
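The bias from ad hoc substitution can be seen in a small simulation. The sketch below is only an illustration under assumed values: the mean, standard deviation, detection limit and sample size are hypothetical and are not taken from either HIV dataset, and the data are a single normal sample rather than longitudinal measurements.

```python
import random
import statistics

random.seed(1)

# Hypothetical values on a log10 scale; none of them come from the paper.
MU, SIGMA = 2.5, 1.0
LIMIT = 1.7
N = 20000

# Simulate responses and flag those falling below the detection limit.
y = [random.gauss(MU, SIGMA) for _ in range(N)]
censored = [v < LIMIT for v in y]
prop_censored = sum(censored) / N

# Two ad hoc imputations: the limit itself and half the limit.
y_limit = [LIMIT if c else v for v, c in zip(y, censored)]
y_half = [LIMIT / 2 if c else v for v, c in zip(y, censored)]

mean_full = statistics.fmean(y)         # uses the values that censoring hides
mean_limit = statistics.fmean(y_limit)  # substitution at the limit
mean_half = statistics.fmean(y_half)    # substitution at half the limit
sd_limit = statistics.stdev(y_limit)    # spread after substitution at the limit

print(f"censored proportion: {prop_censored:.1%}")
print(f"mean (no censoring): {mean_full:.3f}")
print(f"mean (limit):        {mean_limit:.3f}")
print(f"mean (limit / 2):    {mean_half:.3f}")
print(f"sd   (limit):        {sd_limit:.3f}")
```

Substituting the limit pulls the mean upward and shrinks the standard deviation, because every censored value is replaced by the largest value it could have taken. Substituting half the limit shifts the mean by an amount that depends on where the limit sits relative to the distribution.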

Our motivating datasets in this study concern HIV-1 viral loads (i) after unstructured treatment interruption, or UTI (Saitoh et al., 2008), and (ii) setpoint for acutely infected subjects from the AIEDRP program (Vaida and Liu, 2009). The former has about 7% of observations below the detection limit (left-censored), whereas the latter has about 22% above the limit of assay quantification (right-censored). As an alternative to crude imputation methods, Hughes (1999) proposed a likelihood-based Monte Carlo EM algorithm (MCEM) for LME with censored responses (LMEC). Vaida et al. (2007) proposed a hybrid EM algorithm based on a more efficient implementation of Hughes’s algorithm and extended it to NLME with censored data (NLMEC). Recently, Vaida and Liu (2009) proposed an exact EM-type algorithm for LMEC/NLMEC, which uses closed-form expressions at the E-step, as opposed to Monte Carlo simulations. Strictly speaking, these algorithms are Space Alternating Generalized EM (SAGE) algorithms (see Vaida et al., 2007). In this paper, to perform diagnostic analysis in LMEC/NLMEC models, we first propose a slight modification to the EM-type algorithm of Vaida and Liu (2009), wherein all the parameters are updated (M-step) by treating the random effects and the censored observations as missing data. Then, diagnostic measures for assessing local influence in LMEC/NLMEC are developed and presented.
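The idea of treating censored responses as missing data, with a closed-form E-step, can be illustrated on a much simpler problem than the LMEC. The sketch below is not the algorithm of Vaida and Liu (2009) or the modification proposed here: it assumes a single left-censored normal sample with no random effects, and the function name `em_left_censored_normal` and all numerical values are hypothetical. The E-step uses the standard first and second moments of a normal variable truncated above at the detection limit.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for its pdf and cdf


def em_left_censored_normal(observed, n_censored, limit, n_iter=200):
    """EM estimates of (mu, sigma) when n_censored values lie below `limit`."""
    n = len(observed) + n_censored
    sum_obs = sum(observed)
    sum_obs_sq = sum(v * v for v in observed)
    # Start from the naive fit to the uncensored values only.
    mu = sum_obs / len(observed)
    sigma = (sum_obs_sq / len(observed) - mu * mu) ** 0.5
    for _ in range(n_iter):
        # E-step: closed-form moments of y given y < limit.
        alpha = (limit - mu) / sigma
        lam = STD.pdf(alpha) / STD.cdf(alpha)
        e_y = mu - sigma * lam
        e_y2 = mu * mu + sigma * sigma - sigma * lam * (limit + mu)
        # M-step: usual normal updates with expected sufficient statistics.
        mu = (sum_obs + n_censored * e_y) / n
        sigma = ((sum_obs_sq + n_censored * e_y2) / n - mu * mu) ** 0.5
    return mu, sigma


random.seed(2)
TRUE_MU, TRUE_SIGMA, LIMIT = 2.5, 1.0, 1.7  # hypothetical values
sample = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]
observed = [v for v in sample if v >= LIMIT]
n_censored = len(sample) - len(observed)

mu_hat, sigma_hat = em_left_censored_normal(observed, n_censored, LIMIT)
naive_mean = sum(observed) / len(observed)
print(f"EM estimate: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"mean of uncensored values only: {naive_mean:.3f}")
```

No Monte Carlo draws are needed because the truncated-normal moments are available in closed form. In the mixed-effects setting the corresponding moments are those of truncated multivariate normal distributions, which is where the additional work lies.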

Influence analysis is a key step in data analysis subsequent to parameter estimation; its aim is to detect influential observations. There are two primary approaches. The first is the case-deletion approach (Cook, 1977), which is intuitively appealing (see also Cook and Weisberg, 1982). Deletion diagnostics such as Cook’s distance or the likelihood distance have been applied to many statistical models. The second is the local influence approach of Cook (1986), a general statistical technique for assessing the stability of the estimation outputs with respect to the model inputs. Following the pioneering work of Cook (1986), this method has received considerable attention in the statistical literature on mixed-effects models (LME/NLME); see, for example, Lesaffre and Verbeke (1998), Zhu and Lee (2001), Lee and Xu (2004), Osorio et al. (2007) and Russo et al. (2009), among others.
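As a reminder of what a case-deletion diagnostic looks like in the simplest setting, the sketch below computes Cook's distance for an ordinary simple linear regression, using the closed form based on residuals and leverages. It is only an illustration of the case-deletion idea: the function name `cooks_distance`, the simulated data and the contaminated case are hypothetical, and no mixed effects or censoring are involved.

```python
import random


def cooks_distance(x, y):
    """Cook's distance for each case in the regression y ~ 1 + x."""
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Diagonal of the hat matrix for a simple linear regression.
    lever = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    # Closed form of the deletion distance, so no model is refitted.
    return [e * e * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, lever)]


random.seed(3)
x = [float(i) for i in range(20)]
y = [1.0 + 0.5 * xi + random.gauss(0.0, 0.3) for xi in x]
y[19] += 8.0  # contaminate the last case (hypothetical outlier)

d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=d.__getitem__)
print("most influential case:", most_influential)
```

The contaminated case combines a large residual with high leverage, which is exactly the combination that Cook's distance is designed to flag.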

Although several diagnostic studies on LME/NLME have appeared in the literature, to the best of our knowledge no study has addressed influence diagnostics for NLMEC/LMEC, and certainly none has addressed local influence analysis. The main difficulty is that the observed log-likelihood function of the NLMEC/LMEC involves intractable integrals (for instance, the pdfs of truncated multinormal distributions), making the direct application of Cook’s approach (Cook, 1986) to this model very difficult, if not impossible, since the measures involve the first and second partial derivatives of this function. Zhu and Lee (2001) developed an approach for performing local influence analysis for general statistical models with missing data. It is based on the $Q$-displacement function, which is closely related to the conditional expectation of the complete-data log-likelihood in the E-step of the EM algorithm. This approach produces results very similar to those obtained from Cook’s method. Moreover, case deletion can be studied through the $Q$-displacement function following the approach of Zhu et al. (2001). We therefore develop here methods to obtain case-deletion measures and local influence measures by using the method of Zhu et al. (2001) (see also Lee and Xu, 2004; Zhu and Lee, 2001) in the context of mixed-effects models with censored data. In our opinion, the results developed here form a necessary supplement to those presented by Vaida and Liu (2009) for the analysis of mixed-effects models with censored response and HIV data.
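For reference, the quantities behind the $Q$-displacement approach are usually written as follows. This is a sketch in the general notation of Zhu and Lee (2001) rather than the specific expressions derived for the LMEC/NLMEC; the perturbation vector $ω$, the null perturbation $ω_{0}$, the unit direction $d$ and the case-deleted estimate ${\hat{θ}}_{[i]}$ are notation assumed here for illustration.

$$
f_{Q}(ω) = 2 \left[ Q(\hat{θ} \mid \hat{θ}) - Q(\hat{θ}(ω) \mid \hat{θ}) \right],
\qquad
QD_{i} = 2 \left[ Q(\hat{θ} \mid \hat{θ}) - Q({\hat{θ}}_{[i]} \mid \hat{θ}) \right],
$$

$$
C_{f_{Q}, d} = -2\, d^{⊤} \ddot{Q}_{ω_{0}} d,
\qquad
-\ddot{Q}_{ω_{0}} = Δ_{ω_{0}}^{⊤} \left\{ -\ddot{Q}(\hat{θ}) \right\}^{-1} Δ_{ω_{0}},
$$

where $\ddot{Q}(\hat{θ}) = \partial^{2} Q(θ \mid \hat{θ}) / \partial θ\, \partial θ^{⊤}$ evaluated at $θ = \hat{θ}$, and $Δ_{ω} = \partial^{2} Q(θ, ω \mid \hat{θ}) / \partial θ\, \partial ω^{⊤}$ evaluated at $θ = \hat{θ}(ω)$. The function $Q$ replaces the observed-data log-likelihood used in Cook's (1986) likelihood displacement, so only derivatives of the expected complete-data log-likelihood are required.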

The rest of this paper is organized as follows. In Section 2, the LMEC model is defined, and an EM-type algorithm for obtaining the ML estimates is described. In Section 3, we provide a brief sketch of the local influence approach for models with incomplete data, and also develop a methodology pertinent to the LMEC. Four different perturbation schemes are considered. In Section 4, the NLMEC model is defined. The methodology is illustrated in Section 5 with the analysis of two examples involving HIV viral load measures and with an empirical study. Finally, some concluding remarks are made in Section 6.

Section snippets

The linear mixed-effects model with censored response

Ignoring censoring for the moment, the classical normal LME model is specified as follows (Laird and Ware, 1982): $y_{i} = X_{i} β + Z_{i} b_{i} + ϵ_{i}$, where $b_{i} \overset{i.i.d.}{\sim} N_{q} (0, D)$ is independent of $ϵ_{i} \overset{ind.}{\sim} N_{n_{i}} (0, σ^{2} I_{n_{i}})$, $i = 1, \dots, n$; the subscript $i$ is the subject index; $I_{p}$ denotes the $p \times p$ identity matrix; $y_{i} = {(y_{i 1}, \dots, y_{i n_{i}})}^{⊤}$ is an $n_{i} \times 1$ vector of observed continuous responses for subject $i$; $X_{i}$ is the $n_{i} \times p$ design matrix corresponding to the fixed effects, $β$, of dimension $p \times 1$; $Z_{i}$ is the $n_{i} \times q$ design matrix corresponding to the $q \times 1$

Diagnostic analysis

Influence diagnostic techniques are used to identify anomalous observations that affect model fitting or statistical inference for the assumed statistical model. There are primarily two approaches for detecting influential observations. The case-deletion approach (Cook, 1977) is the most popular one. To assess the impact of influential observations on parameter estimates, several metrics have been used for measuring the distance between ${\hat{θ}}_{[i]}$ and $\hat{θ}$, such

The nonlinear case

The NLME (Pinheiro and Bates, 2000) is defined as $y_{i} = η (ϕ_{i}, X_{i}) + ϵ_{i}, \quad ϕ_{i} = A_{i} β + B_{i} b_{i}, \quad i = 1, \dots, n,$ where $b_{i} \overset{i.i.d.}{\sim} N_{q} (0, D)$ and $ϵ_{i} \overset{ind.}{\sim} N_{n_{i}} (0, σ^{2} I_{n_{i}})$ are independent; $y_{i}$ is an ($n_{i} \times 1$) vector of observed continuous responses for subject $i$; $η$ is a nonlinear function of the individual random parameter $ϕ_{i}$; $A_{i}$ and $B_{i}$ are known design matrices of dimensions $r \times p$ and $r \times q$, respectively, possibly depending on some covariate values; $β$ is the ($p \times 1$) vector of fixed effects; and $b_{i}$ is the ($q \times 1$) vector of random effects.
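To make the structure of this model concrete, the sketch below simulates left-censored data from one instance of it. Every specific choice is an assumption made for illustration: the exponential-decay form of $η$, the choice $A_{i} = B_{i} = I_{2}$, a diagonal $D$, the measurement times, the parameter values and the detection limit are all hypothetical and are not the models fitted in the paper.

```python
import math
import random

random.seed(4)


def eta(phi, t):
    """Hypothetical nonlinear mean: phi[0] * exp(-phi[1] * t)."""
    return phi[0] * math.exp(-phi[1] * t)


def simulate_nlmec(n_subjects, times, beta, d_sd, sigma, limit):
    """Draw left-censored responses from y_i = eta(phi_i, t) + eps_i."""
    data = []
    for i in range(n_subjects):
        # phi_i = A_i beta + B_i b_i with A_i = B_i = I_2 and diagonal D (assumed).
        b = [random.gauss(0.0, sd) for sd in d_sd]
        phi = [beta_k + b_k for beta_k, b_k in zip(beta, b)]
        for t in times:
            y = eta(phi, t) + random.gauss(0.0, sigma)
            is_censored = y < limit
            # A censored record stores the limit; the true value lies below it.
            data.append((i, t, limit if is_censored else y, is_censored))
    return data


records = simulate_nlmec(
    n_subjects=50,
    times=[0, 1, 2, 4, 8],
    beta=[5.0, 0.3],
    d_sd=[0.5, 0.05],
    sigma=0.2,
    limit=1.7,
)
prop_censored = sum(r[3] for r in records) / len(records)
print(f"{len(records)} records, {prop_censored:.0%} left-censored")
```

Each record is a tuple of subject index, time, recorded response and censoring flag. Censoring concentrates at the later time points, where the decaying mean falls below the limit, which is the pattern that makes substitution methods unreliable for this kind of data.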

Numerical illustrations

We illustrate the performance of the proposed methods with the analysis of two HIV datasets, previously analyzed by Vaida and Liu (2009), and of a simulated example.

Conclusions

This article provides new insight into the classical diagnostic methods for censored linear and nonlinear mixed-effects models, typically used for analyzing censored HIV viral load outcomes, and also presents an expectation conditional maximization (ECM) algorithm, which enables the development of diagnostic influence measures. Explicit expressions are obtained for the Hessian matrix $\ddot{Q}$ and for the matrix $Δ$ under different perturbation schemes. For NLMEC, the analysis is mathematically (and

Acknowledgments

The authors thank the Editor, the Associate Editor and two anonymous reviewers, whose constructive comments on an earlier version led to this much improved manuscript. This study was supported by FAPESP and CNPq, Brazil.

References (28)

S.Y. Lee et al. Influence analysis of nonlinear mixed-effects models. Computational Statistics & Data Analysis (2004)

F. Osorio et al. Assessment of local influence in elliptical linear models with longitudinal structure. Computational Statistics & Data Analysis (2007)

C. Russo et al. Influence diagnostics in nonlinear mixed-effects elliptical models. Computational Statistics & Data Analysis (2009)

F. Vaida et al. Efficient hybrid EM for linear and nonlinear mixed effects models with censored response. Computational Statistics & Data Analysis (2007)

R.D. Cook. Detection of influential observation in linear regression. Technometrics (1977)

R.D. Cook. Assessment of local influence. Journal of the Royal Statistical Society, Series B (1986)

R.D. Cook et al. Residuals and Influence in Regression (1982)

Genz, A., Bretz, F., Hothorn, T., Miwa, T., Mi, X., Leisch, F., Scheipl, F., 2008. Mvtnorm: multivariate normal and t...

J. Hughes. Mixed effects models with censored data with application to HIV RNA levels. Biometrics (1999)

H. Jacqmin-Gadda et al. Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics (2000)

V.H. Lachos et al. Influence diagnostics for the Grubbs’s model. Statistical Papers (2007)

V. Lachos et al. Linear and nonlinear mixed-effects models for censored HIV viral loads using normal/independent distributions. Biometrics (2011)

N.M. Laird et al. Random effects models for longitudinal data. Biometrics (1982)

E. Lesaffre et al. Local influence in linear mixed models. Biometrics (1998)
