Abstract
Doubly-truncated data arise in many fields, including economics, engineering, medicine, and astronomy. This article develops likelihood-based inference methods for lifetime distributions under the log-location-scale model and the accelerated failure time (AFT) model based on doubly-truncated data. These parametric models are practically useful, but methods for fitting them to doubly-truncated data are lacking. We develop algorithms for obtaining the maximum likelihood estimator (MLE) under both models, and propose several types of interval estimation methods. Furthermore, we show that the confidence band for the cumulative distribution function has closed-form expressions. We conduct simulations to examine the accuracy of the proposed methods. We illustrate the proposed methods using real data from a field reliability study, the Equipment-S data.
References
Cheng RCH, Iles TC (1983) Confidence bands for cumulative distribution functions of continuous random variables. Technometrics 25(1):77–86
Cochran WG (1968) Errors of measurement in statistics. Technometrics 10(4):637–666
Dörre A (2020) Bayesian estimation of a lifetime distribution under double truncation caused by time-restricted data collection. Stat Pap 61(3):945–965
Dörre A, Emura T (2019) Analysis of doubly truncated data: an introduction. Springer, Berlin
Efron B, Petrosian V (1999) Nonparametric methods for doubly truncated data. J Am Stat Assoc 94(447):824–834
Emura T, Konno Y (2012) Multivariate normal distribution approaches for dependently truncated data. Stat Pap 53(1):133–149
Emura T, Pan C-H (2020) Parametric likelihood inference and goodness-of-fit for dependently left-truncated data, a copula-based approach. Stat Pap 61(1):479–501
Emura T, Konno Y, Michimae H (2015) Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation. Lifetime Data Anal 21(3):397–418
Emura T, Hu Y, Konno Y (2017) Asymptotic inference for maximum likelihood estimators under the special exponential family with double-truncation. Stat Pap 58(3):877–909
Emura T, Hu Y, Huang C (2018) Double.truncation: analysis of doubly-truncated data. CRAN
Escobar LA, Hong Y, Meeker WQ (2008) Simultaneous confidence bands and regions for log-location-scale distributions with censored data. Iowa State University Digital Repository, Paper, Statistical Preprints, p 32
Frank G, Dörre A (2017) Linear regression with randomly double-truncated data. S Afr Stat J 51(1):1–18
Härtler G (1987) Estimation and test for the parameters of the Arrhenius model. Qual Reliab Eng Int 3(4):219–225
Hoadley B (1971) Asymptotic properties of maximum likelihood estimators for the independent not identically distributed case. Ann Math Stat 42(6):1977–1991
Hong Y, Meeker WQ, Escobar LA (2008) Avoiding problems with normal approximation confidence intervals for probabilities. Technometrics 50(1):64–68
Hu Y, Emura T (2015) Maximum likelihood estimation for a special exponential family under random double-truncation. Comput Stat 30(4):1199–1229
Jeng S, Meeker WQ (2001) Parametric simultaneous confidence bands for cumulative distributions from censored data. Technometrics 43(4):450–461
Lawless JF (2003) Statistical models and methods for lifetime data, 2nd edn. Springer, New York
Lehmann EL, Casella G (1998) Theory of point estimation. Springer, Berlin
Menon MV (1963) Estimation of the shape and scale parameters of the Weibull distribution. Technometrics 5(2):175–182
Moreira C, de Uña-Álvarez J (2010) Bootstrapping the NPMLE for doubly truncated data. J Nonparametr Stat 22:567–583
Moreira C, Van Keilegom I (2013) Bandwidth selection for kernel density estimation with doubly truncated data. Comput Stat Data Anal 61:107–123
Moreira C, de Uña-Álvarez J, Van Keilegom I (2014) Goodness-of-fit tests for a semiparametric model under random double truncation. Comput Stat 29(5):1365–1379
Nelson W (1982) Applied life data analysis. Addison-Wesley, Boston
Sankaran PG, Sunoj SM (2004) Identification of models using failure rate and mean residual life of double truncated random variables. Stat Pap 45(1):97–109
Scheike TH, Keiding N (2006) Design and analysis of time-to-pregnancy. Stat Methods Med Res 15:127–140
Shen PS (2010) Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 62(5):835–853
Shen PS (2011) Testing quasi-independence for doubly truncated data. J Nonparametr Stat 23(3):753–761
Woodroofe M (1985) Estimating a distribution function with truncated data. Ann Stat 13(1):163–177
Ye Z, Tang L (2016) Augmenting the unreturned for field data with information on returned failures only. Technometrics 58(4):513–523
Acknowledgements
The authors thank the associate editor and two anonymous reviewers for their helpful comments, which greatly improved this work. Takeshi Emura is financially supported by the Ministry of Science and Technology, Taiwan (103-2118-M-008-MY2; 107-2118-M-008-003-MY3).
Appendices
Appendix A: Derivatives of the log-likelihood function
To obtain the MLE numerically, we need the first and second derivatives of the log-likelihood \(\ell (\varvec{\theta })\). Define \(Z_{s}(\varvec{\theta }) = (s - \mu )/\sigma , s \in \{u_i, y_i, v_i\}\) and
Note that \(H_{v_i}(\varvec{\theta })\) is the backward hazard rate at time \(v_i\) and \(H_{u_i}(\varvec{\theta })\) is the forward hazard rate at time \(u_i\) under double-truncation (Sankaran and Sunoj 2004).
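The role of these quantities in the score can be sketched numerically. The following is a minimal illustration, not the paper's implementation. It assumes a standard normal baseline with density \(\phi \) and distribution function \(\varPhi \), and it assumes that \(H_{s}(\varvec{\theta }) = \phi (Z_{s}) / \{\varPhi (Z_{v_i}) - \varPhi (Z_{u_i})\}\) for \(s \in \{u_i, v_i\}\), which agrees with the hazard-rate interpretation above but is not copied from the displayed equations. The function names and the data triples \((u_i, y_i, v_i)\) are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik(mu, sigma, data):
    """Log-likelihood for doubly-truncated data; data holds (u_i, y_i, v_i)."""
    total = 0.0
    for u, y, v in data:
        zu, zy, zv = ((s - mu) / sigma for s in (u, y, v))
        denom = norm_cdf(zv) - norm_cdf(zu)  # P(u_i <= Y <= v_i)
        total += -math.log(sigma) + math.log(norm_pdf(zy)) - math.log(denom)
    return total

def score(mu, sigma, data):
    """Analytic first derivatives (normal baseline, assumed form of H)."""
    d_mu = d_sigma = 0.0
    for u, y, v in data:
        zu, zy, zv = ((s - mu) / sigma for s in (u, y, v))
        denom = norm_cdf(zv) - norm_cdf(zu)
        h_v = norm_pdf(zv) / denom  # assumed H at the right truncation limit
        h_u = norm_pdf(zu) / denom  # assumed H at the left truncation limit
        d_mu += (zy + h_v - h_u) / sigma
        d_sigma += (-1.0 + zy * zy + zv * h_v - zu * h_u) / sigma
    return d_mu, d_sigma

def numeric_score(mu, sigma, data, eps=1e-6):
    """Central finite differences, used only to check the analytic score."""
    d_mu = (loglik(mu + eps, sigma, data) - loglik(mu - eps, sigma, data)) / (2 * eps)
    d_sigma = (loglik(mu, sigma + eps, data) - loglik(mu, sigma - eps, data)) / (2 * eps)
    return d_mu, d_sigma

# Hypothetical (u_i, y_i, v_i) triples with u_i <= y_i <= v_i.
data = [(0.0, 1.2, 3.0), (0.5, 2.1, 4.0), (1.0, 1.5, 2.5), (-1.0, 0.3, 2.0)]
print(score(1.0, 1.5, data))
print(numeric_score(1.0, 1.5, data))
```

The two printed pairs should agree closely. A check of this kind is a cheap safeguard before the derivatives are passed to a Newton-type iteration.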
The first-order derivatives of the log-likelihood are
The second-order derivatives of the log-likelihood are
For the AFT model, we define \(Z_s(\varvec{\theta }) = (s - \beta _0 - {\varvec{x}}_i^{\mathrm T} \varvec{\beta })/\sigma , s \in \{u_i, y_i, v_i\}\) and define \(H_{v_i}(\varvec{\theta }), \ H_{u_i}(\varvec{\theta }), \ K_{v_i}(\varvec{\theta })\) and \(K_{u_i}(\varvec{\theta })\) by Equations (1) and (2).
The first-order derivatives of the log-likelihood are
The second-order derivatives of the log-likelihood are
Appendix B: Regularity conditions
We impose the following conditions to derive the asymptotic distribution of the MLEs.
Assumption (A) There exists a positive definite matrix \({\varvec{I}}(\varvec{\theta })\) such that, as \(n \rightarrow \infty \),
Assumption (B) Differentiation and integration can be interchanged, so that
Assumption (C) There is a measurable function \(M_{jsl}(\cdot )\) such that
with \(m_{i,jsl} \equiv \mathrm {E}_{\varvec{\theta }^0} \{ M_{jsl}(Y_i) \} < \infty \) and \(m_{i,jsl}^2 \equiv \mathrm {E}_{\varvec{\theta }^0} \{ M_{jsl}(Y_i)^2 \} < \infty \). For some \(m_{jsl}\) and \(m_{jsl}^2\), it holds that \(\sum _{i=1}^n m_{i,jsl}/n \rightarrow m_{jsl}\) and \(\sum _{i=1}^n m_{i,jsl}^2 /n \rightarrow m_{jsl}^2\) as \(n \rightarrow \infty \).
Assumption (D) There is a measurable function \(W_{js}(\cdot )\) such that
with \(w_{i,js} \equiv \mathrm {E}_{\varvec{\theta }^0} \{ W_{js}(Y_i) \} < \infty \) and \(w_{i,js}^2 \equiv \mathrm {E}_{\varvec{\theta }^0} \{ W_{js}(Y_i)^2 \} < \infty \). For some \(w_{js}\) and \(w_{js}^2\), it holds that \(\sum _{i=1}^n w_{i,js}/n \rightarrow w_{js}\) and \(\sum _{i=1}^n w_{i,js}^2 /n \rightarrow w_{js}^2\) as \(n \rightarrow \infty \).
Assumption (E) There is a measurable function \(A_j(\cdot )\) such that
with \(\sup _y A_j^2(y) < \infty \).
Assumption (A) requires the Fisher information matrix to be stable for large samples. Here, the matrix \({\varvec{I}}(\varvec{\theta })\) is regarded as the asymptotic information matrix. Assumptions (B)–(C) impose the smoothness and boundedness of \(f_{\varvec{\theta }}(y | Y \in [u_i, v_i])\), which are similar to those imposed for the i.i.d. models (pp. 462–463 of Lehmann and Casella 1998). Assumptions (D)–(E) require the boundedness of the score functions and Hessian matrix, as employed by Emura et al. (2017) for the i.n.i.d. models. While these assumptions are too strong without truncation (Hoadley 1971), they can be satisfied under double-truncation since the density is truncated from below and above. See Lemma 4 of Emura et al. (2017).
Appendix C: The derivation of the transformed CI for \(F_{\varvec{\theta }}(t)\)
The derivation is based on inverting the CI for the quantile. Define the pth quantile function \(g(\varvec{\theta }) = y_p = \log (t_p) = \mu + \sigma w_p\), where \(w_p = \varPhi ^{-1}(p)\) for \(0< p < 1\). The SE of \(\widehat{y}_p = \widehat{\mu }+ \widehat{\sigma }w_p\) is
The \((1-\alpha ) 100\%\) CI for the pth quantile is \([\widehat{y}_p(\min ), \widehat{y}_p(\max )]\), where
Let \(w_p = a, \varOmega = Z_{\alpha /2}/\widehat{\sigma }\) and \(\widehat{\xi }= \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\). Inverting the CI for the quantile means that (1) the solution to \(\widehat{y}_p(\min ) = \log (t)\) with respect to p gives the upper CI for \(F_{\varvec{\theta }}(t)\) and (2) the solution to \(\widehat{y}_p(\max ) = \log (t)\) with respect to p gives the lower CI for \(F_{\varvec{\theta }}(t)\). Thus we need to solve
Manipulating these equations, we have
The solutions to the above equation are
where
Therefore, the transformed CI for \(F_{\varvec{\theta }}(t)\) is \([\varPhi (a_{\min }), \varPhi (a_{\max })]\) with the restriction \(1 - \varOmega ^2 \lambda ^*_{22} > 0\).
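The inversion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code. It assumes that \(\lambda ^*_{11}, \lambda ^*_{12}, \lambda ^*_{22}\) are the estimated variances and covariance of \((\widehat{\mu }, \widehat{\sigma })\), and that \(\varPhi \) is the standard normal distribution function. Under those assumptions, setting \(\widehat{y}_p \mp Z_{\alpha /2}\,\mathrm{SE} = \log (t)\) and squaring gives the quadratic \((1 - \varOmega ^2 \lambda ^*_{22}) a^2 - 2(\widehat{\xi }+ \varOmega ^2 \lambda ^*_{12}) a + \widehat{\xi }^2 - \varOmega ^2 \lambda ^*_{11} = 0\), whose two roots are taken as \(a_{\min }\) and \(a_{\max }\). The function name and the numerical inputs are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transformed_ci(t, mu_hat, sigma_hat, cov, z_crit=1.959964):
    """Interval for F(t) obtained by inverting the quantile interval.

    cov is assumed to be the estimated covariance matrix of (mu_hat, sigma_hat):
    ((var_mu, cov_mu_sigma), (cov_mu_sigma, var_sigma)).
    """
    l11, l12, l22 = cov[0][0], cov[0][1], cov[1][1]
    omega = z_crit / sigma_hat
    xi = (math.log(t) - mu_hat) / sigma_hat

    # Quadratic A*a^2 - 2*B*a + C = 0 from squaring xi - a = -/+ omega * SE(a).
    a_coef = 1.0 - omega ** 2 * l22
    if a_coef <= 0.0:
        raise ValueError("restriction 1 - Omega^2 * lambda22 > 0 is violated")
    b_coef = xi + omega ** 2 * l12
    c_coef = xi ** 2 - omega ** 2 * l11

    root = math.sqrt(b_coef ** 2 - a_coef * c_coef)
    a_min = (b_coef - root) / a_coef
    a_max = (b_coef + root) / a_coef
    return norm_cdf(a_min), norm_cdf(a_max)

# Hypothetical estimates: mu_hat = 0, sigma_hat = 1, evaluated at log(t) = 0.5.
lower, upper = transformed_ci(math.exp(0.5), 0.0, 1.0,
                              ((0.01, 0.002), (0.002, 0.005)))
print(lower, norm_cdf(0.5), upper)
```

When the restriction holds and the covariance matrix is positive definite, the quadratic is negative at \(a = \widehat{\xi }\), so the point estimate \(\varPhi (\widehat{\xi })\) always lies strictly inside the interval.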
Appendix D: The derivation of the CB for \(F_{\varvec{\theta }}(t)\)
Here we assume the regularity conditions given in “Appendix B”. For the location-scale model, the observed information matrix can be written as
Define the scaled information matrix as \({\varvec{i}}_S(\widehat{\varvec{\theta }}) = \begin{bmatrix} i_{11} & i_{12} \\ i_{12} & i_{22} \end{bmatrix}\), and its inverse as \(\varLambda = \begin{bmatrix} i_{11} & i_{12} \\ i_{12} & i_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \lambda _{11} & \lambda _{12} \\ \lambda _{12} & \lambda _{22} \end{bmatrix}\). Letting \(\varvec{\delta }^{\mathrm T} = (0, 1)\), we find \(\min \varvec{\delta }^{\mathrm T} \varvec{\theta }\) over the constrained region \((\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \gamma \) such that
subject to the constraint \((\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}_S(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \widehat{\sigma }^2 (\gamma /n) \equiv \widehat{\sigma }^2 \gamma _0\).
Letting \({\varvec{d}}^{\mathrm T} = \left( \sqrt{{\varvec{i}}_S^{-1}(\widehat{\varvec{\theta }})} \varvec{\delta }\right) ^{\mathrm T}\) and \({\varvec{k}}= \sqrt{{\varvec{i}}_S(\widehat{\varvec{\theta }})} (\varvec{\theta }- \widehat{\varvec{\theta }})\), we have \({\varvec{k}}^{\mathrm T} {\varvec{k}}= \gamma _0 \widehat{\sigma }^2\) on the boundary of the constraint region. By the Cauchy-Schwarz inequality, \(|{\varvec{d}}^{\mathrm T} {\varvec{k}}| \le \sqrt{{\varvec{d}}^{\mathrm T} {\varvec{d}}} \sqrt{{\varvec{k}}^{\mathrm T} {\varvec{k}}} = \sqrt{\gamma _0 \widehat{\sigma }^2 \, {\varvec{d}}^{\mathrm T} {\varvec{d}}}\). Equality holds when \({\varvec{k}}= a {\varvec{d}}\) for a constant a. Since \((a {\varvec{d}})^{\mathrm T} (a {\varvec{d}}) = \gamma _0 \widehat{\sigma }^2\), the minimum is attained when \(a = -\sqrt{\gamma _0 \widehat{\sigma }^2 / {\varvec{d}}^{\mathrm T} {\varvec{d}}}\) at
This minimum is attained when
and
This gives the restriction \((1 - \sqrt{\gamma _0 \lambda _{22}}) > 0\), that is, \(i_{11}(i_{22} - \gamma _0) - i_{12}^2 > 0\). Similarly, under \({\varvec{c}}^{\mathrm T} = (1, w_p)\) for the pth quantile,
The maximum and minimum hold when
The CBs for the quantile function are
The CBs for \(F_{\varvec{\theta }}(t)\) are derived by inverting the CB for the quantile function, that is,
where \(\widehat{\xi }= (y - \widehat{\mu })/\widehat{\sigma }= \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\) and \(w_p = q\). This equation yields the solutions
where \(\widehat{p} = \varPhi [\{\log (t) - \widehat{\mu }\}/\widehat{\sigma }], w_{\widehat{p}} = \varPhi ^{-1}(\widehat{p}) = \{\log (t) - \widehat{\mu }\}/\widehat{\sigma }\),
Therefore, the CB for \(F_{\varvec{\theta }}(t)\) is \([\varPhi (q_{\min }), \varPhi (q_{\max })]\).
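The Cauchy-Schwarz step of this appendix can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code. It takes the constraint \((\widehat{\varvec{\theta }}- \varvec{\theta })^{\mathrm T} {\varvec{i}}_S(\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}- \varvec{\theta }) \le \widehat{\sigma }^2 \gamma _0\) as given, assumes that the minimum of \(\sigma \) over this region equals \(\widehat{\sigma }(1 - \sqrt{\gamma _0 \lambda _{22}})\) (the form implied by the stated restriction), and compares that value with a brute-force search along the boundary of the ellipse. The function names and the numerical inputs are hypothetical.

```python
import math

def closed_form_min_sigma(sigma_hat, gamma0, i_s):
    """Assumed closed form: sigma_hat * (1 - sqrt(gamma0 * lambda22))."""
    i11, i12, i22 = i_s[0][0], i_s[0][1], i_s[1][1]
    det = i11 * i22 - i12 * i12
    lam22 = i11 / det  # (2,2) element of the inverse of i_s
    return sigma_hat * (1.0 - math.sqrt(gamma0 * lam22))

def boundary_min_sigma(sigma_hat, gamma0, i_s, n_grid=20000):
    """Smallest sigma found on a grid over the boundary of the region."""
    i11, i12, i22 = i_s[0][0], i_s[0][1], i_s[1][1]
    # Cholesky factor L of i_s (i_s = L L^T), written out for the 2x2 case.
    l11 = math.sqrt(i11)
    l21 = i12 / l11
    l22 = math.sqrt(i22 - l21 * l21)
    radius = sigma_hat * math.sqrt(gamma0)
    best = math.inf
    for j in range(n_grid):
        angle = 2.0 * math.pi * j / n_grid
        w2 = radius * math.sin(angle)
        # Solving L^T d = w gives d^T i_s d = w^T w = sigma_hat^2 * gamma0.
        # Only the sigma component d2 = w2 / l22 is needed here.
        d2 = w2 / l22
        best = min(best, sigma_hat + d2)
    return best

# Hypothetical scaled information matrix and constants.
i_s = ((1.0, 0.3), (0.3, 2.0))
print(closed_form_min_sigma(1.5, 0.05, i_s))
print(boundary_min_sigma(1.5, 0.05, i_s))
```

The closed form stays positive exactly when \(\gamma _0 \lambda _{22} < 1\), which is equivalent to \(i_{11}(i_{22} - \gamma _0) - i_{12}^2 > 0\), the restriction stated above.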
Dörre, A., Huang, CY., Tseng, YK. et al. Likelihood-based analysis of doubly-truncated data under the location-scale and AFT model. Comput Stat 36, 375–408 (2021). https://doi.org/10.1007/s00180-020-01027-6
DOI: https://doi.org/10.1007/s00180-020-01027-6