Rank-based estimation for semiparametric accelerated failure time model under length-biased sampling

Statistics and Computing

Abstract

Length-biased sampling appears in many observational studies, for example in epidemiology, labor economics and cancer screening trials. To accommodate sampling bias, which can lead to substantial estimation bias if ignored, we propose a class of doubly-weighted rank-based estimating equations under the accelerated failure time model. The general weighting structures considered in our estimating equations allow great flexibility and include many existing methods as special cases. Different approaches for constructing estimating equations are investigated, and the estimators are shown to be consistent and asymptotically normal. We also propose efficient computational procedures to solve the estimating equations and to estimate the variances of the estimators. Simulation studies show that the proposed estimators outperform the existing estimators. Real data from a dementia study and a Spanish unemployment duration study are analyzed to illustrate the proposed method.

References

  • Andersen, P.K., Borgan, Ø., Gill, R.D., Keiding, N.: Statistical Models Based on Counting Processes. Springer, New York (1993)

  • Asgharian, M., M’Lan, C.E., Wolfson, D.B.: Length-biased sampling with right censoring: an unconditional approach. J. Am. Stat. Assoc. 97, 201–209 (2002)

  • Asgharian, M., Wolfson, D.B.: Asymptotic behavior of the unconditional NPMLE of the length-biased survivor function from right censored prevalent cohort data. Ann. Stat. 33, 2109–2131 (2005)

  • Brown, B.M., Wang, Y.-G.: Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 149–158 (2005)

  • Brown, B.M., Wang, Y.-G.: Induced smoothing for rank regression with censored survival times. Stat. Med. 26, 828–836 (2007)

  • Cheng, S.C., Wei, L.J., Ying, Z.: Analysis of transformation models with censored data. Biometrika 82, 835–845 (1995)

  • Cheng, Y.-J., Huang, C.-Y.: Combined estimating equation approaches for semiparametric transformation models with length-biased survival data. Biometrics 70, 608–618 (2014)

  • Chiou, S.H., Kang, S., Yan, J.: Fast accelerated failure time modeling for case-cohort data. Stat. Comput. 24, 559–568 (2014)

  • de Uña-Álvarez, J., Otero-Giráldez, M.S., Álvarez-Llorente, G.: Estimation under length-bias and right-censoring: an application to unemployment duration analysis for married women. J. Appl. Stat. 30, 283–291 (2003)

  • de Uña-Álvarez, J., Iglesias-Pérez, M.C.: Nonparametric estimation of a conditional distribution from length-biased data. Ann. Inst. Stat. Math. 62, 323–341 (2010)

  • Fygenson, M., Ritov, Y.: Monotone estimating equations for censored data. Ann. Stat. 22, 732–746 (1994)

  • Harrington, D.P., Fleming, T.R.: A class of rank test procedures for censored survival data. Biometrika 69, 133–143 (1982)

  • Helsen, K., Schmittlein, D.C.: Analyzing duration times in marketing: evidence for the effectiveness of hazard rate models. Market. Sci. 12, 395–414 (1993)

  • Huang, C.-Y., Qin, J.: Composite partial likelihood estimation under length-biased sampling, with application to a prevalent cohort study of dementia. J. Am. Stat. Assoc. 107, 946–957 (2012)

  • Jin, Z., Lin, D.Y., Wei, L.J., Ying, Z.: Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353 (2003)

  • Johnson, L.M., Strawderman, R.L.: Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96, 577–590 (2009)

  • Lai, T.L., Ying, Z.: Rank regression methods for left-truncated and right-censored data. Ann. Stat. 19, 531–556 (1991)

  • Lancaster, T.: The Econometric Analysis of Transition Data. Cambridge University Press, Cambridge (1990)

  • Lin, Y., Chen, K.: Efficient estimation of the censored linear regression model. Biometrika 100, 525–530 (2013)

  • Nan, B., Kalbfleisch, J.D., Yu, M.: Asymptotic theory for the semiparametric accelerated failure time model with missing data. Ann. Stat. 37, 2351–2376 (2009)

  • Ning, J., Qin, J., Shen, Y.: Semiparametric accelerated failure time model for length-biased data with application to dementia study. Stat. Sin. 24, 313–333 (2014)

  • Prentice, R.L.: Linear rank tests with right censored data. Biometrika 65, 167–180 (1978)

  • Qin, J., Shen, Y.: Statistical methods for analyzing right-censored length-biased data under Cox model. Biometrics 66, 382–391 (2010)

  • Shen, Y., Ning, J., Qin, J.: Analyzing length-biased data with semiparametric transformation and accelerated failure time models. J. Am. Stat. Assoc. 104, 1192–1202 (2009)

  • Tsai, W.Y.: Pseudo-partial likelihood for proportional hazards models with biased-sampling data. Biometrika 96, 601–615 (2009)

  • Tsiatis, A.A.: Estimating regression parameters using linear rank tests for censored data. Ann. Stat. 18, 354–372 (1990)

  • Turnbull, B.W.: The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Stat. Soc. Ser. B Methodol. 38, 290–295 (1976)

  • Varadhan, R., Gilbert, P.: BB: an R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. J. Stat. Softw. 32, 1–26 (2009)

  • Vardi, Y.: Nonparametric estimation in the presence of length bias. Ann. Stat. 10, 616–620 (1982)

  • Vardi, Y.: Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 76, 751–761 (1989)

  • Wang, H.J., Wang, L.: Quantile regression analysis of length-biased survival data. Stat 3, 31–47 (2014)

  • Wang, M.-C.: Nonparametric estimation from cross-sectional survival data. J. Am. Stat. Assoc. 86, 130–143 (1991)

  • Wang, M.-C.: Hazards regression analysis for length-biased data. Biometrika 83, 343–354 (1996)

  • Wolfson, C., Wolfson, D.B., Asgharian, M., M’Lan, C.E., Østbye, T., Rockwood, K., Hogan, D.F.: A reevaluation of the duration of survival after the onset of dementia. N. Engl. J. Med. 344, 1111–1116 (2001)

  • Ying, Z.: A large sample study of rank estimation for censored regression data. Ann. Stat. 21, 76–99 (1993)

  • Zelen, M., Feinleib, M.: On the theory of screening for chronic diseases. Biometrika 56, 601–614 (1969)

  • Zeng, D., Lin, D.: Efficient estimation for the accelerated failure time model. J. Am. Stat. Assoc. 102, 1387–1396 (2007)

  • Zeng, D., Lin, D.Y.: Efficient resampling methods for nonsmooth estimating functions. Biostatistics 9, 355–363 (2008)

Acknowledgments

The authors are grateful to the editors and the reviewers for their helpful comments. The authors thank Professors Ian McDowell, Masoud Asgharian and Christina Wolfson for sharing the Canadian Study of Health and Aging data, and Professor Jacobo de Uña-Álvarez for providing the Spanish unemployment data set.

Author information

Corresponding author

Correspondence to Gongjun Xu.

Appendix: Analytical details

Technical derivations for Eqs. (6) and (7) are presented in Sect. 1.1 of this appendix, and the proof of Theorem 1 is given in Sect. 1.2.

1.1 Derivation of Eqs. (6) and (7)

In Sect. 2.2, we constructed \(\nu _{1i}(\mathbf {\beta }, t)\) conditional on the truncation time based on Eqs. (6) and (7). To verify Eq. (6), we show that the two components, \(E_{\mathbf {X}}[\mathrm {d}N_{i}(\mathbf {\beta }_0, t)]\) and \(E_{\mathbf {X}}[I\{e_i^a(\mathbf {\beta }_0)\le t\} Y_{i}(\mathbf {\beta }_0, t) \mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i)]\), simplify to the same quantity. Specifically,

$$\begin{aligned}&E_{\mathbf {X}}\{\mathrm {d}N_{i}(\mathbf {\beta }_0, t)\} \\&\quad = \mathrm {pr}\{\tilde{T}_i\in (e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t},e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t+\mathrm {d}t}), {\varDelta }_i=1 \mid \mathbf {X}_i\} \\&\quad =\int _0^{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}} f_{A,V}(e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}-v,v\mid \mathbf {X}_i) S_{C}(v\mid \mathbf {X}_i)\mathrm {d}v \,e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}\mathrm {d}t \\&\quad = \frac{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}f_{T^*}(e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}\mid \mathbf {X}_i)\mathrm {d}t}{\mu (\mathbf {X}_i)} \int _0^{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}} S_{C}(v\mid \mathbf {X}_i)\mathrm {d}v, \end{aligned}$$

and

$$\begin{aligned}&E_{\mathbf {X}}[I\{e_i^a(\mathbf {\beta }_0)\le t\} Y_{i}(\mathbf {\beta }_0,t)\mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i)] \\&\quad = \mathrm {pr}\Big ( A_i +V_i\ge e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}, A_i +\tilde{C}_i\ge e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}, \\&\quad \quad \quad \quad A_i\le e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t} \mid \mathbf {X}_i\Big )\times \mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i) \\&\quad = \int _{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}}^\infty \int _0^{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}} f_{A,V}(a,y-a\mid \mathbf {X}_i) \\&\quad \times S_{C}(e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}-a\mid \mathbf {X}_i)\mathrm {d}a\, \mathrm {d}y \times \mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i)\\&\quad = \frac{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}f_{T^*}(e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}\mid \mathbf {X}_i)\mathrm {d}t}{\mu (\mathbf {X}_i)} \int _0^{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}} S_{C}(v\mid \mathbf {X}_i)\mathrm {d}v. \end{aligned}$$

To verify Eq. (7), it is sufficient to show

$$\begin{aligned}&E_{\mathbf {X}}[{\varDelta }_iI\{e_i^v(\mathbf {\beta }_0)\le t\} Y_{i}(\mathbf {\beta }_0, t)\mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i)] \\&\quad =\mathrm {pr}(A_i + V_i \ge e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}, V_i \le \tilde{C}_i, V_i\le e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t} \mid \mathbf {X}_i) \\&\quad ~~~~\times \mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i) \\&\quad =\int _{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}}^\infty \int _0^{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}} f_{A,V}(y- v,v\mid \mathbf {X}_i) S_{C}(v\mid \mathbf {X}_i)\mathrm {d}v\, \mathrm {d}y \\&\quad ~~~~ \times \mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i)\\&\quad = \frac{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}f_{T^*}(e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}\mid \mathbf {X}_i)\mathrm {d}t}{\mu (\mathbf {X}_i)} \int _0^{e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}} S_{C}(v\mid \mathbf {X}_i)\mathrm {d}v. \end{aligned}$$

This gives the desired results.
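
The final equality in each display above rests on two standard facts that are used implicitly. The following sketch spells them out, assuming the usual stationarity form of length-biased sampling. The shorthand \(u\) and the symbol \(S_{T^*}\) for the survival function of \(T^*\) are introduced here for illustration and are not notation taken from the article; \(\mu (\mathbf {X}_i)\) is assumed to be the conditional mean of \(T^*\), and \(\varepsilon \) is assumed to be \(\log T^* - \mathbf {X}_i^\top \mathbf {\beta }_0\).

$$\begin{aligned}&u = e^{\mathbf {X}_i^\top \mathbf {\beta }_0+t}, \qquad f_{A,V}(a,v\mid \mathbf {X}_i) = \frac{f_{T^*}(a+v\mid \mathbf {X}_i)}{\mu (\mathbf {X}_i)}, \quad a, v > 0, \\&\mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i) = \frac{u\, f_{T^*}(u\mid \mathbf {X}_i)}{S_{T^*}(u\mid \mathbf {X}_i)}\,\mathrm {d}t, \\&\int _u^\infty \int _0^u f_{A,V}(y-v,v\mid \mathbf {X}_i) S_{C}(v\mid \mathbf {X}_i)\,\mathrm {d}v\, \mathrm {d}y = \frac{S_{T^*}(u\mid \mathbf {X}_i)}{\mu (\mathbf {X}_i)} \int _0^u S_{C}(v\mid \mathbf {X}_i)\,\mathrm {d}v. \end{aligned}$$

Multiplying the last line by \(\mathrm {d}{\varLambda }_{\varepsilon }(t\mid \mathbf {X}_i)\) cancels \(S_{T^*}(u\mid \mathbf {X}_i)\) and leaves the common quantity \(u f_{T^*}(u\mid \mathbf {X}_i)\,\mathrm {d}t\, \mu (\mathbf {X}_i)^{-1}\int _0^u S_{C}(v\mid \mathbf {X}_i)\,\mathrm {d}v\). The double integral involving \(S_{C}(u-a\mid \mathbf {X}_i)\) reduces in the same way after the substitution \(v=u-a\) in the inner integral.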

1.2 Proof of Theorem 1

We impose the following regularity conditions:

  1. Covariate vector \(\mathbf {X}\) is uniformly bounded by some constant \(M_0\), is of bounded total variation, and is not contained in a \((p-1)\)-dimensional hyperplane;

  2. The parameter space B of \(\mathbf {\beta }\) is a compact set with \(\mathbf {\beta }_0\) in its interior;

  3. The density function of T is differentiable, and its derivative is bounded over \([0,\tau ]\), where \(\tau = \inf \{t : P(T>t)= 0\} <\infty \);

  4. The residual censoring time \(\tilde{C}\) has a uniformly bounded density;

  5. The weights \(\phi \) and \(\nu \) are differentiable in \(\mathbf {\beta }\) with bounded continuous derivatives. The weights \(\phi _n\), \(\phi \) and \(\nu \) belong to Glivenko–Cantelli classes, and \(\sup _{\mathbf {\beta }\in B,t\in [0,\tau ]}\Vert \phi _n-\phi \Vert \rightarrow 0\);

  6. The weights \(\phi _n\), \(\phi \) and \(\nu \) are Donsker, and \(\phi _n\) is an asymptotically linear estimator of \(\phi \).

Conditions 1–4 are mild and commonly assumed in survival analysis. The regularity of the weight functions assumed in Conditions 5 and 6 covers a large class of estimating equations and is satisfied by the methods introduced in the main article. We show that, under Conditions 1–6, the proposed estimator is consistent and asymptotically normal. The proof follows the general lines of Nan et al. (2009) and is outlined below. We write

$$\begin{aligned} \eta _n(\mathbf {\beta },t)=\frac{\sum _i \mathbf {X}_i \nu _i(\mathbf {\beta },t) Y_{i}(\mathbf {\beta },t)}{\sum _i \nu _i(\mathbf {\beta },t) Y_{i}(\mathbf {\beta },t)}. \end{aligned}$$
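
As a concrete reading of this definition, \(\eta _n(\mathbf {\beta },t)\) is a weighted average of the covariates over the subjects still at risk at residual time t. The following Python sketch computes it, assuming the at-risk indicator is \(Y_i(\mathbf {\beta },t)=I\{\log \tilde{T}_i-\mathbf {X}_i^\top \mathbf {\beta }\ge t\}\) (which is consistent with the probabilities written out in Sect. 1.1) and treating the weights \(\nu _i(\mathbf {\beta },t)\) as user-supplied numbers. The function and argument names are hypothetical and are not taken from the article or from any package.

```python
import numpy as np

def eta_n(X, resid, nu, t):
    """Weighted risk-set average of covariates at residual time t.

    X     : (n, p) covariate matrix
    resid : (n,) residuals, assumed to be log(observed time_i) - X_i @ beta
    nu    : (n,) nonnegative weights nu_i(beta, t), already evaluated at t
    t     : scalar point on the residual (log-time) scale
    """
    X = np.asarray(X, dtype=float)
    # Assumed at-risk indicator Y_i(beta, t) = 1{resid_i >= t}
    at_risk = (np.asarray(resid, dtype=float) >= t).astype(float)
    w = np.asarray(nu, dtype=float) * at_risk
    denom = w.sum()
    if denom == 0.0:
        # Empty risk set: the ratio is undefined, so return NaN
        return np.full(X.shape[1], np.nan)
    return w @ X / denom

# Toy data (made up for illustration): three subjects, one covariate
X = np.array([[0.0], [1.0], [2.0]])
resid = np.array([0.5, 1.0, 2.0])
nu = np.ones(3)
print(eta_n(X, resid, nu, 0.8))  # prints [1.5]
```

With unit weights, the subjects with residuals 1.0 and 2.0 are at risk at t = 0.8, so the result is the plain average of their covariates. Returning NaN for an empty risk set is a design choice of this sketch; an implementation could equally skip such time points.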

We have

$$\begin{aligned}&\sup _{\mathbf {\beta }\in B} \Vert U_n(\mathbf {\beta })-u(\mathbf {\beta })\Vert \\&\quad \le \sup _{\mathbf {\beta }\in B} \biggr \Vert U_n(\mathbf {\beta }) - E\left[ \int _{-\infty }^\infty \phi _{n}(\mathbf {\beta },t)\{\mathbf {X}_i -\eta _n(\mathbf {\beta },t)\}\mathrm {d}N_i(\mathbf {\beta },t) \right] \biggr \Vert \\&\quad + \sup _{\mathbf {\beta }\in B} \biggr \Vert E\left[ \int _{-\infty }^\infty \{\phi _{n}(\mathbf {\beta },t)-\phi (\mathbf {\beta },t)\} \mathbf {X}_i \mathrm {d}N_i(\mathbf {\beta },t) \right] \biggr \Vert \\&\quad + \sup _{\mathbf {\beta }\in B} \biggr \Vert E\left[ \int _{-\infty }^\infty \{\phi (\mathbf {\beta },t)-\phi _{n}(\mathbf {\beta },t)\} \eta (\mathbf {\beta },t) \mathrm {d}N_i(\mathbf {\beta },t) \right] \biggr \Vert \\&\quad + \sup _{\mathbf {\beta }\in B} \biggr \Vert E\left[ \int _{-\infty }^\infty \phi _{n}(\mathbf {\beta },t) \{\eta (\mathbf {\beta },t)-\eta _n(\mathbf {\beta },t)\} \mathrm {d}N_i(\mathbf {\beta },t) \right] \biggr \Vert . \end{aligned}$$

The first term converges to 0 in probability by the Glivenko–Cantelli property assumed in Condition 5. The other terms also converge to 0 because \(\phi _n\) and \(\eta _n\) converge to \(\phi \) and \(\eta \), respectively. This result implies that the estimating function \(U_n(\mathbf {\beta })\) converges uniformly to the nonrandom function \(u(\mathbf {\beta })\), which gives the consistency of the estimator as shown in Nan et al. (2009).

Under Conditions 1–6, we have \(n^{1/2}(\phi _n-\phi )=O_p(1)\) and \(n^{1/2}(\eta _n-\eta )=O_p(1)\). In addition, \(n^{1/2} \sup _{\mathbf {\beta }\in B} \Vert U_n(\mathbf {\beta })-E\{U_n(\mathbf {\beta })\}\Vert =O_p(1)\). From the preceding inequality, we then obtain

$$\begin{aligned} n^{1/2} \sup _{\mathbf {\beta }\in B}\Vert U_n(\mathbf {\beta })-u(\mathbf {\beta })\Vert = O_p(1). \end{aligned}$$

This further implies that the estimator, denoted \(\hat{\mathbf {\beta }}\) here, satisfies \(\Vert \hat{\mathbf {\beta }}-\mathbf {\beta }_0\Vert =O_p(n^{-1/2})\). For some large M and any \(\mathbf {\beta }\) with \(\Vert \mathbf {\beta }-\mathbf {\beta }_0\Vert \le M n^{-1/2}\), following the argument in Nan et al. (2009), we have

$$\begin{aligned}&n^{1/2}\{U_n(\mathbf {\beta })-U_n(\mathbf {\beta }_0)\} \\&\quad = n^{-1/2} \sum _{i=1}^n \int _{-\infty }^{\infty } \{\phi _{n}(\mathbf {\beta }, t)-\phi _{n}(\mathbf {\beta }_0, t)\} \left\{ \mathbf {X}_i -\eta _n(\mathbf {\beta }_0,t)\right\} \mathrm {d}N_i(\mathbf {\beta }, t) \\&\qquad + n^{-1/2} \sum _{i=1}^n \int _{-\infty }^{\infty } \phi _{n}(\mathbf {\beta }, t) \left\{ -\eta _n(\mathbf {\beta },t)+ \eta _n(\mathbf {\beta }_0,t)\right\} \mathrm {d}N_i(\mathbf {\beta }, t) \\&\quad = n^{1/2}(\mathbf {\beta }-\mathbf {\beta }_0) E\left[ \int _{-\infty }^\infty \frac{\partial \phi (\mathbf {\beta },t)}{\partial \mathbf {\beta }} \Big |_{\mathbf {\beta }_0}\left\{ \mathbf {X}_i-\eta (\mathbf {\beta }_0,t)\right\} \mathrm {d}N_i(\mathbf {\beta },t) \right] \\&\qquad - n^{1/2}(\mathbf {\beta }-\mathbf {\beta }_0)E\left\{ \int _{-\infty }^\infty \frac{\partial \eta (\mathbf {\beta },t)}{\partial \mathbf {\beta }}\Big |_{\mathbf {\beta }_0}\phi (\mathbf {\beta },t) \mathrm {d}N_i(\mathbf {\beta },t)\right\} \\&\qquad +o_p(1) \\&\quad = n^{1/2}(\mathbf {\beta }-\mathbf {\beta }_0)A(\mathbf {\beta }_0)+o_p(1), \end{aligned}$$

where \(A(\mathbf {\beta })=\partial u(\mathbf {\beta })/\partial \mathbf {\beta }\) is the slope matrix. This gives the asymptotic linearity of \(U_n(\mathbf {\beta })\). Using the functional delta method, \(n^{1/2}U_n(\mathbf {\beta }_0)\) is asymptotically equivalent to a normalized sum of independent zero-mean random vectors, i.e.,

$$\begin{aligned} n^{1/2}U_n(\mathbf {\beta }_0)=n^{-1/2}\sum _{i=1}^n\xi _i(\mathbf {\beta }_0)+o_p(1). \end{aligned}$$

Combining this representation with the asymptotic linearity above, evaluated at the estimator \(\hat{\mathbf {\beta }}\) where the estimating function is approximately zero, the central limit theorem gives that \(n^{1/2}(\hat{\mathbf {\beta }}-\mathbf {\beta }_0)\) converges in distribution to \(N(0, A^{-1}BA^{-1})\), where \(A=A(\mathbf {\beta }_0)\) and \(B=E[\xi _i(\mathbf {\beta }_0)^{\otimes 2}]\).
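
Once estimates of the slope matrix A and of the terms \(\xi _i(\mathbf {\beta }_0)\) are available, the limiting covariance above translates into a short plug-in calculation. The sketch below is a generic sandwich computation and is not the variance procedure of the article, which is not reproduced in this appendix. The function name and its inputs are hypothetical, and the product is formed as \(A^{-1}B(A^{-1})^\top \), which coincides with \(A^{-1}BA^{-1}\) when A is symmetric.

```python
import numpy as np

def sandwich_cov(A, xi):
    """Plug-in covariance of the estimator: A^{-1} B A^{-T} / n.

    A  : (p, p) estimate of the slope matrix (assumed invertible)
    xi : (n, p) array whose rows are estimates of xi_i(beta_0)
    """
    A = np.asarray(A, dtype=float)
    xi = np.asarray(xi, dtype=float)
    n = xi.shape[0]
    B = xi.T @ xi / n              # empirical version of E[xi_i xi_i^T]
    A_inv = np.linalg.inv(A)
    # Dividing by n converts the limiting covariance of
    # n^{1/2} (beta_hat - beta_0) into a covariance for beta_hat itself.
    return A_inv @ B @ A_inv.T / n

# Toy inputs (made up for illustration)
A = 2.0 * np.eye(2)
xi = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
cov = sandwich_cov(A, xi)
std_err = np.sqrt(np.diag(cov))    # standard errors of the components
```

For larger or ill-conditioned A, solving linear systems with `np.linalg.solve` instead of forming the explicit inverse would be the more stable choice; the explicit inverse is kept here only to mirror the formula.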

Cite this article

Chiou, S.H., Xu, G. Rank-based estimation for semiparametric accelerated failure time model under length-biased sampling. Stat Comput 27, 483–500 (2017). https://doi.org/10.1007/s11222-016-9634-5
