
Model-free feature screening for ultrahigh dimensional censored regression

Published in Statistics and Computing.

Abstract

In this paper we design a sure independent ranking and screening procedure for censored regression (cSIRS, for short) with ultrahigh dimensional covariates. The inverse probability weighted cSIRS procedure is model-free in the sense that it does not specify a parametric or semiparametric regression function between the response variable and the covariates. Thus, it is robust to model mis-specification. This model-free property is very appealing in ultrahigh dimensional data analysis, particularly when there is lack of information for the underlying regression structure. The cSIRS procedure is also robust in the presence of outliers or extreme values as it merely uses the rank of the censored response variable. We establish both the sure screening and the ranking consistency properties for the cSIRS procedure when the number of covariates p satisfies \(p=o\{\exp (an)\}\), where a is a positive constant and n is the available sample size. The advantages of cSIRS over existing competitors are demonstrated through comprehensive simulations and an application to the diffuse large-B-cell lymphoma data set.


References

  • Fan, J., Feng, Y., Wu, Y.: High-dimensional variable selection for Cox's proportional hazards model. IMS Collect. 6, 70–86 (2010)

  • Fan, J., Li, R.: Variable selection for Cox's proportional hazards model and frailty model. Ann. Stat. 30, 74–99 (2002)

  • Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space (with discussion). J. R. Stat. Soc. B 70, 849–911 (2008)


  • Fan, J., Samworth, R., Wu, Y.: Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res. 10, 1829–1853 (2009)


  • Fan, J., Song, R.: Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 38, 3567–3604 (2010)

  • Fang, K.T., Kotz, S., Ng, K.W.: Symmetric Multivariate and Related Distributions. Chapman & Hall, London (1989)


  • Gorst-Rasmussen, A., Scheike, T.: Independent screening for single-index hazard rate models with ultrahigh dimensional features. J. R. Stat. Soc. B 75, 217–245 (2013)


  • He, X., Wang, L., Hong, H.: Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Stat. 41, 342–369 (2013)


  • Li, G., Peng, H., Zhang, J., Zhu, L.: Robust rank correlation based screening. Ann. Stat. 40, 1846–1877 (2012a)


  • Li, R., Zhong, W., Zhu, L.: Feature screening via distance correlation learning. J. Am. Stat. Assoc. 107, 1129–1139 (2012b)


  • Lo, S.H., Singh, K.: The product-limit estimator and the bootstrap: some asymptotic representations. Probab. Theory Relat. Fields 71, 455–465 (1986)


  • Lu, W., Li, L.: Boosting methods for nonlinear transformation models with censored survival data. Biostatistics 9, 658–667 (2008)


  • Rosenwald, A., Wright, G., Chan, W.C., Connors, J.M., Hermelink, H.K., Smeland, E.B., Staudt, L.M.: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346, 1937–1947 (2002)


  • Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)


  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  • Uno, H., Cai, T., Pencina, M.J., D'Agostino, R.B., Wei, L.J.: On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30, 1105–1117 (2011)

  • Zhao, S.D., Li, Y.: Principled sure independence screening for Cox models with ultra-high-dimensional covariates. J. Multivar. Anal. 105, 397–411 (2012)

  • Zhu, L.P., Li, L., Li, R., Zhu, L.X.: Model-free feature screening for ultrahigh dimensional data. J. Am. Stat. Assoc. 106, 1464–1475 (2011)


  • Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)



Acknowledgments

Tingyou Zhou's research is supported by the Shanghai University of Finance and Economics Innovation Fund of Graduate Student (CXJJ-2014-447). Liping Zhu's research is supported by the National Natural Science Foundation of China (11371236 and 11422107), the Henry Fok Education Foundation Fund of Young College Teachers (141002) and the Innovative Research Team in University of China (IRT13077), Ministry of Education of China. All correspondence should be directed to Liping Zhu at zhu.liping@ruc.edu.cn. The authors thank the Editor, an Associate Editor and the anonymous reviewers for their constructive suggestions, which have greatly improved the presentation of our paper.


Appendix 1: Proof of theorems


Appendix 1.1: Proof of Theorem 1

We first observe that \(\omega _{k} = \omega _{k,1}\). Thus it suffices to show that

$$\begin{aligned} \underset{k\in {\mathcal {A}}}{\min }\omega _{k,1} > \underset{k\in {\mathcal {I}}}{\max }\omega _{k,1}. \end{aligned}$$
(4.1)

With the conditional independence model (2.1) and the linearity condition, we have

$$\begin{aligned} \Omega _{k,1}(t)= & {} E\left[ E\left\{ X_k \mathbf {1}(Y<t)\mid \mathbf {x}_{\mathcal {A}}\right\} \right] \\= & {} E\left[ E(X_k \mid \mathbf {x}_{\mathcal {A}})E\left\{ \mathbf {1}(Y<t)\mid \mathbf {x}_{\mathcal {A}}\right\} \right] \\= & {} \mathrm {cov}\left( X_k,\mathbf {x}_{{\mathcal {A}}}^\mathrm{\tiny {T}}\right) \left\{ \mathrm {var}(\mathbf {x}_{\mathcal {A}}) \right\} ^{-1}E\left\{ \mathbf {x}_{\mathcal {A}}\mathbf {1}(Y<t) \right\} . \end{aligned}$$

Let \({\varvec{\Omega }}_{{\mathcal {A}}}(t) \!=\! E\left\{ \mathbf {x}_{\mathcal {A}}\mathbf {1}(Y\!<\!t) \right\} \) and \({\varvec{\Omega }}_{{\mathcal {A}}} = E\left\{ {\varvec{\Omega }}_{{\mathcal {A}}}(T) {\varvec{\Omega }}^\mathrm{\tiny {T}}_{{\mathcal {A}}}(T)\right\} \). Thus,

$$\begin{aligned} \omega _k = E\left\{ \Omega _{k,1}^2(T) \right\}= & {} \mathrm {cov}\left( X_k,\mathbf {x}_{{\mathcal {A}}}^\mathrm{\tiny {T}}\right) \left\{ \mathrm {var}(\mathbf {x}_{\mathcal {A}}) \right\} ^{-1}\\&\times \,{\varvec{\Omega }}_{{\mathcal {A}}} \left\{ \mathrm {var}(\mathbf {x}_{\mathcal {A}}) \right\} ^{-1} \mathrm {cov}(\mathbf {x}_{{\mathcal {A}}}, X_k). \end{aligned}$$

Without much difficulty, we can obtain that

$$\begin{aligned} \underset{k\in {\mathcal {I}}}{\max } \omega _k\le & {} \lambda _{\max }\left\{ \mathrm {cov}\left( \mathbf {x}_{\mathcal {I}},\mathbf {x}_{\mathcal {A}}^\mathrm{\tiny {T}}\right) \mathrm {cov}\left( \mathbf {x}_{\mathcal {A}},\mathbf {x}_{\mathcal {I}}^\mathrm{\tiny {T}}\right) \right\} \\&\times \,\lambda _{\max }({\varvec{\Omega }}_{{\mathcal {A}}})\lambda ^{-2}_{\min } \left\{ \mathrm {var}(\mathbf {x}_{\mathcal {A}})\right\} , \end{aligned}$$

which implies the desired result.

Appendix 1.2: Proof of Theorem 2

We prove only the case \({\widehat{G}}_k(t\mid X_k) = {\widehat{G}}(t)\), as the proofs for the other two cases are very similar. We first show that \(\widehat{\omega }_k= n^3/\{n(n-1)(n-2)\}{\widetilde{\omega }}_k\), a scaled version of \({\widetilde{\omega }}_k\), can be expressed as a U-statistic plus a negligible remainder. To this end, we need the following lemma.

Lemma 1

Under Condition (C4), the Kaplan-Meier estimator \(\widehat{G}(\cdot )\) satisfies:

  1. \({\sup }_{0\le t \le T}|\widehat{G}(t)-G(t)|= O\{(\frac{\log n}{n})^{\frac{1}{2}}\}\) almost surely.

  2. \(\{\widehat{G}(t)\}^{-1}-\{G(t)\}^{-1}= n^{-1}\{G(t)\}^{-2} \sum _{g=1}^n{\xi (T_g,\delta _g,t)}+R_n(t)\), where \(\xi (T_g,\delta _g,t),\ g=1,\ldots ,n\), are i.i.d. random variables with mean zero and \({\sup }_{0\le t \le T}|R_n(t)|=O\{(\frac{\log n}{n})^{\frac{3}{4}}\}\) almost surely.

  3. \({\sup }_{0\le t \le T}|\frac{1}{\widehat{G}(t)}-\frac{1}{G(t)}|=O\{(\frac{\log n}{n})^{\frac{1}{2}}\}\) almost surely.

Result (1) can be found in Lemma 3 of Lo and Singh (1986). Direct application of Taylor expansion yields (2) and (3).
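For intuition, \(\widehat{G}\) in Lemma 1 is the Kaplan-Meier estimator of the censoring survival function \(G(t)=\hbox {pr}(C>t)\), obtained by treating the censoring times (rather than the failure times) as events. A minimal sketch, assuming no ties in the observed times; the function name and interface are ours, not from the paper:

```python
import numpy as np

def km_censoring_survival(T, delta):
    """Kaplan-Meier estimate of the censoring survival function G(t) = pr(C > t).

    T     : observed times min(Y, C)
    delta : event indicators (1 = failure observed, 0 = censored);
            the indicator is flipped because censoring acts as the event for G.
    Returns a right-continuous step function G_hat(t).
    """
    T = np.asarray(T, dtype=float)
    cens = 1 - np.asarray(delta)          # censoring is the "event" here
    order = np.argsort(T)
    T, cens = T[order], cens[order]
    n = len(T)
    surv = np.empty(n)
    G = 1.0
    for i in range(n):
        at_risk = n - i                   # subjects still at risk just before T[i]
        if cens[i] == 1:
            G *= 1.0 - 1.0 / at_risk      # product-limit update at a censoring time
        surv[i] = G

    def G_hat(t):
        idx = np.searchsorted(T, t, side="right") - 1
        return 1.0 if idx < 0 else surv[idx]
    return G_hat
```

With no censoring (all \(\delta _i=1\)) the estimate is identically 1, and with full censoring it steps down like the usual Kaplan-Meier curve, matching the roles \(G\) plays in the weights \(\delta _i/G(T_i)\) below.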

Using Lemma 1, we can write

$$\begin{aligned} \widehat{\omega }_k= & {} \frac{6}{n(n-1)(n-2)}\\&\times \,\sum _{j<i<l}^{n}{h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i};X_{lk},T_{l},\delta _{l})}\\&+\,O\left\{ \left( \frac{\log n}{n}\right) ^{\frac{1}{2}}\right\} \\&\mathop {=}\limits ^{{\tiny \hbox {def}}}U_n+O\left\{ \left( \frac{\log n}{n}\right) ^{\frac{1}{2}}\right\} \end{aligned}$$

where \(h(\cdot )\) stands for the kernel of the U-statistic \(U_n\), which can be expressed as follows:

$$\begin{aligned}&h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i};X_{lk},T_{l},\delta _{l})\\&\quad =\,\Bigg \{\frac{\delta _{i}X_{ik}\mathbf {1}(T_i<T_j)}{G(T_i)} \frac{\delta _{l}X_{lk}\mathbf {1}(T_l<T_j)}{G(T_l)}\\&\qquad +\,\frac{\delta _{l}X_{lk}\mathbf {1}(T_l<T_i)}{G(T_l)} \frac{\delta _{j}X_{jk}\mathbf {1}(T_j<T_i)}{G(T_j)}\\&\qquad +\,\frac{\delta _{j}X_{jk}\mathbf {1}(T_j<T_l)}{G(T_j)} \frac{\delta _{i}X_{ik}\mathbf {1}(T_i<T_l)}{G(T_i)}\Bigg \}\Bigg /3. \end{aligned}$$

In other words, \(U_n\) is a standard U-statistic which can be expressed as

$$\begin{aligned} U_n=(n!)^{-1}\sum _{n!}{\mathcal {W}(X_{1k},T_1,\delta _1;\ldots ;X_{nk},T_n,\delta _n)}, \end{aligned}$$

where each \({\mathcal {W}}(X_{1k},T_1,\delta _1;\ldots ;X_{nk},T_n,\delta _n)\) is an average of \(k^{*}=[n/3]\) independent and identically distributed random variables, and \(\sum _{n!}\) denotes summation over all n! permutations \((i_1,\ldots ,i_n)\) of \((1,\ldots ,n)\).

For any \(t\in (0,s_0 k^{*})\), where \(s_0\) is a positive constant, it follows that

$$\begin{aligned} \hbox {pr}(\widehat{\omega }_k-\omega _k \ge \varepsilon )= & {} \hbox {pr}[\exp (t\widehat{\omega }_k)\ge \exp \{t(\omega _k+\varepsilon )\}]\\\le & {} \frac{E\{\exp (t\widehat{\omega }_k)\}}{\exp (t\varepsilon ) \exp (t \omega _k)}. \end{aligned}$$

The equality holds by the monotonicity of the exponential function, and the inequality follows from Markov's inequality.

Since \(E\{\exp (t\widehat{\omega }_k)\}=E\{\exp (t[U_n+O\{(\frac{\log n}{n})^{\frac{1}{2}}\}])\}=E\{\exp (t U_n)\} \exp [t \cdot O\{(\frac{\log n}{n})^{\frac{1}{2}}\}]\), and for any fixed \(t\in (0,s_0 k^{*})\) the factor \(\exp [t \cdot O\{(\frac{\log n}{n})^{\frac{1}{2}}\}]\) tends to 1 as n goes to infinity, an application of Jensen's inequality to the average structure of \({\mathcal {W}}\) yields

$$\begin{aligned} E\{\exp (t U_n)\}\le \{\psi _{h}(t/k^{*})\}^{k^{*}}, \end{aligned}$$

where \({\psi _{h}(s)}=E[\exp \{s \cdot h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i}; X_{lk},T_{l},\delta _{l})\}],\,\,s\in (0,s_0)\).

The combination of the above results shows that

$$\begin{aligned} \hbox {pr}(\widehat{\omega }_k-\omega _k \ge \varepsilon )\le & {} \exp (-t\varepsilon )\exp (-t \omega _k)\{\psi _{h}(t/k^{*})\}^{k^{*}}\\= & {} \{\exp (-s\varepsilon )\exp (-s \omega _k)\psi _{h}(s)\}^{k^{*}}. \end{aligned}$$

where \(s=t/k^{*}\). Note that \(E\{h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i}; X_{lk},T_{l},\delta _{l})\}=\omega _k\). Moreover, Taylor expansion shows that for any generic random variable Y and any \(s>0\), there exist a constant \(s_1\in (0,s)\) and a random variable Z bounded by \(0<Z<Y^2\exp (s_1 Y)\) such that \(\exp (s Y)=1+sY+s^2 Z/2\). Applying this expansion with \(Y=h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i}; X_{lk},T_{l},\delta _{l})\), we obtain that

$$\begin{aligned} \psi _{h}(s)=1+s\omega _k+s^2E(Z)/2. \end{aligned}$$

Applying Condition (C3) to \(X_k\), we obtain that there exists a constant \(C_0\) such that

$$\begin{aligned} \underset{1\le k \le p}{\max }\exp (-s \omega _k)\psi _{h}(s)\le 1+C_0 s^2. \end{aligned}$$

Together with Taylor expansion that \(\exp (-s\varepsilon )=1-\varepsilon s+O(s^2)\), it follows that

$$\begin{aligned} \underset{1\le k \le p}{\max }\{\exp (-s\varepsilon )\exp (-s \omega _k)\psi _{h}(s)\}\le 1-\varepsilon s/2, \end{aligned}$$

where \(s=t/k^{*}\in (0,s_0)\) is sufficiently small (as long as t is sufficiently small). Thus, for an arbitrary \(\varepsilon >0\), there exists a small enough constant \(s_{\varepsilon }\) such that

$$\begin{aligned} \underset{1\le k \le p}{\max }\hbox {pr}\{\widehat{\omega }_k-\omega _k \ge \varepsilon \}\le (1-\varepsilon s_{\varepsilon }/2)^{n/3}. \end{aligned}$$

Similarly, we can get that

$$\begin{aligned} \underset{1\le k \le p}{\max }\hbox {pr}\{\widehat{\omega }_k-\omega _k \le -\varepsilon \}\le (1-\varepsilon s_{\varepsilon }/2)^{n/3}. \end{aligned}$$

Consequently,

$$\begin{aligned} \hbox {pr}\left( \underset{k=1,\ldots ,p}{\sup }|\widehat{\omega }_k-\omega _k|> \varepsilon \right) \le (2p)\exp \{n\log (1-\varepsilon s_{\varepsilon }/2)/3\}. \end{aligned}$$
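To see why this bound accommodates \(p=o\{\exp (an)\}\), one can evaluate the right-hand side numerically: with \(p=\exp (an)\) for a small constant a, the exponential decay in n still dominates. The values of \(\varepsilon \), \(s_{\varepsilon }\) and a below are illustrative assumptions, not quantities derived in the paper:

```python
import math

def screening_bound(n, p, eps, s_eps):
    """Right-hand side of the deviation bound
    pr(sup_k |omega_hat_k - omega_k| > eps) <= 2p * exp{n log(1 - eps*s_eps/2) / 3}.
    """
    return 2.0 * p * math.exp(n * math.log(1.0 - eps * s_eps / 2.0) / 3.0)

# Even with p growing exponentially in n (here p = exp(0.01 n)),
# the bound decays to zero as n grows.
for n in (500, 1000, 2000):
    p = math.exp(0.01 * n)
    print(n, screening_bound(n, p, eps=0.5, s_eps=0.2))
```

The key comparison is between the dimension cost \(\log (2p)=O(an)\) and the concentration gain \(n\log (1-\varepsilon s_{\varepsilon }/2)/3\): whenever a is smaller than \(-\log (1-\varepsilon s_{\varepsilon }/2)/3\), the bound vanishes.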

Next, we prove that

$$\begin{aligned} \hbox {pr}\left( \underset{k\in {\mathcal {I}}}{\max }\widehat{\omega }_k<\underset{k\in {\mathcal {A}}}{\min }\widehat{\omega }_k\right) \ge 1-(4p)\exp \{n\log (1-\eta s_{\eta /2}/4)/3\}. \end{aligned}$$

Recall that we set \(\eta =\underset{k\in {\mathcal {A}}}{\min }\omega _{k}-\underset{k\in {\mathcal {I}}}{\max }\omega _{k}\). Therefore,

$$\begin{aligned} \hbox {pr}\left( \underset{k\in {\mathcal {A}}}{\min }\widehat{\omega }_k\le \underset{k\in {\mathcal {I}}}{\max }\widehat{\omega }_k\right)= & {} \hbox {pr}\left( \underset{k\in {\mathcal {A}}}{\min }\widehat{\omega }_k-\underset{k\in {\mathcal {A}}}{\min }\omega _k\right. \\&\quad \left. +\eta \le \underset{k\in {\mathcal {I}}}{\max }\widehat{\omega }_k -\underset{k\in {\mathcal {I}}}{\max }\omega _k\right) \\\le & {} \hbox {pr}\left( \mathop {\sup }_{k\in {\mathcal {A}}}|\widehat{\omega }_k-\omega _k|\ge \eta /2\right) \\&+\,\hbox {pr}\left( \mathop {\sup }_{k\in {\mathcal {I}}}|\widehat{\omega }_k-\omega _k|\ge \eta /2\right) . \end{aligned}$$

Applying the deviation bound above with \(\varepsilon =\eta /2\), we complete the proof.

Appendix 1.3: Proof of Theorem 3

We first prove that for any \(\varepsilon > 0\),

$$\begin{aligned} \underset{k=1,\ldots ,p}{\max }\{\hbox {pr}\left( |\widehat{\omega }_k-\omega _k| > \varepsilon \right) \} \le 2\exp \left( -\frac{\tau _0^2 k^{*} \varepsilon ^2}{2a^2}\right) . \end{aligned}$$
(4.2)

From the uniform bound condition on \(\mathbf {x}\), we can see that \(h(X_{jk},T_{j},\delta _{j};X_{ik},T_{i},\delta _{i};X_{lk},T_{l},\delta _{l})\), the kernel of the U-statistic \(\widehat{\omega }_k\), is also bounded, that is,

$$\begin{aligned} - \frac{a^2}{\tau _0^2}<h(X_{jk},T_{j},\delta _{j}; X_{ik},T_{i},\delta _{i};X_{lk},T_{l},\delta _{l})<\frac{a^2}{\tau _0^2}. \end{aligned}$$

Our foregoing arguments show that for any \(t\in (0,s_0 k^{*})\), we have

$$\begin{aligned} \hbox {pr}(\widehat{\omega }_k-\omega _k \ge \varepsilon )\le \exp (-t\varepsilon )\exp (-t \omega _k)\{\psi _{h}(t/k^{*})\}^{k^{*}}, \end{aligned}$$

where \(k^{*}=[n/3]\). Together with the exponential inequality in Lemma 5.6.1.A of Serfling (1980), we obtain that

$$\begin{aligned} \hbox {pr}(\widehat{\omega }_k-\omega _k \ge \varepsilon ) \le \exp \left( -\varepsilon t + \frac{a^2 t^2}{2\tau _0^2 k^{*}}\right) . \end{aligned}$$

By choosing \(t=\frac{\tau _0^2 k^{*}}{a^2} \varepsilon \), the right hand side attains its minimum \(\exp (-\frac{\tau _0^2 k^{*}}{2a^2} \varepsilon ^2)\), which together with the symmetry of the U-statistic implies the validity of (4.2). Let \(\varepsilon \mathop {=}\limits ^{{\tiny \hbox {def}}}cn^{-\kappa }\). We have

$$\begin{aligned} \underset{k=1,\ldots ,p}{\max }\{\hbox {pr}\left( |\widehat{\omega }_k-\omega _k| \!>\! cn^{-\kappa }\right) \} \!\le \!2\exp \left( -n^{1-2\kappa } \frac{\tau _0^2 c^2}{6a^2}\right) . \end{aligned}$$
(4.3)

To facilitate our subsequent proof, we define the event \({\mathcal {C}}_n \mathop {=}\limits ^{{\tiny \hbox {def}}}\{{\max }_{k \in {\mathcal {A}}}|\widehat{\omega }_k-\omega _k| \le cn^{-\kappa }\}\). Recall that we assume \(\underset{k\in {\mathcal {A}}}{\min }\omega _k\ge 2cn^{-\kappa }\). Under this assumption, if the event \({\mathcal {C}}_n\) occurs, it holds for all \(k \in {\mathcal {A}}\) that \(\widehat{\omega }_k\ge cn^{-\kappa }\). Thus, we obtain that

$$\begin{aligned} \hbox {pr}\left( {\mathcal {A}}\subseteq {\widehat{{\mathcal {A}}}}\right) \ge \hbox {pr}\left( {\mathcal {C}}_n\right) . \end{aligned}$$

Since

$$\begin{aligned} \hbox {pr}\left( {\mathcal {C}}_n \right)= & {} 1-\hbox {pr}\left( {\mathcal {C}}_n^c \right) =1-\hbox {pr}\{\underset{k \in {\mathcal {A}}}{\max }|\widehat{\omega }_k-\omega _k|> cn^{-\kappa }\}\\\ge & {} 1-s_n\underset{k \in {\mathcal {A}}}{\max }\{\hbox {pr}\left( |\widehat{\omega }_k-\omega _k|> cn^{-\kappa }\right) \}\\\ge & {} 1-s_n\underset{k=1,\ldots ,p}{\max }\{\hbox {pr}\left( |\widehat{\omega }_k-\omega _k| > cn^{-\kappa }\right) \}\\= & {} 1-O\left\{ s_n\exp \left( -n^{1-2\kappa } \cdot \frac{\tau _0^2 c^2}{6a^2}\right) \right\} . \end{aligned}$$

The last equality holds because of (4.3). This completes the proof of Theorem 3.


Cite this article

Zhou, T., Zhu, L. Model-free feature screening for ultrahigh dimensional censored regression. Stat Comput 27, 947–961 (2017). https://doi.org/10.1007/s11222-016-9664-z
