A Bayesian Approach to Sparse Cox Regression in High-Dimentional Survival Analysis

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9166)

Abstract

Survival prediction and prognostic factor identification play an important role in machine learning research. This paper employs machine learning regression algorithms to construct a survival model and proposes a new Bayesian framework for feature selection in high-dimensional Cox regression problems. The proposed approach gives a strong probabilistic statement of the shrinkage criterion for feature selection. The proposed regularization yields estimates that are unbiased and possess grouping and oracle properties, and whose maximal risk tends to a finite value. Experimental results show that the proposed framework is competitive on both simulated data and publicly available real data sets.

The work is supported by grants of the Russian Foundation for Basic Research No. 11-07-00409 and No. 11-07-00634.

References

  1. Aalen, O., Borgan, O., Gjessing, H., Gjessing, S.: Survival and Event History Analysis: A Process Point of View, ser. Statistics for Biology and Health. Springer-Verlag, New York (2008)

  2. Klein, J.P., Moeschberger, M.L.: Survival Analysis, 2nd edn. Springer, New York (2005)

  3. Cox, D.R.: Regression models and life-tables (with discussion). J. Roy. Stat. Soc. B 34, 187–220 (1972)

  4. Gui, J., Li, H.: Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21, 3001–3008 (2005)

  5. Fan, J., Li, R.: Variable selection for Cox's proportional hazards model and frailty model. Ann. Stat. 30, 74–99 (2002)

  6. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  7. Van Houwelingen, H.C., et al.: Cross-validated Cox regression on microarray gene expression data. Stat. Med. 25, 3201–3216 (2006)

  8. Lin, D.W., Porter, M., Montgomery, B.: Treatment and survival outcomes in young men diagnosed with prostate cancer: a population-based cohort study. Cancer 115(13), 2863–2871 (2009)

  9. Ying, Z.L.: A large sample study of rank estimation for censored regression data. Ann. Stat. 21, 76–99 (1993)

  10. Jin, Z., Lin, D.Y., Wei, L.J., Ying, Z.L.: Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353 (2003)

  11. Sauerbrei, W.: The use of resampling methods to simplify regression models in medical statistics. Appl. Stat. 48, 313–339 (1999)

  12. Sauerbrei, W., Schumacher, M.: A bootstrap resampling procedure for model building: application to the Cox regression model. Stat. Med. 11, 2093–2109 (1992)

  13. Tibshirani, R.: The lasso method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997)

  14. Zhang, H.H., Lu, W.: Adaptive lasso for Cox's proportional hazards model. Biometrika 94, 691–703 (2007)

  15. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67, 301–320 (2005)

  16. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

  17. Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Ann. Stat. 36(4), 1509–1566 (2008)

  18. Fan, J., Samworth, R., Wu, Y.: Ultrahigh dimensional feature selection: beyond the linear model. J. Mach. Learn. Res. 10, 2013–2038 (2009)

  19. Leeb, H., Pötscher, B.M.: Sparse estimators and the oracle property, or the return of Hodges' estimator. J. Econometrics 142(1), 201–211 (2008)

  20. Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall, London (1990)

  21. Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for Cox’s proportional hazards model via coordinate descent. J. Stat. Softw. 39(5), 1–13 (2011)

  22. Rosenwald, A., et al.: The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England J. Med. 25, 1937–1947 (2002)

Author information

Corresponding author

Correspondence to Vadim Mottl.

Appendix: Proofs

Proof of Theorem 1. The function \(l(\varvec{\beta })\) (11) is strictly convex [5]. The penalty term (15) is strictly convex in the subspace \(\beta _l \in [-1/\mu ,1/\mu ],\ l = 1,\ldots ,p\). In fact, up to a positive constant factor, the second derivatives of \(p(\varvec{\beta })\) are

$$\begin{aligned} \frac{\partial ^2 p(\varvec{\beta })}{\partial \beta _i\partial \beta _j}= {\left\{ \begin{array}{ll} \frac{1-\mu \beta _i^{2}}{(\mu \beta _i^{2}+1)^2}, &amp; i=j,\\ 0, &amp; i\not =j. \end{array}\right. } \end{aligned}$$
(18)

So the Hessian of \(p(\varvec{\beta })\) is non-negative definite in the subspace \(\beta _l \in [-1/\mu ,1/\mu ],\ l = 1,\ldots ,p\). Suppose, to the contrary, that \(\hat{\beta }_i \not = \hat{\beta }_j\). Define the estimator \(\hat{\varvec{\beta }}^*\): let \(\hat{\beta }^*_k=\hat{\beta }_k\) for all \(k \notin \{i,j\}\), and otherwise let \(\hat{\beta }_k^*= a\hat{\beta }_i+(1-a)\hat{\beta }_j\) with \(a=1/2\). Since \(\varvec{x}^{(i)}=\varvec{x}^{(j)}\), we have \(\tilde{\varvec{X}}\hat{\varvec{\beta }}^*= \tilde{\varvec{X}}\hat{\varvec{\beta }}\) and \(|\tilde{\varvec{z}} -\tilde{\varvec{X}}\hat{\varvec{\beta }}^* |=|\tilde{\varvec{z}} -\tilde{\varvec{X}}\hat{\varvec{\beta }} |\). However, the penalty is strictly convex in \(\beta _l \in [-1/\mu ,1/\mu ],\ l = 1,\ldots ,p\), so that, writing \(p(\beta _k)\) for the contribution of a single coordinate,

$$\begin{aligned} p(a\hat{\beta }_i+(1-a)\hat{\beta }_j) < a\, p(\hat{\beta }_i)+(1-a)\, p(\hat{\beta }_j) . \end{aligned}$$

Because \(p(\cdot )\) is additive and \(\hat{\varvec{\beta }}^*\) differs from \(\hat{\varvec{\beta }}\) only in coordinates \(i\) and \(j\), it follows that \(p(\hat{\varvec{\beta }}^*) < p(\hat{\varvec{\beta }})\). Because the residual norms coincide, \(\hat{\varvec{\beta }}^*\) attains a strictly smaller value of the criterion \(J(\varvec{\beta }|\mu )\) than \(\hat{\varvec{\beta }}\), and therefore it cannot be the case that \(\hat{\varvec{\beta }}\) is a minimizer. Hence \(\hat{\beta }_i = \hat{\beta }_j\).
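
As a numerical sanity check on the convexity argument, the sketch below assumes the per-coordinate penalty \(p(\beta ) = (1+1/\mu )\ln (\mu \beta ^2+1)\). This form is not shown in the appendix; it is an assumption obtained by integrating the penalty gradient that appears in (20), and the values of \(\mu \), \(\beta _i\) and \(\beta _j\) are hypothetical. Under that assumption, the code compares a finite-difference second derivative with the closed form \(2(1+\mu )(1-\mu \beta ^2)/(\mu \beta ^2+1)^2\) and checks that replacing two unequal coefficients inside the convex region by their average lowers the additive penalty.

```python
import math


def penalty(beta, mu):
    """Assumed per-coordinate penalty, inferred by integrating the gradient in (20)."""
    return (1.0 + 1.0 / mu) * math.log(mu * beta * beta + 1.0)


def penalty_second_derivative(beta, mu):
    """Closed-form second derivative of the assumed penalty."""
    return 2.0 * (1.0 + mu) * (1.0 - mu * beta ** 2) / (mu * beta ** 2 + 1.0) ** 2


def finite_difference_second_derivative(beta, mu, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (penalty(beta + h, mu) - 2.0 * penalty(beta, mu) + penalty(beta - h, mu)) / h ** 2


# Hypothetical values: both coefficients lie inside [-1/mu, 1/mu] = [-0.5, 0.5].
mu = 2.0
beta_i, beta_j = 0.4, -0.1

closed_form = penalty_second_derivative(beta_i, mu)
numerical = finite_difference_second_derivative(beta_i, mu)

# Penalty contribution of coordinates i and j before and after averaging (a = 1/2).
averaged = 0.5 * (beta_i + beta_j)
penalty_before = penalty(beta_i, mu) + penalty(beta_j, mu)
penalty_after = 2.0 * penalty(averaged, mu)

print("second derivative (closed form, numerical):", closed_form, numerical)
print("penalty before and after averaging:", penalty_before, penalty_after)
```

The averaging step mirrors the construction of \(\hat{\varvec{\beta }}^*\) in the proof: the residual is unchanged when the two columns are identical, so a strictly smaller penalty is enough to contradict minimality.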

Proof of Theorem 2. By definition, \(\hat{\varvec{\beta }}\) minimizes \(J(\varvec{\beta }|\mu )\), so

$$\begin{aligned} \frac{\partial J(\varvec{\beta }|\mu )}{\partial {\beta _k}}\Big |_{\varvec{\beta }=\hat{\varvec{\beta }}}=0. \end{aligned}$$
(19)

By (19), for non-zero \(\hat{\beta }_i\) and \(\hat{\beta }_j\),

$$\begin{aligned} -2\tilde{\varvec{x}}_i^T(\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }})+(1+1/\mu )\frac{2\mu \hat{\beta }_i}{\mu \hat{\beta }_i^2+1}=0 \end{aligned}$$
(20)

and

$$\begin{aligned} -2\tilde{\varvec{x}}_j^T(\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }})+(1+1/\mu )\frac{2\mu \hat{\beta }_j}{\mu \hat{\beta }_j^2+1}=0. \end{aligned}$$
(21)

Subtracting (21) from (20), dividing by \(2(1+\mu )\), and applying the Cauchy-Schwarz inequality gives

$$\begin{aligned} \frac{\hat{\beta }_i}{\mu \hat{\beta }_i^2+1} - \frac{\hat{\beta }_j}{\mu \hat{\beta }_j^2+1} = \frac{1}{1+\mu }(\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j)^T(\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }})\le \frac{1}{1+\mu }|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j||\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }}|. \end{aligned}$$
(22)

Also, note that \(J(\hat{\varvec{\beta }}|\mu )\le J(\varvec{0}|\mu )\), so \(|\tilde{\varvec{z}}-\tilde{\varvec{X}}\hat{\varvec{\beta }}|\le |\tilde{\varvec{z}}| = 1\), since \(\tilde{\varvec{z}}\) is centered and standardized. Hence,

$$\begin{aligned} \frac{\hat{\beta }_i}{\mu \hat{\beta }_i^2+1} - \frac{\hat{\beta }_j}{\mu \hat{\beta }_j^2+1} \le \frac{1}{1+\mu }|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j| =\frac{\sqrt{2(1-\rho )}}{\mu +1} . \end{aligned}$$
(23)
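
The last equality in (23) relies on the identity \(|\tilde{\varvec{x}}_i - \tilde{\varvec{x}}_j| = \sqrt{2(1-\rho )}\). A minimal sketch of a numerical check follows, assuming that \(\rho \) denotes the sample correlation \(\tilde{\varvec{x}}_i^T\tilde{\varvec{x}}_j\) and that each column is centered and scaled to unit Euclidean norm. The data, the sample size and the helper names are hypothetical and serve only as an illustration.

```python
import math
import random


def standardize(column):
    """Center a column and scale it to unit Euclidean norm."""
    mean = sum(column) / len(column)
    centered = [value - mean for value in column]
    norm = math.sqrt(sum(value * value for value in centered))
    return [value / norm for value in centered]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Synthetic, correlated columns (illustrative data only).
rng = random.Random(0)
raw_i = [rng.gauss(0.0, 1.0) for _ in range(50)]
raw_j = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in raw_i]

x_i = standardize(raw_i)
x_j = standardize(raw_j)

# For unit-norm columns: |x_i - x_j|^2 = |x_i|^2 + |x_j|^2 - 2 x_i^T x_j = 2 (1 - rho).
rho = dot(x_i, x_j)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
identity_value = math.sqrt(2.0 * (1.0 - rho))

print("rho:", rho)
print("|x_i - x_j| and sqrt(2(1 - rho)):", distance, identity_value)
```

The two printed distances should agree up to floating-point error, and the bound in (23) tightens as \(\rho \) approaches 1, which is the grouping effect for highly correlated covariates.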

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Krasotkina, O., Mottl, V. (2015). A Bayesian Approach to Sparse Cox Regression in High-Dimentional Survival Analysis. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2015. Lecture Notes in Computer Science, vol 9166. Springer, Cham. https://doi.org/10.1007/978-3-319-21024-7_30

  • DOI: https://doi.org/10.1007/978-3-319-21024-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21023-0

  • Online ISBN: 978-3-319-21024-7

  • eBook Packages: Computer Science, Computer Science (R0)
