Large-margin learning of Cox proportional hazard models for survival analysis

Applied Intelligence

Abstract

Machine learning approaches have recently been applied to prediction tasks in survival analysis. However, most existing methods learn the prognostic function directly via linear regression or ranking models, and so cannot exploit the underlying density family, notably the well-known CoxPH model. In this paper we propose a novel estimator for the CoxPH model based on the margin maximization principle, which has been shown to achieve strong generalization performance in standard classification problems in machine learning. Censored data are handled by incorporating a cost-sensitive margin violation loss. We demonstrate improved prediction performance on several survival datasets.
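For orientation, the classical CoxPH objective that a large-margin estimator would be contrasted with can be sketched as follows. This is a minimal sketch of the standard negative log partial likelihood with right-censored data, not the estimator proposed in the paper. The function name `cox_neg_log_partial_likelihood` and the argument names `b`, `X`, `time`, and `event` are illustrative assumptions, and tied times are treated with the Breslow convention.

```python
import numpy as np


def cox_neg_log_partial_likelihood(b, X, time, event):
    """Negative log partial likelihood of the classical CoxPH model.

    b     : (d,) coefficient vector (hypothetical name)
    X     : (n, d) covariates, assumed centered so no bias term is needed
    time  : (n,) observed time (event time or right-censoring time)
    event : (n,) 1 if the event was observed, 0 if right-censored
    """
    b = np.asarray(b, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    score = X @ b  # log relative hazard of each subject
    nll = 0.0
    # Only subjects with an observed event contribute a term;
    # censored subjects enter solely through the risk sets.
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # subjects still under observation
        s = score[at_risk]
        m = s.max()
        log_denom = m + np.log(np.exp(s - m).sum())  # stable log-sum-exp
        nll -= score[i] - log_denom
    return nll
```

With `b` set to zero every subject has the same relative hazard, so each observed event contributes the logarithm of its risk-set size; this gives a quick sanity check for the sketch.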

Notes

  1. Technically, this is called right-censoring. There are other types of censoring: left-censoring means that the observed time is no earlier than the event time, and interval-censoring means that we only have an interval observation within which the true event time lies. As these types of censoring are relatively rare, we deal only with right-censored cases in this paper.

  2. Technically, one should replace it with \(\frac{P(t\leq T\leq t+dt \mid T\geq t,\mathbf{x})}{dt}\) as \(dt \to 0\).

  3. The prognostic index of the subject with covariates \(\mathbf{x}\) can be chosen as \(-\mathbf{b}^{\top}\mathbf{x}\) (i.e., higher for longer survival) or any monotone non-decreasing link applied to it.

  4. Here, the bias term is ignored on the presumption that the features are centered (i.e., have zero mean).
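
For reference, the quantities these notes refer to can be written out under the standard CoxPH formulation. The symbols \(\lambda\) (hazard), \(\lambda_0\) (baseline hazard), and \(\mathbf{b}\) (coefficient vector) are assumed here following common textbook usage:

\[
\lambda(t \mid \mathbf{x}) = \lim_{dt \to 0} \frac{P(t \leq T \leq t + dt \mid T \geq t, \mathbf{x})}{dt},
\qquad
\lambda(t \mid \mathbf{x}) = \lambda_0(t)\, \exp\left(\mathbf{b}^{\top}\mathbf{x}\right).
\]

Under this form a larger \(\mathbf{b}^{\top}\mathbf{x}\) means a higher hazard at every time \(t\), which is why its negation can serve as a prognostic index that is higher for longer survival. No separate bias term appears because any constant offset can be absorbed into \(\lambda_0(t)\).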

Funding

This study was supported by the Research Program funded by SeoulTech (Seoul National University of Science & Technology).

Author information

Corresponding author

Correspondence to Minyoung Kim.

Ethics declarations

Conflict of interest

The authors have no conflict of interest.

Consent for Publication

Consent to submit this manuscript has been received tacitly from the authors’ institution, Seoul National University of Science & Technology.

Additional information

Informed Consent

This research involves neither human participants nor animals.

About this article

Cite this article

Kim, M. Large-margin learning of Cox proportional hazard models for survival analysis. Appl Intell 49, 1675–1687 (2019). https://doi.org/10.1007/s10489-018-1363-3

  • DOI: https://doi.org/10.1007/s10489-018-1363-3
