Abstract
Machine learning approaches have recently been applied to prediction tasks in survival analysis. However, most existing methods learn the prognostic function directly via linear regression or ranking models, and therefore cannot exploit the underlying density family, notably the well-known Cox proportional hazards (CoxPH) model. In this paper we propose a novel estimator for the CoxPH model based on the margin maximization principle, which has been shown to achieve strong generalization performance in standard classification problems in machine learning. Censored data are handled by incorporating a cost-sensitive margin violation loss. We demonstrate improved prediction performance on several survival datasets.
Notes
Technically, this is called right-censoring. There are other types of censoring: left-censoring means that the observed time is no earlier than the event time, and interval censoring ("in-between" censoring) means that we observe only an interval within which the true event time lies. As these types of censoring are relatively rare, we deal only with right-censored cases in this paper.
Technically, one should replace it with: \(\frac {P(t\leq T\leq t+dt|T\geq t,\mathbf {x})}{dt}\) for dt → 0.
The prognostic index of the subject with the covariates x can be chosen as −b⊤x (i.e., higher for longer survival) or any monotone non-decreasing link applied to it.
Here, the bias term is ignored, presuming that the features are centered (i.e., have zero mean).
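The notes above can be illustrated with a minimal sketch. It assumes the usual CoxPH form in which the hazard is proportional to exp(b⊤x), so that −b⊤x is higher for longer expected survival. The function names `right_censor` and `prognostic_index`, the toy numbers, and the coefficient vector `b` are all hypothetical and chosen only for illustration; this is not the estimator proposed in the paper.

```python
import numpy as np

def right_censor(event_time, censor_time):
    """Build right-censored observations from (hypothetical) true event and censoring times."""
    event_time = np.asarray(event_time, dtype=float)
    censor_time = np.asarray(censor_time, dtype=float)
    # Under right-censoring we observe whichever comes first.
    observed = np.minimum(event_time, censor_time)
    # delta = 1 if the event itself was observed, 0 if the subject was censored.
    delta = (event_time <= censor_time).astype(int)
    return observed, delta

def prognostic_index(X, b):
    """Prognostic index -b^T x on mean-centered features (no bias term)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # zero-mean features, so the bias is dropped
    return -(X_centered @ np.asarray(b, dtype=float))

# Toy data: three subjects, two covariates (illustrative values only).
observed, delta = right_censor([2.0, 5.0, 7.0], [4.0, 3.0, 9.0])
X = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
b = [0.5, -1.0]  # assumed coefficients, not fitted
index = prognostic_index(X, b)
print(observed, delta, index)
```

In this toy example the second subject is censored at time 3, so only a lower bound on that subject's event time is known.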
Funding
This study was supported by the Research Program funded by the SeoulTech (Seoul National University of Science & Technology).
Ethics declarations
Conflict of interests
The authors have no conflict of interest.
Consent for Publication
Consent to submit this manuscript has been received tacitly from the authors’ institution, Seoul National University of Science & Technology.
Additional information
Informed Consent
This research does not involve human participants nor animals.
About this article
Cite this article
Kim, M. Large-margin learning of Cox proportional hazard models for survival analysis. Appl Intell 49, 1675–1687 (2019). https://doi.org/10.1007/s10489-018-1363-3