Feature selection methods for high-dimensional biomedical time-to-event data: a review

Authors:
Huaning Tan, International School, Jinan University, China (ORCID: 0000-0003-0999-2125)
Chutong Deng, International School, Jinan University, China (ORCID: 0000-0002-1987-5574)
Shaobo Chen, College of Information Science and Technology, Jinan University, China (ORCID: 0000-0002-6125-8879)
Qianlin Luo, International School, Jinan University, China (ORCID: 0000-0001-7417-2954)
Guoqiang Hu, International School, Jinan University, China (ORCID: 0000-0001-9333-1498)
Yujuan Quan, College of Information Science and Technology, Jinan University, China (ORCID: 0000-0001-6206-7022)

ICBDT '22: Proceedings of the 5th International Conference on Big Data Technologies, September 2022, Pages 113–119. https://doi.org/10.1145/3565291.3565309

Published: 16 December 2022

ABSTRACT

In the digital era, time-to-event data collected from biomedical studies and healthcare are often high-dimensional, which poses computational challenges for traditional survival models. Feature selection (FS), a data-processing technique for dimensionality reduction, is therefore important for making full use of these data. This work introduces statistical, machine learning, and deep learning FS methods for time-to-event data, focusing mainly on lasso, elastic net, adaptive lasso, adaptive elastic net, random survival forest, and XGBoost. We also describe three state-of-the-art FS methods: BASIL, FilterDeepHit+, and SparseDeepHit+. We then compare the C-index of four basic FS methods in an experiment. Finally, we discuss future challenges and draw conclusions.
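
As a point of reference for the C-index named above, the following is a minimal sketch of Harrell's concordance index for right-censored data. The abstract does not say which C-index estimator or software the experiment uses, so the choice of Harrell's estimator, the function name concordance_index, and the toy data are all assumptions made for illustration; this is not the authors' code. Pairs with tied observed times are simply skipped in this simplified version.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored data (illustrative).

    times:       observed time for each subject (event or censoring time)
    events:      1/True if the event was observed, 0/False if censored
    risk_scores: predicted risk, where higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                # Subject i had the event before subject j's observed time,
                # so the pair is comparable.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # model ranks the pair correctly
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

On the toy data there are five comparable pairs, four of which are ranked correctly, so toy_c is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the C-index is a convenient single number for comparing models fitted on different selected feature subsets.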

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
2. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Published in

ICBDT '22: Proceedings of the 5th International Conference on Big Data Technologies
September 2022
454 pages
ISBN:9781450396875
DOI:10.1145/3565291

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Feature Selection
High Dimensional Data
Machine Learning
Survival Analysis
Time-to-event Data
Qualifiers
- research-article
- Research
- Refereed limited