Abstract
Accurate prediction of postoperative survival time for patients with Barcelona Clinic Liver Cancer (BCLC) stage B hepatocellular carcinoma (HCC) is important for postoperative health care. Survival analysis is commonly used in medicine to predict the time until an event of interest occurs. Mainstream survival analysis models, such as the Cox proportional hazards model, must make strict assumptions about the underlying stochastic process in order to handle censored data, which can limit their application in clinical practice. In this paper, we propose a novel deep multitask survival model (DMSM) to analyze HCC survival data. Specifically, DMSM transforms the traditional problem of predicting survival time for patients with HCC into the problem of predicting survival probabilities at multiple time points, and it applies entropy regularization and a ranking loss to optimize a multitask neural network. Compared with traditional methods that either discard censored data or rely on strong assumptions, DMSM makes full use of the information in the censored data without making such assumptions. In addition, we identify the risk factors affecting the prognosis of patients with HCC and visualize their importance ranking. On a real dataset of patients with BCLC stage B HCC, experimental results on three different validation datasets show that DMSM achieves competitive performance, with concordance indices of 0.779, 0.727, and 0.780 and integrated Brier scores (IBS) of 0.172, 0.138, and 0.135, respectively. DMSM also has a comparatively small standard deviation of the IBS (0.002, 0.002, and 0.003) over 100 bootstrap resamples. The proposed DMSM can serve as an effective survival analysis model and provides an important means of accurately predicting the postoperative survival time of patients with BCLC stage B HCC.
Data Availability
Shen, Lujun et al. (2019), Data from: Dynamically prognosticating patients with hepatocellular carcinoma through survival paths mapping based on time-series data, Dryad, Dataset, https://doi.org/10.5061/dryad.pd44k8r
Acknowledgements
The authors would like to thank the College of Computer Science, Chongqing University, for providing the computing resources for this study.
Author information
Contributions
All authors collected, extracted, and analyzed the data and wrote the article. GH and HJL conceived and designed this study. HJL and SG provided critical revisions to the manuscript. All authors have approved the final version of the manuscript.
Ethics declarations
Ethical Approval
This declaration is “not applicable.”
Conflict of Interest
The authors declare no competing interests.
Appendix
A. Data Preprocessing
All three cohorts (derivation, internal testing, and multicenter testing) contain missing values. Table 4 reports the number and percentage of HCC patients with missing values for each variable in the datasets.
Missing values are a common problem in clinical data, so we took measures to limit their effect on data quality. Our HCC data include both continuous and categorical variables. For continuous features, we replaced each missing value with the median of that feature; for categorical features, we replaced each missing value with the most frequent category. On the basis of recommendations from HCC clinical experts, we applied the inclusion and exclusion criteria mentioned above, excluded some abnormal feature values, and removed nine features (lymph node metastasis, distant metastasis, invasion of vena cava or atrium, invasion of hepatic veins, branch of invaded portal vein, portal vein invasion, PS score, hepatic encephalopathy, and treatment response in last time slice). The remaining 21 features (including Child-Pugh), which cover demographic, clinical, and biological characteristics of the patients, were used as inputs to the model.
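The imputation rule described above can be illustrated with a minimal sketch. It is not the authors' code: the helper `impute_column`, the column names, and the toy values are hypothetical, and the source does not say on which cohort the median and mode were computed.

```python
from collections import Counter
from statistics import median


def impute_column(values, kind):
    """Fill None entries: median for continuous columns, mode for categorical ones."""
    observed = [v for v in values if v is not None]
    if kind == "continuous":
        fill = median(observed)
    else:
        # most_common(1) returns the single most frequent category
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical toy columns, for illustration only
afp = [12.0, None, 400.0, 35.0]
ascites = ["no", "no", None, "yes"]

afp_filled = impute_column(afp, "continuous")
ascites_filled = impute_column(ascites, "categorical")
print(afp_filled)      # [12.0, 35.0, 400.0, 35.0]
print(ascites_filled)  # ['no', 'no', 'no', 'yes']
```

The median is used rather than the mean because it is less sensitive to the extreme values that are common in laboratory measurements.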
Then, we normalize the data by applying the standard score method to continuous features and one-hot encoding to categorical variables. For a feature X, the output of the standard score method is

$$X' = \frac{X - \mu}{\delta},$$

where μ is the mean and δ is the standard deviation of X.
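A small sketch of this normalization step follows. The function names and toy values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state whether the population or sample form is used.

```python
from statistics import mean, pstdev


def standard_score(values):
    """Return (x - mu) / delta for every x, with mu the mean and delta the std."""
    mu = mean(values)
    delta = pstdev(values)  # assumption: population standard deviation
    return [(x - mu) / delta for x in values]


def one_hot(values):
    """Encode each category as a 0/1 indicator vector over the sorted categories."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


# Hypothetical toy data, for illustration only
ages = [40.0, 50.0, 60.0]
grades = ["A", "B", "A"]

z = standard_score(ages)
encoded, categories = one_hot(grades)
print(z)
print(categories, encoded)  # ['A', 'B'] [[1, 0], [0, 1], [1, 0]]
```

After this transformation each continuous feature has mean 0 and standard deviation 1, so features measured in different units contribute on a comparable scale.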
B. Hyperparameter Tuning for the Baselines
For each experiment, we use fivefold cross-validation and select the hyperparameters that maximize the concordance index (C-index) on the validation data. For DMSM, the hyperparameters are the batch size, learning rate, number of hidden nodes, λ, β, and activation function. The hyperparameters tuned for DMSM and the comparative models are presented below; the optimal values are selected by grid search.
DMSM: The number of nodes in the hidden layer is selected from [5, 10, 15, 20, 25, 30]; the activation function from [ReLU, Sigmoid, Tanh]; the batch size from [32, 64, 128, 256, 512]; the learning rate from [0.0001, 0.001, 0.01, 0.1]; λ from [0.01, 0.05, 0.1, 0.5, 1.0]; and β from [0.001, 0.005, 0.01, 0.05, 0.1, 0.5]. We perform fivefold cross-validation on hyperparameter combinations sampled at random from this grid and select the combination with the highest C-index. The hyperparameters for DMSM and two baseline models are shown in Tables 5 and 6.
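The DMSM search space above can be written down and sampled as in the following sketch. It is an illustration under stated assumptions, not the authors' tuning script: `sample_configs`, `select_best`, and the number of sampled combinations are hypothetical, and `toy_score` is a placeholder for the mean fivefold C-index that a real run would compute by training the model.

```python
import random

# Candidate values copied from the DMSM description above
dmsm_grid = {
    "nodes": [5, 10, 15, 20, 25, 30],
    "activation": ["ReLU", "Sigmoid", "Tanh"],
    "batch_size": [32, 64, 128, 256, 512],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "lambda": [0.01, 0.05, 0.1, 0.5, 1.0],
    "beta": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
}


def grid_size(grid):
    """Number of distinct hyperparameter combinations in the full grid."""
    size = 1
    for choices in grid.values():
        size *= len(choices)
    return size


def sample_configs(grid, n_samples, seed=0):
    """Draw n_samples combinations at random (with replacement) from the grid."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(choices) for name, choices in grid.items()}
        for _ in range(n_samples)
    ]


def select_best(configs, score_fn):
    """Return the configuration with the highest score."""
    return max(configs, key=score_fn)


def toy_score(config):
    """Placeholder score; a real run would return the mean fivefold C-index."""
    return -abs(config["learning_rate"] - 0.01)


print(grid_size(dmsm_grid))  # 10800
configs = sample_configs(dmsm_grid, n_samples=20)
best = select_best(configs, toy_score)
print(best)
```

The full grid contains 10800 combinations, each of which would need five training runs under fivefold cross-validation, which is one plausible reason for sampling from the grid rather than enumerating it.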
RSF: We followed the experimental settings provided in the RSF GitHub repository (Footnote 1). The max_depth is selected from [5, 10, 15, 20]; max_features from ["sqrt", "int", "float", "log2"]; sample_size_pct from [0.55, 0.60, 0.65, 0.70]; min_node_size from [10, 20, 30, 40, 50]; and num_trees from [100, 200, 300, 400, 500]. The hyperparameters are shown in Table 7.
DeepHit: We followed the experimental settings provided in the DeepHit GitHub repository (Footnote 2). The value of alpha is selected from [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]; sigma from [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]; and num_nodes from [[4, 8], [4, 16], [8, 16], [8, 32], [16, 32], [32, 32]]. The candidate values for the learning rate and batch size are the same as those used for DMSM. The hyperparameters are shown in Table 8.
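The selection criterion used throughout this appendix, the C-index, can be sketched as follows. This is Harrell's pairwise form, chosen here as an assumption because the text does not specify which variant was computed; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i has an observed event and
    a shorter time than patient j. It is concordant when patient i also
    has the higher predicted risk; ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the second patient is censored at month 10
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.5]

c_index = concordance_index(times, events, risk_scores)
print(c_index)  # 0.75
```

In the toy data there are four comparable pairs, and only the pair formed by the third and fourth patients is ranked incorrectly, which gives 3/4. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.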
About this article
Cite this article
Huang, G., Liu, H., Gong, S. et al. Survival Prediction After Transarterial Chemoembolization for Hepatocellular Carcinoma: a Deep Multitask Survival Analysis Approach. J Healthc Inform Res 7, 332–358 (2023). https://doi.org/10.1007/s41666-023-00139-0
DOI: https://doi.org/10.1007/s41666-023-00139-0