Abstract
Oral cancer is one of the most prevalent diseases worldwide. Previous studies have shown that oral cancer has a poor prognosis because the disease is often detected late. Cancer detection and prevention remain ineffective unless the mutation of genetic factors is thoroughly understood, yet understanding and identifying genetic mutations is a challenging problem for researchers. Determining survival time is one of the essential outcomes in cancer detection. Existing survival-time studies introduced models that use a single type of genomic data or only clinical data, and these models do not consider the structural and biological relationships of genomic data in cancer. In contrast, the current work integrates different types of genomic and clinical data to gain a better understanding of cancer characterization. Data integration is the key component in understanding the complex molecular mechanisms of cancer. However, integrating multi-genomic data poses significant challenges because the data are high-dimensional and heterogeneous. The focus of this study is to create an integrative model that improves the prediction accuracy of clinical outcomes for the survivability of oral cancer. The proposed model first applies dimensionality reduction and feature selection techniques to identify and eliminate insignificant and uninformative features from the Head and Neck Squamous Cell Carcinoma (HNSC) dataset taken from The Cancer Genome Atlas (TCGA). The predictive performance of the integrative model is then compared with that of a model based on clinical features only. The proposed model performed well on the training and testing sets, achieving c-index values of 0.9439 and 0.916, respectively.
The results indicate that the integrative model can effectively capture the interactions among genomic data types and may benefit patients with oral cancer by informing diagnosis and treatment planning.
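The c-index reported above is a concordance measure: among all pairs of patients whose survival order can be determined, it is the fraction for which the model assigns the higher risk to the patient who died earlier. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The following is a minimal sketch in Python, assuming Harrell's formulation with right-censored data and a simplified treatment of tied survival times (such pairs are skipped). The function name and the toy data are illustrative assumptions and are not taken from the article.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data (illustrative sketch).

    times       : observed follow-up time per patient
    events      : 1 if death was observed, 0 if the patient was censored
    risk_scores : model output, higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event (not a censoring time).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
partial = concordance_index(times, events, [0.9, 0.1, 0.4, 0.7])
print(perfect, partial)
```

In this toy example, five of the six pairs are comparable, because the pair whose earlier time is censored cannot be ordered. Established survival-analysis libraries provide tested implementations of this metric and would normally be preferred over a hand-written loop.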
Data availability
The manuscript has associated data in a repository.
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Sharma, D., Deepali, Garg, V.K. et al. A deep learning-based integrative model for survival time prediction of head and neck squamous cell carcinoma patients. Neural Comput & Applic 34, 21353–21365 (2022). https://doi.org/10.1007/s00521-022-07615-5
DOI: https://doi.org/10.1007/s00521-022-07615-5