Abstract
Artificial Intelligence (AI) is critical for data-driven decision making to increase resource utilization, operational performance, and service quality in various industry domains, particularly in healthcare. Using AI in healthcare operations can significantly improve treatment outcomes and enhance patient satisfaction while reducing costs. In this paper, we propose a multi-stage framework to build an AI-based decision support tool that can predict the 5-year survivability of lung cancer patients. We evaluate the proposed framework using the Surveillance, Epidemiology, and End Results dataset for the 1973–2015 period, obtained from the National Institutes of Health. The first stage entails data preprocessing and target creation. The second stage applies six AI algorithms, with feature selection through Particle Swarm Optimization and hyperparameter tuning with cross-validation. These algorithms are Logistic Regression, Decision Trees, Random Forests (RF), Adaptive Boosting (AdaBoost), Artificial Neural Network, and Naïve Bayes. The results show that the RF and AdaBoost models yield an AUC of 0.94 and outperform the other models. The third stage uses permutation importance to interpret the RF and AdaBoost models and applies Tree-based Augmented Naïve Bayes to gain insights into the interrelations among important features. The results of the third stage indicate that the number of lymph nodes containing metastases, the number of tumors that patients have had in their lifetime, the patient’s age, and the microscopic composition of cells rank among the most important features and can significantly impact patient survivability. We believe this study has significant practical implications in helping physicians predict prognosis and develop treatment plans for lung cancer patients.
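As a rough illustration of the second and third stages described above, the following minimal sketch tunes a Random Forest with cross-validation, scores it by AUC, and ranks features by permutation importance. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the SEER dataset, omits the Particle Swarm Optimization feature selection step, and the parameter grid and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the SEER dataset): 8 features, of which
# the first 3 are informative because shuffle=False keeps them in front.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 2 (sketch): hyperparameter tuning with 5-fold cross-validation,
# selecting by AUC. The grid below is an illustrative assumption.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": [3, None]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Held-out AUC of the tuned model.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Stage 3 (sketch): permutation importance measures how much the AUC
# drops when a single feature's values are shuffled on held-out data.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
ranking = result.importances_mean.argsort()[::-1]  # most important first

print(f"test AUC: {test_auc:.3f}")
print("feature indices, most to least important:", ranking.tolist())
```

Computing the importances on held-out data rather than training data is a design choice in this sketch: it ties each feature's rank to the model's generalization performance rather than to what it memorized.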
Cite this article
Johnson, M., Albizri, A. & Simsek, S. Artificial intelligence in healthcare operations to enhance treatment outcomes: a framework to predict lung cancer prognosis. Ann Oper Res 308, 275–305 (2022). https://doi.org/10.1007/s10479-020-03872-6
DOI: https://doi.org/10.1007/s10479-020-03872-6