
Efficient management of pulmonary embolism diagnosis using a two-step interconnected machine learning model based on electronic health records data

  • Research
  • Published:
Health Information Science and Systems

Abstract

Pulmonary embolism (PE) is a life-threatening condition with no specific clinical symptoms, and it is diagnosed by computed tomography angiography (CTA). Clinical decision support scoring systems based on PE risk factors, such as the Wells score and the revised Geneva (rGeneva) score, have been developed to estimate pre-test probability, but they are underused, which leads to continued overuse of CTA imaging. This diagnostic study proposes a novel approach for the efficient management of PE diagnosis: a two-step interconnected machine learning framework that analyzes patients' electronic health record (EHR) data directly. First, because LightGBM performed best for PE prediction, we used it to carry out a feature importance analysis; then, four state-of-the-art machine learning methods were applied to PE prediction based on the results of that analysis, enabling swift and accurate pre-test diagnosis. Patient data from different departments were collected at Sina Educational Hospital, affiliated with Tehran University of Medical Sciences, Iran. Overall, the Ridge classification method achieved the best performance, with an F1 score of 0.96. Extensive experimental findings demonstrated the effectiveness and simplicity of this diagnostic process for PE compared with existing scoring systems. The main strength of the approach lies in PE management: it could reduce avoidable invasive CTA imaging and serve as a primary prognostic tool for PE, thereby helping the healthcare system, clinicians, and patients by reducing costs and improving treatment quality and patient satisfaction.
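The two-step idea in the abstract (rank EHR features by importance from a gradient-boosting model, then train classifiers such as Ridge on the top-ranked features) can be illustrated with a small sketch. This is not the authors' code and makes several assumptions: to stay dependency-free it replaces LightGBM with a tiny hand-written gradient-boosted-stump model whose gain-based importances stand in for LightGBM's feature importances, it implements ridge classification directly (labels mapped to -1/+1, ridge regression, threshold at 0), and it runs on synthetic data. The function names and the parameters `rounds`, `lr`, `top_k`, and `alpha` are hypothetical choices, not values reported in the paper.

```python
import random


def boosted_stump_importance(X, y, rounds=30, lr=0.3):
    """Step 1, a stand-in for LightGBM: gradient boosting with depth-1 trees
    under squared loss. A feature's importance is the total split gain
    (variance reduction) of the stumps that split on it."""
    n, d = len(X), len(X[0])
    pred = [sum(y) / n] * n                      # start from the base rate
    gain = [0.0] * d
    for _ in range(rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of L2 loss
        total = sum(r)
        best = (0.0, None, None)                  # (gain, feature, threshold)
        for j in range(d):
            order = sorted(range(n), key=lambda i: X[i][j])
            left = 0.0
            for k in range(1, n):                 # left leaf = first k sorted rows
                left += r[order[k - 1]]
                a, b = X[order[k - 1]][j], X[order[k]][j]
                if a == b:
                    continue                      # tied values cannot be separated
                right = total - left
                g = left ** 2 / k + right ** 2 / (n - k) - total ** 2 / n
                if g > best[0]:
                    best = (g, j, (a + b) / 2)
        g, j, t = best
        if j is None:
            break                                 # no split reduces the loss
        gain[j] += g
        left_r = [r[i] for i in range(n) if X[i][j] <= t]
        right_r = [r[i] for i in range(n) if X[i][j] > t]
        lm, rm = sum(left_r) / len(left_r), sum(right_r) / len(right_r)
        for i in range(n):                        # shrunken update of each leaf
            pred[i] += lr * (lm if X[i][j] <= t else rm)
    return gain


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x


def fit_ridge_classifier(X, y, alpha=1.0):
    """Step 2: ridge regression on labels mapped to {-1, +1}, thresholded at 0.
    Centring the data leaves the intercept unpenalised."""
    n, d = len(X), len(X[0])
    t = [1.0 if yi == 1 else -1.0 for yi in y]
    xm = [sum(row[j] for row in X) / n for j in range(d)]
    tm = sum(t) / n
    Xc = [[row[j] - xm[j] for j in range(d)] for row in X]
    A = [[sum(row[i] * row[j] for row in Xc) + (alpha if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * (ti - tm) for row, ti in zip(Xc, t)) for i in range(d)]
    w = solve(A, rhs)
    b = tm - sum(wj * mj for wj, mj in zip(w, xm))
    return lambda row: int(sum(wj * xj for wj, xj in zip(w, row)) + b > 0)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels in {0, 1}."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def two_step_pipeline(X_train, y_train, X_test, top_k=2, alpha=1.0):
    """Rank features with step 1, keep the top_k, then fit step 2 on them."""
    imp = boosted_stump_importance(X_train, y_train)
    keep = sorted(range(len(imp)), key=lambda j: -imp[j])[:top_k]
    def subset(rows):
        return [[row[j] for j in keep] for row in rows]
    clf = fit_ridge_classifier(subset(X_train), y_train, alpha)
    return keep, [clf(row) for row in subset(X_test)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical synthetic data: 6 features, only columns 0 and 1 carry signal.
    X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(400)]
    y = [int(2 * row[0] + 1.5 * row[1] > 0) for row in X]
    keep, pred = two_step_pipeline(X[:300], y[:300], X[300:])
    print("selected features:", keep, "test F1:", round(f1_score(y[300:], pred), 3))
```

With real EHR data one would more likely use `lightgbm.LGBMClassifier` (its `feature_importances_` attribute) for step 1 and scikit-learn's `RidgeClassifier` for step 2, choosing `top_k` and `alpha` by cross-validation on training data only; the abstract does not state the paper's exact feature set or hyperparameters.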


Figs. 1–12



Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to Ahmad Ebrahimi.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships with any organization that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Laffafchi, S., Ebrahimi, A. & Kafan, S. Efficient management of pulmonary embolism diagnosis using a two-step interconnected machine learning model based on electronic health records data. Health Inf Sci Syst 12, 17 (2024). https://doi.org/10.1007/s13755-024-00276-9


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13755-024-00276-9

Keywords
