Abstract
This paper presents a comparative study of machine learning techniques for predicting deep vein thrombosis. We used the Ri-Schedule dataset, which contains Electronic Health Records of patients with suspected thrombosis, for training and validation. A total of 1653 samples and 59 predictors were included in this study.
We compared 20 standard machine learning algorithms and identified the best-performing ones: the Random Forest, XGBoost, GradientBoosting and HistGradientBoosting classifiers. After hyper-parameter optimization, the best overall accuracy of 0.91 was achieved by the GradientBoosting classifier using only 15 of the original variables.
We also tuned the algorithms for maximum sensitivity. Under this constraint, the best specificity was offered by Random Forest. At a maximum sensitivity of 1.0 and a specificity of 0.41, the Random Forest model was able to identify 23% additional negative cases over the screening practice in use today.
These results suggest that machine learning could offer practical value in real-life implementations if combined with traditional methods for ruling out deep vein thrombosis.
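To make the trade-off between sensitivity and specificity concrete, the following is a minimal sketch of how a decision threshold could be chosen so that no positive case is missed, and how specificity is then read off at that threshold. It is an illustration only, not the authors' procedure: the function names, the toy labels and the toy scores are all assumptions made for this example, and the numbers are unrelated to the Ri-Schedule dataset.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold counts as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def threshold_for_full_sensitivity(y_true, scores):
    """Return the highest threshold that still flags every positive case."""
    positive_scores = [s for y, s in zip(y_true, scores) if y == 1]
    if not positive_scores:
        raise ValueError("need at least one positive case")
    # Any threshold above the lowest positive score would miss that case.
    return min(positive_scores)


# Hypothetical toy data: 1 = thrombosis confirmed, 0 = ruled out.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
# Hypothetical model scores (predicted probability of the positive class).
scores = [0.05, 0.10, 0.30, 0.45, 0.70, 0.35, 0.80, 0.90]

threshold = threshold_for_full_sensitivity(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores, threshold)
print(f"threshold={threshold}, sensitivity={sens}, specificity={spec}")
```

In this toy case every positive is caught while three of the five negatives fall below the threshold and could be ruled out. In practice such a threshold would need to be chosen on held-out data, since a threshold fitted to the training set does not guarantee a sensitivity of 1.0 on new patients.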
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sorano, R., Magnusson, L.V., Abbas, K. (2022). Comparing Effectiveness of Machine Learning Methods for Diagnosis of Deep Vein Thrombosis. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13381. Springer, Cham. https://doi.org/10.1007/978-3-031-10548-7_21
DOI: https://doi.org/10.1007/978-3-031-10548-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10547-0
Online ISBN: 978-3-031-10548-7
eBook Packages: Computer Science, Computer Science (R0)