Enhancing the prediction of type 2 diabetes mellitus using sparse balanced SVM

Multimedia Tools and Applications

Abstract

Predicting type 2 diabetes across a natural population is costly because it requires substantial resources. Although many studies have applied machine learning algorithms to predict type 2 diabetes, they have not achieved sufficient sensitivity because the data are imbalanced and sparse. This research uses noninvasive features from electronic health records with a machine learning algorithm, the Sparse Balanced Support Vector Machine (SB-SVM), to handle imbalanced data and achieve high precision. The proposed system uses SB-SVM to induce sparsity and to implicitly select the most relevant features from the imbalanced data. First, we preprocess the data using different baseline variables and filters. Second, we extract features from the preprocessed data using inclusion and exclusion criteria as filters. Third, we select the 12 features most relevant to diabetes prediction using statistical analysis and logistic regression. We then train and test the proposed model using nested stratified cross-validation. Finally, we evaluate the optimal model on the test set. The proposed model predicts type 2 diabetes mellitus from noninvasive features with enhanced sensitivity and less processing time. Our solution outperforms the state of the art on most performance metrics. The best state-of-the-art solution achieves accuracy, precision, recall, and Area Under the Curve (AUC) of 67.22%, 62.93%, 69.96%, and 69.96%, respectively, whereas our solution achieves 76.39%, 66.86%, 76.74%, and 85.08%. The average processing time decreases from 40–85 folds/sec to 8.9–10.7 folds/sec. In conclusion, the proposed system improves the precision and sensitivity of diabetes prediction with minimal processing time.
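The nested stratified cross-validation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the classifier is a toy one-feature threshold model standing in for SB-SVM, the data are synthetic, and every name here (`stratified_folds`, `fit_threshold`, `nested_cv`, the `shift` hyperparameter, and the use of balanced accuracy as the selection score) is a hypothetical choice made for illustration only.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, so the majority class cannot dominate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    sensitivity = tp / n_pos if n_pos else 0.0
    specificity = tn / n_neg if n_neg else 0.0
    return (sensitivity + specificity) / 2


def fit_threshold(xs, ys, shift):
    """Toy stand-in model (NOT SB-SVM): midpoint of the class means, lowered by `shift`."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 - shift


def predict(threshold, xs):
    return [1 if x >= threshold else 0 for x in xs]


def nested_cv(xs, ys, shifts, outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate performance; inner folds pick the hyperparameter."""
    outer = stratified_folds(ys, outer_k, seed)
    scores = []
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        tr_x = [xs[j] for j in train_idx]
        tr_y = [ys[j] for j in train_idx]
        inner = stratified_folds(tr_y, inner_k, seed + 1)

        def inner_score(shift):
            # Average validation score over inner folds, using outer-training data only.
            total = 0.0
            for m, val in enumerate(inner):
                fit = [j for f, fold in enumerate(inner) if f != m for j in fold]
                thr = fit_threshold([tr_x[j] for j in fit], [tr_y[j] for j in fit], shift)
                preds = predict(thr, [tr_x[j] for j in val])
                total += balanced_accuracy([tr_y[j] for j in val], preds)
            return total / inner_k

        best_shift = max(shifts, key=inner_score)
        thr = fit_threshold(tr_x, tr_y, best_shift)  # refit on all outer-training data
        preds = predict(thr, [xs[j] for j in test_idx])
        scores.append(balanced_accuracy([ys[j] for j in test_idx], preds))
    return scores


# Synthetic, imbalanced data: 40 positives and 160 negatives on a single feature.
rng = random.Random(42)
ys = [1] * 40 + [0] * 160
xs = [rng.gauss(1.5, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in ys]

scores = nested_cv(xs, ys, shifts=[0.0, 0.25, 0.5])
print("balanced accuracy per outer fold:", [round(s, 3) for s in scores])
```

The point of the nesting is that the hyperparameter is chosen without ever seeing the outer test fold, and the stratified split keeps the minority class represented in every fold, which matters when positives are scarce.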

Author information

Corresponding author

Correspondence to Abeer Alsadoon.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shrestha, B., Alsadoon, A., Prasad, P.W.C. et al. Enhancing the prediction of type 2 diabetes mellitus using sparse balanced SVM. Multimed Tools Appl 81, 38945–38969 (2022). https://doi.org/10.1007/s11042-022-13087-5

  • DOI: https://doi.org/10.1007/s11042-022-13087-5