Abstract
Natural population-based prediction of type 2 diabetes is costly because it requires substantial resources. Although many studies have used machine learning algorithms to predict type 2 diabetes, they have not achieved sufficient sensitivity because the data are imbalanced and sparse. This research aims to use noninvasive features from electronic health records with a machine learning algorithm, the Sparse Balanced Support Vector Machine (SB-SVM), to handle imbalanced data and achieve high precision. The proposed system uses SB-SVM to induce sparsity and thereby implicitly select the most relevant features from the imbalanced data. First, we preprocess the data using different baseline variables and filters. Second, we extract features from the preprocessed data using inclusion and exclusion criteria as filters. Third, we select the 12 features most relevant to diabetes prediction using statistical analysis and logistic regression. We then train and test the proposed model using nested stratified cross-validation. Finally, we evaluate the optimal model on the test set. The proposed model predicts type 2 diabetes mellitus from noninvasive features with higher sensitivity and less processing time. Our solution outperforms the state of the art on most performance metrics. The accuracy, precision, recall, and Area Under the Curve (AUC) of the best existing solution are 67.22%, 62.93%, 69.96%, and 69.96%, respectively; in comparison, our solution achieves 76.39%, 66.86%, 76.74%, and 85.08%. The average processing time decreases from 40 ~ 85 folds/sec to 8.9 ~ 10.7 folds/sec. In conclusion, the proposed system improves the precision and sensitivity of diabetes prediction while reducing processing time.
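The evaluation scheme named in the abstract, nested stratified cross-validation around a sparse, class-balanced SVM, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: an L1-penalized, class-weighted linear SVM from scikit-learn stands in for SB-SVM, the data are synthetic (12 features with a minority positive class), and the function name, fold counts, and grid of C values are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def nested_cv_metrics(X, y):
    """Return mean outer-fold metrics for a sparse, class-weighted linear SVM."""
    # Assumption: an L1 penalty (weights driven to exactly zero) plus
    # class_weight="balanced" is used here as a stand-in for SB-SVM.
    svm = LinearSVC(penalty="l1", dual=False, class_weight="balanced", max_iter=5000)
    model = make_pipeline(StandardScaler(), svm)

    # Inner loop tunes C; outer loop estimates generalization performance.
    # Both loops preserve the class ratio in every fold.
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

    search = GridSearchCV(
        model,
        param_grid={"linearsvc__C": [0.01, 0.1, 1.0]},  # hypothetical grid
        scoring="recall",  # tune for sensitivity
        cv=inner,
    )
    names = ["accuracy", "precision", "recall", "roc_auc"]
    scores = cross_validate(search, X, y, cv=outer, scoring=names)
    return {name: float(scores[f"test_{name}"].mean()) for name in names}


# Synthetic, imbalanced stand-in data; NOT the study's health records.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
metrics = nested_cv_metrics(X, y)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Tuning the hyperparameter only inside the inner folds keeps the outer test folds untouched during model selection, which is the reason for nesting the two loops.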









Ethics declarations
Conflict of interest
None.
Cite this article
Shrestha, B., Alsadoon, A., Prasad, P.W.C. et al. Enhancing the prediction of type 2 diabetes mellitus using sparse balanced SVM. Multimed Tools Appl 81, 38945–38969 (2022). https://doi.org/10.1007/s11042-022-13087-5
DOI: https://doi.org/10.1007/s11042-022-13087-5