Use of a K-nearest neighbors model to predict the development of type 2 diabetes within 2 years in an obese, hypertensive population

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Prediabetes is a type of hyperglycemia in which patients have blood glucose levels above normal but below the threshold for type 2 diabetes mellitus (T2DM). Prediabetic patients are considered to be at high risk for developing T2DM, but not all will eventually do so. Because it is difficult to identify which patients have an increased risk of developing T2DM, we developed a model based on several clinical and laboratory features to predict the development of T2DM within a 2-year period. We used a supervised machine learning algorithm to identify at-risk patients from among 1647 obese, hypertensive patients. The study period began in 2005 and ended in 2018. We restricted the data to the period up to 2 years before the development of T2DM. Then, treating each patient’s features as time series, we calculated one linear regression line and one slope per feature. Features were then included in a K-nearest neighbors (KNN) classification model. Feature importance was assessed using the random forest algorithm. The KNN model accurately classified patients in 96% of cases, with a sensitivity of 99%, specificity of 78%, positive predictive value of 96%, and negative predictive value of 94%. The random forest algorithm selected the homeostatic model assessment–estimated insulin resistance (HOMA-IR), insulin levels, and body mass index as the most important factors; a KNN model using these three features had an accuracy of 99%, with a sensitivity of 99% and specificity of 97%. We built a prognostic model that accurately identified obese, hypertensive patients at risk for developing T2DM within a 2-year period. Clinicians may use machine learning approaches to better assess risk for T2DM and better manage hypertensive patients. Machine learning algorithms may help health care providers make more informed decisions.
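
The abstract does not give implementation details, so the following is only a minimal sketch of the three steps it names: summarizing each feature's time series by a least-squares slope, classifying with K-nearest neighbors, and computing sensitivity, specificity, and predictive values. The function names (`slope`, `knn_predict`, `diagnostic_metrics`), the choice of Euclidean distance and k = 3, and all toy numbers are assumptions made for illustration. They are not the authors' code or data, and feature scaling and the random forest importance step are left out.

```python
from collections import Counter
from math import dist


def slope(times, values):
    """Least-squares slope of `values` regressed on `times`.

    Assumes at least two distinct time points (otherwise sxx is zero).
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    Euclidean distance and k=3 are illustrative assumptions.
    """
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary labels.

    Assumes every denominator is nonzero (both classes occur and are predicted).
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy cohort: (visit years, HOMA-IR series, BMI series, label),
# where label 1 means the patient later developed T2DM. Values are invented.
cohort = [
    ([0, 1, 2], [2.0, 2.1, 2.0], [31.0, 31.1, 31.0], 0),
    ([0, 1, 2], [2.2, 2.2, 2.3], [30.5, 30.4, 30.6], 0),
    ([0, 1, 2], [3.0, 3.8, 4.5], [32.0, 33.0, 34.1], 1),
    ([0, 1, 2], [2.8, 3.5, 4.4], [33.0, 33.9, 35.0], 1),
    ([0, 1, 2], [3.1, 3.9, 4.9], [31.5, 32.6, 33.4], 1),
]

# One slope per feature per patient becomes that patient's feature vector.
features = [(slope(t, homa), slope(t, bmi)) for t, homa, bmi, _ in cohort]
labels = [label for *_, label in cohort]

# Classify a new (hypothetical) patient whose HOMA-IR and BMI are both rising.
new_patient = (slope([0, 1, 2], [2.9, 3.6, 4.4]), slope([0, 1, 2], [32.0, 32.8, 33.9]))
print(knn_predict(features, labels, new_patient, k=3))
```

In practice the slopes of different features live on very different scales, so a distance-based classifier such as KNN would normally be fit on standardized features and evaluated on held-out patients rather than on the training cohort.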

Funding

This work was partially funded by Research Project Nos. TEC2016-75361-R and TEC2016-75161-C2-1-R from the Spanish Government and by Research Project No. DTS17/00158 from the Instituto de Salud Carlos III (Spain).

Author information

Corresponding author

Correspondence to Rafael Garcia-Carretero.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

About this article

Cite this article

Garcia-Carretero, R., Vigil-Medina, L., Mora-Jimenez, I. et al. Use of a K-nearest neighbors model to predict the development of type 2 diabetes within 2 years in an obese, hypertensive population. Med Biol Eng Comput 58, 991–1002 (2020). https://doi.org/10.1007/s11517-020-02132-w

  • DOI: https://doi.org/10.1007/s11517-020-02132-w
