A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus

Applied Intelligence

Abstract

In recent years, the prevalence of chronic diseases such as type 2 diabetes mellitus (T2DM) has increased, placing a heavy burden on healthcare systems. Because regular monitoring of patients is expensive and impractical, understanding how chronic diseases progress and identifying patients at risk of developing comorbidities are crucial. This research used a real-world administrative claims dataset to develop a disease prediction approach that combines a novel patient network with machine learning. Healthcare records of 1,028 T2DM patients and 1,028 non-T2DM patients were extracted from the de-identified data to predict the risk of T2DM. The proposed model is based on the ‘patient network’, which uses graph theory to represent the underlying relationships among health conditions for a group of patients diagnosed with the same disease. In addition to patients’ socio-demographic and behavioural characteristics, the attributes of the ‘patient network’ (e.g., centrality measures) reveal latent patient features that are effective in risk prediction. We apply eight machine learning models (Logistic Regression, K-Nearest Neighbours, Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, XGBoost and Artificial Neural Network) to the extracted features to predict chronic disease risk. Extensive experiments show that, across these classifiers, the proposed framework achieves an Area Under the Curve (AUC) ranging from 0.79 to 0.91. The Random Forest model outperformed the other models, and the eigenvector centrality and closeness centrality of the network, together with patient age, are the most important features for the model. The strong performance of our model suggests promising applications in healthcare services. We also provide strong evidence that the extracted latent features are essential in disease risk prediction. The proposed approach offers vital insight into chronic disease risk prediction that could benefit healthcare service providers and their stakeholders.
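To make the two network features named in the abstract concrete, the following is a minimal sketch of closeness centrality and eigenvector centrality on a toy network of health conditions. It is not the authors' pipeline: the patient records, the condition labels `A` to `E`, the rule that links two conditions when one patient has both, and the helper names `build_network`, `closeness` and `eigenvector` are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy records: each patient maps to a set of made-up condition labels.
patients = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "D"},
    "p4": {"C", "D", "E"},
}


def build_network(records):
    """Link two conditions when at least one patient has both (unweighted, undirected)."""
    adj = {}
    for conditions in records.values():
        for c in conditions:
            adj.setdefault(c, set())
        for u, v in combinations(sorted(conditions), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def closeness(adj, source):
    """Closeness centrality: (reachable nodes - 1) / sum of shortest-path distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in an unweighted graph
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def eigenvector(adj, iterations=200):
    """Eigenvector centrality by power iteration on A + I, scaled to unit Euclidean length."""
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Adding the node's own score (the "+ I" shift) avoids oscillation
        # and leaves the leading eigenvector of A unchanged.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {n: x / norm for n, x in new.items()}
    return score


adj = build_network(patients)
close = {n: closeness(adj, n) for n in adj}
eig = eigenvector(adj)
for node in sorted(adj):
    print(node, round(close[node], 3), round(eig[node], 3))
```

In this toy network, condition `C` is directly linked to every other condition, so it receives the highest score on both measures. In a prediction setting, scores of this kind would be attached to each patient as additional feature columns alongside attributes such as age before a classifier is trained.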

Author information

Contributions

HL: Writing, Data analysis and Research design; SU: Research design, Writing, Conceptualisation and Supervision; FH: Critical revision and Writing; MAM: Critical revision; and MK: Critical revision.

Corresponding author

Correspondence to Shahadat Uddin.

Ethics declarations

Conflict of Interests

The authors declare that they do not have any conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, H., Uddin, S., Hajati, F. et al. A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus. Appl Intell 52, 2411–2422 (2022). https://doi.org/10.1007/s10489-021-02533-w

  • DOI: https://doi.org/10.1007/s10489-021-02533-w
