Abstract
In recent years, the prevalence of chronic diseases such as type 2 diabetes mellitus (T2DM) has increased, placing a heavy burden on healthcare systems. Because regular monitoring of patients is expensive and impractical, understanding how chronic diseases progress and identifying patients at risk of developing comorbidities are crucial. This research used a real-world administrative claims dataset to develop a disease prediction approach that combines a novel patient network with machine learning. The healthcare data of 1,028 T2DM patients and 1,028 non-T2DM patients were extracted from the de-identified dataset to predict the risk of T2DM. The proposed model is based on the ‘patient network’, which uses graph theory to represent the underlying relationships among health conditions for a group of patients diagnosed with the same disease. Alongside patients’ socio-demographic and behavioural characteristics, the attributes of the ‘patient network’ (e.g., centrality measures) capture latent patient features that are effective in risk prediction. We applied eight machine learning models (Logistic Regression, K-Nearest Neighbours, Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, XGBoost and Artificial Neural Network) to the extracted features to predict chronic disease risk. Experiments show that the classifiers in the proposed framework achieved an Area Under the Curve (AUC) ranging from 0.79 to 0.91. The Random Forest model outperformed the other models, and the eigenvector centrality and closeness centrality of the network, together with patient age, were its most important features. This performance points to promising applications in healthcare services. We also provide strong evidence that the extracted latent features are essential to disease risk prediction. The proposed approach offers insight into chronic disease risk prediction that could benefit healthcare service providers and their stakeholders.
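As a rough illustration of the two network attributes named above, the sketch below computes closeness and eigenvector centrality on a small undirected graph using only the Python standard library. The graph, the node labels (`cond_A` to `cond_E`) and the function names are hypothetical placeholders, not the authors' data or implementation. How the paper builds its patient network and attaches these scores to individual patients is not described in the abstract, so that step is not shown.

```python
from collections import deque

# Hypothetical toy network: nodes stand for health conditions, edges for
# co-occurrence. These labels and edges are invented for illustration only.
edges = [("cond_A", "cond_B"), ("cond_A", "cond_C"), ("cond_A", "cond_D"),
         ("cond_B", "cond_C"), ("cond_D", "cond_E")]

# Build an undirected adjacency map.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def closeness_centrality(adj):
    """Closeness = (reachable nodes) / (sum of shortest-path distances)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I); the shift avoids oscillation and leaves
    the leading eigenvector of the adjacency matrix A unchanged."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}  # unit Euclidean norm
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


closeness = closeness_centrality(adj)
eigen = eigenvector_centrality(adj)

for node in sorted(adj):
    print(f"{node}: closeness={closeness[node]:.3f}, eigenvector={eigen[node]:.3f}")
```

In this toy graph the most connected node, `cond_A`, scores highest on both measures. Scores of this kind could then be placed next to attributes such as age in a feature table passed to a classifier, though the exact feature construction used in the study is an open assumption here.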







Author information
Contributions
HL: Writing, Data analysis and Research design; SU: Research design, Writing, Conceptualisation and Supervision; FH: Critical revision and Writing; MAM: Critical revision; MK: Critical revision.
Ethics declarations
Conflict of Interests
The authors declare that they do not have any conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lu, H., Uddin, S., Hajati, F. et al. A patient network-based machine learning model for disease prediction: The case of type 2 diabetes mellitus. Appl Intell 52, 2411–2422 (2022). https://doi.org/10.1007/s10489-021-02533-w
DOI: https://doi.org/10.1007/s10489-021-02533-w