Abstract
Healthcare is evolving from a standardized model to a personalized one, driven by patients’ needs. Personalized healthcare is a medical model based on genetics, genomics, and other biological information that helps to predict the risk of disease. Machine learning and data mining are among the fastest-growing fields in healthcare, where they are used to classify patient cohorts from large datasets, and their application to diabetes subtyping could be a breakthrough. In this review, we identify, analyze, and summarize how previous studies divided diabetes into subtypes and how they implemented diabetes subtyping using data mining and various clustering algorithms. We found that many studies suggest diabetes can be differentiated into subtypes in several ways: clinically, based on the risk of complications; genetically; by clinical features; and for treatment selection. Among clustering algorithms, k-means clustering and hierarchical clustering were widely used to determine sub-clusters of diabetes. Further investigation of diabetes subtyping will depend on understanding the specific objectives and methods of subtyping with clustering algorithms on large datasets, which could contribute novel knowledge and improve diabetes management.
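To make the k-means approach named in the abstract concrete, here is a minimal sketch of the algorithm. It is not the method of any reviewed study. The patient values (HbA1c in %, BMI in kg/m²) are invented for illustration, and the function name kmeans, the choice of two clusters, and the first-k initialization are assumptions made to keep the example deterministic.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    # Assumption: the first k points serve as the initial centroids,
    # which keeps this sketch deterministic (real code usually randomizes).
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids

# Hypothetical toy data: (HbA1c %, BMI kg/m^2) for six made-up patients.
patients = [
    (6.5, 22.0), (11.0, 24.0), (6.8, 23.0),
    (10.5, 25.0), (6.6, 21.5), (11.2, 23.5),
]

labels, centroids = kmeans(patients, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this toy run the two clusters separate the lower-HbA1c patients from the higher-HbA1c ones. An analysis on real data would normally standardize each variable first so that no single measurement scale dominates the distance, and would compare several values of k.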
Acknowledgements
This work was supported by Universiti Teknologi Malaysia under the Fundamental Research Grant Scheme (FRGS) of the Ministry of Science, Technology and Innovation (MOSTI), reference number 5F364, and the Ministry of Higher Education Malaysia, reference number 07G22.
Cite this article
Omar, N., Nazirun, N.N., Vijayam, B. et al. Diabetes subtypes classification for personalized health care: A review. Artif Intell Rev 56, 2697–2721 (2023). https://doi.org/10.1007/s10462-022-10202-8