DOI: 10.1145/3424978.3425085

Feature Selection and Prediction Model for Type 2 Diabetes in the Chinese Population with Machine Learning

Published: 20 October 2020

Abstract

Diabetes is a chronic disease characterized by hyperglycemia. Its incidence has risen in recent years, and it affects a growing number of families. In 2017 alone, it caused 5 million deaths and cost $850 billion in global healthcare. In this paper, we propose a method to predict the prevalence of diabetes from a selected set of features in physical examination data. We used Fisher's score, recursive feature elimination (RFE) and a decision tree to select features, and random forest, logistic regression, support vector machine (SVM) and multilayer perceptron (MLP) classifiers to make the predictions. EA and Fisher's score helped us reduce the dimensionality of the data, and random forest classified diabetes accurately. The highest accuracy (0.987) was achieved by random forest with 85 features. Prediction accuracy with the 19 features selected by Fisher's score also reached 0.986. Finally, we selected 5 features with our method to form a new dataset for diabetes prediction: fasting plasma glucose, HbA1c, HDL, total cholesterol level and hypertension. The accuracy, precision, sensitivity, F1 score, MCC and AUC were 0.977, 0.968, 0.812, 0.883, 0.875 and 0.905, respectively. These results show that our method can select features for a diabetes classifier and improve its performance, which can help clinicians identify diabetes quickly.
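
As a rough illustration of two ingredients named in the abstract, the sketch below ranks features by Fisher's score and computes accuracy, precision, sensitivity, F1 score and MCC from a confusion matrix. It is not the authors' implementation: the function names, the toy data and the feature labels are hypothetical, and the score uses one common textbook form (between-class scatter divided by within-class scatter, per feature).

```python
import math


def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter."""
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        n = len(group)
        mean = sum(group) / n
        variance = sum((v - mean) ** 2 for v in group) / n  # population variance
        between += n * (mean - overall_mean) ** 2
        within += n * variance
    # A feature that is constant within every class separates perfectly.
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels, names):
    """Return feature names sorted from highest to lowest Fisher score."""
    columns = list(zip(*rows))
    scores = {name: fisher_score(col, labels) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / denom,
    }


# Made-up toy data (not from the paper): two features, six subjects.
toy_rows = [(5.0, 1.0), (5.2, 3.0), (4.8, 2.0), (7.5, 2.0), (8.0, 1.0), (7.8, 3.0)]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_names = ["fasting_glucose", "noise"]

print(rank_features(toy_rows, toy_labels, toy_names))
print(binary_metrics(tp=8, fp=2, tn=88, fn=2))
```

The F1 formula also gives a quick consistency check on the reported figures: 2 × 0.968 × 0.812 / (0.968 + 0.812) ≈ 0.883, which matches the F1 score stated in the abstract.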

    Published In

    CSAE '20: Proceedings of the 4th International Conference on Computer Science and Application Engineering
    October 2020
    1038 pages
    ISBN:9781450377720
    DOI:10.1145/3424978
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Classification
    2. Diabetes mellitus
    3. Feature selection
    4. Machine learning
    5. Random forest

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAE 2020

    Acceptance Rates

    CSAE '20 Paper Acceptance Rate 179 of 387 submissions, 46%;
    Overall Acceptance Rate 368 of 770 submissions, 48%

    Cited By

    • (2024) Critical Factor Analysis for prediction of Diabetes Mellitus using an Inclusive Feature Selection Strategy. Applied Artificial Intelligence, 38:1. DOI: 10.1080/08839514.2024.2331919. Online publication date: Apr-2024
    • (2024) An efficient classification framework for Type 2 Diabetes incorporating feature interactions. Expert Systems with Applications, 239 (122138). DOI: 10.1016/j.eswa.2023.122138. Online publication date: Apr-2024
    • (2024) Type-2 Diabetes Mellitus Prediction Through Ensemble Learning Technique Based on Gene Data and Machine Learning Approach. ICT for Intelligent Systems (565-576). DOI: 10.1007/978-981-97-6675-8_47. Online publication date: 29-Oct-2024
    • (2023) An effective feature selection method for type 2 diabetes mellitus detection using gene expression data. Intelligent Decision Technologies, 17:3 (595-606). DOI: 10.3233/IDT-220077. Online publication date: 31-Jul-2023
    • (2023) The Effect of Feature Selection on Diabetes Prediction Using Machine Learning. 2023 IEEE Symposium on Computers and Communications (ISCC) (1-7). DOI: 10.1109/ISCC58397.2023.10218243. Online publication date: 9-Jul-2023
    • (2023) Analysis of Machine Learning and Deep Learning for Diabetes Diagnosis in Design Thinking. 2023 International Conference on Energy, Materials and Communication Engineering (ICEMCE) (1-7). DOI: 10.1109/ICEMCE57940.2023.10434016. Online publication date: 14-Dec-2023
    • (2023) Diabetes detection based on machine learning and deep learning approaches. Multimedia Tools and Applications, 83:8 (24153-24185). DOI: 10.1007/s11042-023-16407-5. Online publication date: 10-Aug-2023
    • (2021) Multi-Tier Ensemble Learning Model With Neighborhood Component Analysis to Predict Health Diseases. IEEE Access, 9 (138677-138715). DOI: 10.1109/ACCESS.2021.3117963. Online publication date: 2021
