Abstract
With the rapid advancement of technology and the ever-increasing volume of data in the healthcare domain, big data analytics has become a significant area of study. Analyzing patterns in patient treatment for the early detection and diagnosis of diseases can improve overall healthcare quality. Machine learning (ML) has emerged as a promising technology for helping clinicians make accurate diagnostic decisions. In this paper, we propose an approach that combines a feature selection technique with several ML algorithms, namely gradient boosting decision tree (GBDT), naive Bayes (NB), k-nearest neighbors (K-NN), support vector machine (SVM), logistic regression (LR), random forest (RF), and decision tree (DT), to identify the subset of features relevant to the prediction of diabetes. The performance of each algorithm is evaluated on the Pima Indians Diabetes Dataset and the Korean National Health and Nutrition Dataset. Experimental results show that the GBDT algorithm achieves the highest accuracy in predicting the disease.
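To make the idea of filter-style feature selection concrete, the following is a minimal sketch that ranks features by information gain and keeps the top k. It is an illustration under stated assumptions, not the method of the paper: the abstract does not name the selection criterion, so information gain is assumed here, and the feature names (`glucose_high`, `bmi_high`, `noise`) and the eight toy records are hypothetical rather than taken from either dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Return the k feature names with the highest information gain, plus all scores."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 1 = diabetic, 0 = non-diabetic (not real patient records).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "glucose_high": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "bmi_high":     [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "noise":        [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)  # ['glucose_high', 'bmi_high']
```

In a full pipeline, the selected subset would then be passed to each classifier and the resulting accuracies compared; continuous clinical measurements would first need to be discretized (or scored with a criterion suited to continuous values) before a score like this applies.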
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kang, Ia., Ngnamsie Njimbouom, S., Kim, JD. (2023). An Effective Feature Selection for Diabetes Prediction. In: Kotsis, G., et al. Database and Expert Systems Applications - DEXA 2023 Workshops. DEXA 2023. Communications in Computer and Information Science, vol 1872. Springer, Cham. https://doi.org/10.1007/978-3-031-39689-2_10
DOI: https://doi.org/10.1007/978-3-031-39689-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39688-5
Online ISBN: 978-3-031-39689-2
eBook Packages: Computer Science, Computer Science (R0)