ABSTRACT
Disease prediction with machine learning methods based on relevant physiological indicators has become a relatively mature and widely applied technique. Diabetes is a highly prevalent disease and a significant global public health problem that seriously threatens human health. Various machine learning and deep learning techniques, such as neural networks, have been employed for diabetes prediction, and most achieve strong results by applying complex models to large datasets. However, many discussions of these methods lack a classification analysis of the experimental datasets, or pursue high precision one-sidedly. For potential patients, a model with a high recall rate is more important, because it reduces the probability that a potential patient is misclassified as a normal individual. In this study, the attributes of the dataset were analysed in detail through a hybrid approach based on statistics and mathematics. Based on the conclusions from that stage, and considering societal factors of diabetes, the composition of potential cases in a small dataset was analysed. The performance of common machine learning models was then tested experimentally. Ultimately, a linear model was selected and optimized, and its performance was evaluated using the confusion matrix and ROC curve, demonstrating balanced and satisfactory precision and recall scores.
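To make the precision/recall trade-off mentioned in the abstract concrete, the following minimal Python sketch computes both scores from confusion-matrix counts. The function name `precision_recall` and all counts are hypothetical, chosen only for illustration; they are not results from the study.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: diabetic cases correctly flagged
    fp: healthy individuals wrongly flagged
    fn: diabetic cases missed (classified as normal)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for two classifiers on the same 60 diabetic cases.
# The "cautious" model flags few people, so it misses many true cases.
cautious = precision_recall(tp=40, fp=5, fn=20)
# The "sensitive" model flags more people, so it misses fewer true cases.
sensitive = precision_recall(tp=55, fp=15, fn=5)

print("cautious  precision=%.3f recall=%.3f" % cautious)
print("sensitive precision=%.3f recall=%.3f" % sensitive)
```

In this illustrative setting the second model has lower precision but higher recall, meaning fewer potential patients are classified as normal, which is the property the abstract argues matters most for screening.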
Index Terms
- Diabetes Prediction Based on Limited Medical Indication