ABSTRACT
Diabetes is now a very common disease. If it is not treated early, diabetes can pose a serious health threat. Much research has sought the optimal approach to diabetes detection by applying different data mining algorithms to datasets made up of medical attributes. In this study, we examine whether diabetes can be detected at an early stage by applying data mining algorithms to a non-medical dataset. We also investigate whether data normalization techniques can improve the classifiers' accuracy.
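Normalization can change a classifier's output most visibly for distance-based learners such as KNN. The toy sketch below is an illustration only. The records are hypothetical, not taken from the study's dataset, and they assume three attributes: age plus two binary symptom flags. Without scaling, the wide age range dominates the Euclidean distance. After min-max scaling, fitted on the training rows, the symptom flags carry comparable weight and the nearest neighbour changes.

```python
import math

def min_max_fit(rows):
    """Per-column minimum and maximum, computed on training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def min_max_apply(row, lo, hi):
    """Scale each value to [0, 1]; a constant column maps to 0.0."""
    return tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))

def nearest_label(query, train):
    """1-nearest-neighbour prediction under Euclidean distance."""
    _, label = min(train, key=lambda item: math.dist(query, item[0]))
    return label

# Hypothetical records: (age, polyuria, polydipsia) -> class. Not from the dataset.
train = [
    ((25, 0, 0), "Negative"),
    ((40, 1, 1), "Positive"),
    ((47, 0, 0), "Negative"),
    ((70, 1, 1), "Positive"),
]
query = (46, 1, 1)

raw_label = nearest_label(query, train)  # age gap dominates the distance

lo, hi = min_max_fit([x for x, _ in train])
scaled_train = [(min_max_apply(x, lo, hi), y) for x, y in train]
scaled_label = nearest_label(min_max_apply(query, lo, hi), scaled_train)

print(raw_label, scaled_label)  # Negative Positive
```

Tree-based models such as Decision Tree, Random Forest and GBC split on one feature at a time. They are therefore largely insensitive to monotonic rescaling of a feature, whereas KNN and SVM depend directly on feature magnitudes.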
Naive Bayes, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosting Classifier (GBC) algorithms are applied to the Early Stage Diabetes Risk Prediction Dataset. Each is combined with the following normalization methods: Decimal Point Scaling, Z-Score Normalization, Pareto Scaling, Variable Stability Scaling, Min-Max Normalization, Max Normalization, Maximum Absolute Scaling, Mean Centered Scaling, Soft-max Normalization, Power Transformer, Median and Median Absolute Deviation Normalization, Robust Scaling, and Log Scaling. The experiments show that early-stage diabetes detection is possible without any medical diagnosis data. GBC combined with data normalization outperforms the other classification algorithms, achieving a prediction accuracy of 99.038%.
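Most of the normalization methods listed above have short closed-form definitions. The sketch below gives one common formulation of each, applied to a single numeric column such as age. The exact variants used in the study are not stated here, so these formulas are assumptions. In particular, Soft-max Normalization is implemented as a logistic squashing of the z-score, and Log Scaling as log(1 + x) so that zero-valued binary attributes stay defined. Power Transformer is omitted because it requires estimating a parameter. scikit-learn provides it as `sklearn.preprocessing.PowerTransformer`. Constant columns would cause division by zero and are not handled.

```python
import math
import statistics

def decimal_scaling(xs):
    m = max(abs(x) for x in xs)
    if m == 0:
        return list(xs)
    j = math.floor(math.log10(m)) + 1  # smallest j with max|x| / 10**j < 1
    return [x / 10 ** j for x in xs]

def z_score(xs):
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

def pareto(xs):  # centre, then divide by the square root of the std. dev.
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / math.sqrt(sd) for x in xs]

def variable_stability(xs):  # z-score weighted by mean / std. dev.
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd * (mu / sd) for x in xs]

def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def max_norm(xs):
    hi = max(xs)
    return [x / hi for x in xs]

def max_abs(xs):
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]

def mean_centered(xs):
    mu = statistics.fmean(xs)
    return [x - mu for x in xs]

def softmax_norm(xs):  # assumed variant: logistic function of the z-score
    return [1 / (1 + math.exp(-z)) for z in z_score(xs)]

def median_mad(xs):
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return [(x - med) / mad for x in xs]

def robust(xs):  # centre on the median, divide by the interquartile range
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - q2) / (q3 - q1) for x in xs]

def log_scaling(xs):  # assumed variant: log(1 + x), requires x > -1
    return [math.log1p(x) for x in xs]

ages = [20, 40, 60, 80]  # hypothetical ages, not from the dataset
print(median_mad(ages))  # [-1.5, -0.5, 0.5, 1.5]
```

In practice, the statistics each method relies on (mean, standard deviation, median, quantiles, minimum, maximum) would be computed on the training split only and then reused on the test split, so that no test information leaks into the model.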
- Early-Stage Diabetes Prediction using Data Mining Algorithms