research-article

Early-Stage Diabetes Prediction using Data Mining Algorithms

Authors:
Md Moniruzzaman, A. G. M. Zaman, Rifah Tasnia, Sutopa Biswas, Mehnur Khanam

American International University-Bangladesh, Bangladesh

ICCA '22: Proceedings of the 2nd International Conference on Computing Advancements, March 2022, Pages 240–248
https://doi.org/10.1145/3542954.3542990

Published: 11 August 2022

ABSTRACT

Diabetes is now a very common disease, and if it is not treated early it can pose a serious health threat. Much research has sought the best approach to diabetes detection by applying data mining algorithms to datasets of medical attributes. In this study, we examine whether diabetes can be detected at an early stage by applying data mining algorithms to a non-medical dataset, and we investigate whether data normalization techniques can improve the classifiers' accuracy.

Naive Bayes, K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Decision Tree, Random Forest, and Gradient Boosting Classifier (GBC) algorithms are applied to the Early Stage Diabetes Risk Prediction Dataset in conjunction with the following normalization methods: Decimal Point Scaling, Z-Score Normalization, Pareto Scaling, Variable Stability Scaling, Min-Max Normalization, Max Normalization, Maximum Absolute Scaling, Mean Centered Scaling, Soft-max Normalization, Power Transformer, Median and Median Absolute Deviation Normalization, Robust Scaling, and Log Scaling. The experiment shows that early-stage diabetes detection is possible without any medical diagnosis data. GBC combined with data normalization performs better than the other classification algorithms, achieving 99.038% prediction accuracy.
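As a rough illustration of what some of the normalization methods named above compute, the following Python sketch implements five of them for a single numeric attribute. The formulas follow common textbook definitions and are not taken from the paper's implementation; the function names and the sample age values are hypothetical.

```python
import math
import statistics


def min_max(values):
    """Min-Max: rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread; map it to zeros (an assumption).
    return [0.0 if span == 0 else (x - lo) / span for x in values]


def z_score(values):
    """Z-Score: center on the mean, divide by the population std. dev."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in values]


def pareto(values):
    """Pareto: center on the mean, divide by the square root of the std. dev."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [0.0 if sigma == 0 else (x - mu) / math.sqrt(sigma) for x in values]


def decimal_scaling(values):
    """Decimal scaling: divide by the smallest power of ten that
    brings every absolute value below 1."""
    peak = max(abs(x) for x in values)
    j = 0
    while peak / (10 ** j) >= 1:
        j += 1
    return [x / (10 ** j) for x in values]


def median_mad(values):
    """Median/MAD: center on the median, divide by the median
    absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    return [0.0 if mad == 0 else (x - med) / mad for x in values]


if __name__ == "__main__":
    ages = [16, 30, 40, 50, 90]  # hypothetical sample, not from the dataset
    for scale in (min_max, z_score, pareto, decimal_scaling, median_mad):
        print(scale.__name__, [round(v, 3) for v in scale(ages)])
```

In a comparison like the one described, each scaler would normally be fitted on the training split only and then applied to the test split, so that test data does not leak into the scaling parameters.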


Early-Stage Diabetes Prediction using Data Mining Algorithms
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
2. Information systems
  1. Information systems applications

Published in

ICCA '22: Proceedings of the 2nd International Conference on Computing Advancements
March 2022
543 pages
ISBN:9781450397346
DOI:10.1145/3542954

Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Data Mining
Decision Tree
Early-Stage diabetes
Gradient Boosting Classifiers (GBC)
K-Nearest Neighbor (KNN)
Machine Learning
Normalization
Random Forest
Support Vector Machines (SVM)
Symptoms
Qualifiers
- research-article
- Research
- Refereed limited