Abstract
Diabetes has become one of the major health concerns of the modern-day population. This can be attributed to a number of factors such as an unhealthy lifestyle, poor diet, genetics, and obesity. The rapid growth in the number of diabetic patients calls for state-of-the-art healthcare against such diseases, and early prediction can be very useful for mitigating the associated risks. In this context, this research proposes an indigenous, efficient diagnostic tool for the detection of diabetes. The proposed methodology comprises two phases. Phase I deals with the collection of the Pima Indian Diabetes Dataset from the UCI Machine Learning Repository and a Localized Diabetes Dataset from Bombay Medical Hall, Upper Bazar, Ranchi, Jharkhand, India. In Phase II, the acquired datasets are processed and analyzed using two different approaches. The first approach entails classification through Logistic Regression, K-Nearest Neighbor (KNN), ID3 Decision Tree (DT), C4.5 DT, and Naive Bayes. The second approach employs Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO) for feature reduction before classifying the datasets with the methods used in the first approach. A comparative analysis is performed between the approaches used in this manuscript. The results show that the proposed approach outperforms the traditional classification approach, requiring less computation time and achieving higher accuracy. The proposed approach could also be applied to the effective and early diagnosis of other medical diseases.
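The two Phase II approaches can be sketched in Python with scikit-learn. This is a minimal illustration under several assumptions, not the authors' implementation. The data are a synthetic stand-in generated with `make_classification` in the shape of the Pima dataset (768 records, 8 attributes), since neither dataset ships with scikit-learn. scikit-learn's `DecisionTreeClassifier` is an optimized CART, so `criterion="entropy"` only approximates the information-gain splits of ID3/C4.5 rather than implementing them. The PSO shown is a generic binary PSO with a sigmoid transfer function and cross-validated logistic-regression accuracy as fitness; the paper's exact PSO variant is not given here. The helper names `evaluate` and `pso_select`, the number of PCA components, and the PSO hyperparameters (`w`, `c1`, `c2`, swarm size, iterations) are hypothetical choices for illustration.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_classifiers():
    """Approach 1 classifiers. The entropy DT only approximates ID3/C4.5 (it is CART)."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT-entropy": DecisionTreeClassifier(criterion="entropy", random_state=0),
        "NB": GaussianNB(),
    }


def evaluate(X, y, n_components=None, cv=10):
    """Return {name: (mean CV accuracy, wall-clock seconds)} for each classifier.

    If n_components is given, PCA is fitted inside every CV fold (approach 2).
    """
    results = {}
    for name, clf in make_classifiers().items():
        steps = [StandardScaler()]
        if n_components is not None:
            steps.append(PCA(n_components=n_components))
        pipe = make_pipeline(*steps, clf)
        start = time.perf_counter()
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), time.perf_counter() - start)
    return results


def pso_select(X, y, n_particles=8, n_iter=10, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO (sigmoid transfer) returning a boolean feature mask."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(mask):
        # Fitness = 5-fold CV accuracy of logistic regression on the selected columns.
        if not mask.any():
            return 0.0
        pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        return cross_val_score(pipe, X[:, mask], y, cv=5).mean()

    pos = rng.random((n_particles, d)) < 0.5  # random initial feature subsets
    vel = rng.uniform(-1.0, 1.0, (n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()

    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        x = pos.astype(float)
        # Standard velocity update: inertia + pull toward personal and global bests.
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -4.0, 4.0)
        # Sigmoid of the velocity is the probability that a feature is switched on.
        pos = rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest


if __name__ == "__main__":
    # Synthetic stand-in shaped like the Pima data; NOT the real dataset,
    # so these numbers say nothing about the results reported in the paper.
    X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                               random_state=42)
    runs = [("all features", evaluate(X, y)),
            ("PCA, 4 comps", evaluate(X, y, n_components=4))]
    mask = pso_select(X, y)
    runs.append(("PSO subset", evaluate(X[:, mask], y)))
    print("PSO-selected feature indices:", np.flatnonzero(mask))
    for label, res in runs:
        for name, (acc, secs) in res.items():
            print(f"{label:13s} {name:11s} acc={acc:.3f} time={secs:.3f}s")
```

Because `pso_select` scores subsets on the same data that `evaluate` later cross-validates, the accuracy reported for the PSO subset is optimistic; a held-out split or nested cross-validation would give a fairer comparison. The timings from `time.perf_counter` cover the whole cross-validation loop, which is one possible way to operationalize the "computation time" comparison.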
Acknowledgements
The work done by the authors fulfills all ethical terms and conditions. The data used in this research were selected and anonymized, and the confidentiality of the patients' personal and medical data has been maintained in all respects. The authors would first like to thank all the patients of Bombay Medical Hall, Mahabir Chowk, Pyada Toli, Upper Bazar, Ranchi, Jharkhand, India, who patiently provided information; Dr. Vinay Kumar Dhandhania, Diabetologist; Ms. Sneha Verma, Dietitian; Linus ji; and the remaining staff of Bombay Medical Hall, Ranchi, India, who helped us collect and compile the dataset of diabetic and non-diabetic patients.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Choubey, D.K., Kumar, P., Tripathi, S. et al. Performance evaluation of classification methods with PCA and PSO for diabetes. Netw Model Anal Health Inform Bioinforma 9, 5 (2020). https://doi.org/10.1007/s13721-019-0210-8
DOI: https://doi.org/10.1007/s13721-019-0210-8