Performance evaluation of classification methods with PCA and PSO for diabetes

  • Original Article
Network Modeling Analysis in Health Informatics and Bioinformatics

Abstract

Diabetes has become one of the major health concerns of the modern population. Contributing factors include an unhealthy lifestyle, poor diet, genetics, and obesity. The rapid growth in the number of diabetic patients calls for state-of-the-art healthcare tools, and early prediction can help mitigate the risks associated with the disease. In this context, this research proposes an efficient, home-grown diagnostic tool for the detection of diabetes. The proposed methodology comprises two phases. Phase-I covers collection of the Pima Indian Diabetes Dataset from the UCI Machine Learning Repository and of a Localized Diabetes Dataset from Bombay Medical Hall, Upper Bazar, Ranchi, Jharkhand, India. In Phase-II, the acquired datasets are processed and analyzed using two approaches. The first approach classifies the data directly with Logistic Regression, K-Nearest Neighbor, ID3 decision tree (DT), C4.5 DT, and Naive Bayes. The second approach applies principal component analysis (PCA) and particle swarm optimization (PSO) for feature reduction before classifying the data with the same five methods. A comparative analysis of these approaches is then performed. The results show that the proposed approach, feature reduction followed by classification, outperforms the traditional classification approach in terms of both computation time and accuracy. The proposed approach could also be applied to the effective and early diagnosis of other diseases.
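To make the second approach concrete, the following is a minimal sketch of feature reduction followed by classification. It is not the authors' implementation: it covers only the PCA branch with a K-Nearest Neighbor classifier, and it runs on synthetic stand-in data rather than either dataset named in the abstract. The helper names (`pca_fit`, `pca_transform`, `knn_predict`), the choice of 3 components, and k = 5 are all illustrative assumptions.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (one per row)."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the principal axes learned from the training data."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=5):
    """Label each test row by majority vote of its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]  # labels are 0/1; k is odd, so no ties
    return (votes.mean(axis=1) > 0.5).astype(int)

def accuracy(y_true, y_pred):
    return float((y_true == y_pred).mean())

# Synthetic stand-in data (assumption): 8 numeric features, binary label.
rng = np.random.default_rng(0)
n, d = 400, 8
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :3] += 1.5 * y[:, None]  # only the first three features carry signal

# Hold out the last 100 rows; standardise with training statistics only.
split = 300
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma

# First approach: classify on all features.
acc_full = accuracy(y_test, knn_predict(X_train, y_train, X_test))

# Second approach: reduce to 3 principal components, then classify.
mean, comps = pca_fit(X_train, n_components=3)
Z_train = pca_transform(X_train, mean, comps)
Z_test = pca_transform(X_test, mean, comps)
acc_pca = accuracy(y_test, knn_predict(Z_train, y_train, Z_test))

print(f"k-NN on all {d} features:       {acc_full:.3f}")
print(f"k-NN on 3 principal components: {acc_pca:.3f}")
```

The PCA axes and the standardisation statistics are fitted on the training split only, so the held-out rows do not leak into the feature reduction step. Whether the reduced representation is faster or more accurate depends on the data; the sketch makes no claim about reproducing the reported results.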

Acknowledgements

The work carried out by the authors complies with all applicable ethical terms and conditions. The data used in this research were selective and anonymous, and the confidentiality of the patients' personal and medical data has been maintained in all respects. The authors first thank all the patients of Bombay Medical Hall, Mahabir Chowk, Pyada Toli, Upper Bazar, Ranchi, Jharkhand, India, who patiently provided information. They also thank Dr. Vinay Kumar Dhandhania, Diabetologist; M/s Sneha Verma, Dietitian; Linus ji; and the remaining staff of Bombay Medical Hall, Ranchi, India, who helped to collect and compile the dataset of diabetic and non-diabetic patients.

Author information

Corresponding author

Correspondence to Dilip Kumar Choubey.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

About this article

Cite this article

Choubey, D.K., Kumar, P., Tripathi, S. et al. Performance evaluation of classification methods with PCA and PSO for diabetes. Netw Model Anal Health Inform Bioinforma 9, 5 (2020). https://doi.org/10.1007/s13721-019-0210-8

  • DOI: https://doi.org/10.1007/s13721-019-0210-8
