A novel feature selection approach with integrated feature sensitivity and feature correlation for improved prediction of heart disease

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

This paper presents a random forest-feature sensitivity and feature correlation (RF-FSFC) technique for enhanced heart disease prediction. The proposed methodology is implemented using the Cleveland heart disease dataset, which comprises a total of 120 heart disease patient records. Missing values were handled by data imputation, and the data were transformed with min–max normalization. This paper attempts to construct an RF-based classifier for coronary heart disease (CHD) by combining feature sensitivity and feature correlation analysis. The sensitivity-based feature selection process ranks features according to their value in assessing CHD risk, and the feature correlation analysis phase checks whether any features are correlated with one another. A heart disease prediction accuracy of 81.16% was obtained using the proposed RF-FSFC technique by omitting five features (sex, hemoglobin, TD, CRF, and cirrhosis). When compared to the Naïve Bayes, decision tree, regression analysis, and support vector machine models, the proposed model offered a higher accuracy of 86.141% without omitting any features. It also offered sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) scores of 87.321%, 87.364%, 91.23%, and 91.02%, respectively. The experimental findings demonstrate that the proposed RF-FSFC approach significantly improves prediction accuracy compared with other approaches that do not use the integrated feature selection method.
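The abstract names three generic building blocks: min–max normalization, a correlation check between features, and the diagnostic scores (sensitivity, specificity, PPV, NPV). The following is a minimal sketch of those textbook definitions in plain Python. It is not the authors' RF-FSFC implementation: the function names, the use of the Pearson coefficient as the correlation measure, and all numeric values are assumptions made for illustration.

```python
from math import sqrt


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; report 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical values for illustration only (not taken from the paper).
age = [29, 45, 54, 61, 77]
chol = [180, 210, 240, 260, 310]

age_scaled = min_max_normalize(age)
chol_scaled = min_max_normalize(chol)
r = pearson(age_scaled, chol_scaled)
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)

print("scaled age:", age_scaled)
print("correlation(age, chol):", round(r, 3))
print("metrics:", metrics)
```

In a pipeline of this kind, a pair of features whose absolute correlation exceeds a chosen threshold would be a candidate for pruning before the classifier is trained; the threshold itself is a design choice that the abstract does not state.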

Availability of data and material

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to G. Saranya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saranya, G., Pravin, A. A novel feature selection approach with integrated feature sensitivity and feature correlation for improved prediction of heart disease. J Ambient Intell Human Comput 14, 12005–12019 (2023). https://doi.org/10.1007/s12652-022-03750-y

  • DOI: https://doi.org/10.1007/s12652-022-03750-y
