Hybrid model for prediction of heart disease

Sarkar, Bikash Kanti

doi:10.1007/s00500-019-04022-2

Methodologies and Application
Published: 15 May 2019

Soft Computing, Volume 24, pages 1903–1925 (2020)

Abstract

Heart disease is a leading cause of death worldwide. To reduce the death rate, effective and timely diagnosis of the disease is essential. Numerous automated decision support systems have been developed for this purpose. The present research introduces a predictive model built on two-level optimization, intended to save lives and cost through effective diagnosis of the disease. Level-1 optimization first identifies, in parallel on a parallel machine, an optimal proportion (P_opt) of training and test sets for each dataset. Next, the best training set (T_best) for P_opt is searched for, again in parallel. Level-2 optimization then refines the rule set (R) generated on T_best by the Perfect Rule Induction by Sequential Method (PRISM) learner, employing a parallel genetic algorithm (GA). The experimental results obtained by the model over the heart disease datasets (collected from https://archive.ics.uci.edu/ml) are compared with those of its base learner and of four state-of-the-art learners, namely C4.5 (a decision tree-based classifier), Naïve Bayes, neural network and support vector machine. The empirical outcomes (based on the top performance metrics: prediction accuracy, precision, recall, area under curve values, true positive and false positive rates) demonstrate that the new model is proficient in supporting heart disease treatment. Importantly, the prediction accuracy of the presented hybrid model exceeds that of the sequential GA-based hybrid model by around 6% on almost all of the chosen datasets. Overall, the proposed system may work as an e-doctor to predict heart attacks and assist clinicians in taking precautionary steps.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India
Bikash Kanti Sarkar

Corresponding author

Correspondence to Bikash Kanti Sarkar.

Ethics declarations

Conflict of interest

The author declares that there is no conflict of interest regarding the publication of this paper.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Description of the Heart (Swiss) dataset

Classification dataset

A dataset prepared for a classification problem is called a classification dataset. To illustrate, the heart (Swiss) dataset, with 13 non-target attributes (denoted A₁, …, A₁₃), and the attribute values used in the present experiment are described below.

1. Age (A₁): age in years.
2. Sex (A₂): sex (1 = male; 0 = female).
3. Cp (A₃): chest pain type (4 types): Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain and Value 4: asymptomatic.
4. Trestbps (A₄): resting blood pressure, in mm Hg on admission to the hospital.
5. Chol (A₅): serum cholesterol in mg/dl.
6. Fbs (A₆): fasting blood sugar > 120 mg/dl (1 = true; 0 = false).
7. Restecg (A₇): resting electrocardiographic results (3 types): Value 0: normal, Value 1: having ST-T wave abnormality (T-wave inversions and/or ST elevation or depression of > 0.05 mV), Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria.
8. Thalach (A₈): maximum heart rate achieved.
9. Exang (A₉): exercise-induced angina (1 = yes; 0 = no).
10. Oldpeak (A₁₀): ST depression induced by exercise relative to rest.
11. Slope (A₁₁): the slope of the peak exercise ST (T-wave) segment, with values: Value 1: upsloping, Value 2: flat and Value 3: downsloping.
12. Ca (A₁₂): number of major vessels (0–3) coloured by fluoroscopy.
13. Thalassemia (Thal) (A₁₃): an inherited blood disorder: 3 = normal; 6 = fixed defect; 7 = reversible defect.

The num/goal field (i.e. the class, denoted by C) of each database refers to the absence, or the presence and level, of heart disease in the patient. It takes five forms (integer values 0 to 4), where the value 0 indicates the absence of heart disease (i.e. healthy) and 4 indicates severe disease. The remaining values imply the presence of some level of heart disease (i.e. unhealthy). For reasons of personal security, patients’ personal identification is replaced with dummy values (decided by the experts).
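The class coding described above can be made concrete with a small helper. This is only an illustrative sketch: the function name `heart_status` is hypothetical, and collapsing the levels 1 to 4 into a single "unhealthy" label simply follows the healthy/unhealthy reading given in the text.

```python
def heart_status(num: int) -> str:
    """Map the raw class value C (0..4) to the coarse label used in the text."""
    if num not in range(5):
        raise ValueError(f"class value must be an integer from 0 to 4, got {num!r}")
    # 0 means no heart disease; 1..4 mean some level of disease is present.
    return "healthy" if num == 0 else "unhealthy"


if __name__ == "__main__":
    for value in range(5):
        print(value, heart_status(value))
```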

The dataset has a total of 123 instances in the UCI data repository and is in non-discretized form. A part of the dataset is shown in Table 8.

Table 8 A part of the heart (Swiss) dataset (non-discretized form)

A part of the discretized heart (Swiss) dataset (after preprocessing by the MIL-discretizer) is shown in Table 9.

Table 9 A part of the discretized heart (Swiss) dataset

Hybrid model

In data mining, a model that integrates the strengths of base data mining approaches in order to improve on their performance is called a hybrid model. In a hybrid system, a GA is usually employed to refine the solutions.

Imbalanced dataset

A dataset in which the number of instances of one or more classes is much smaller than the number of instances of the other classes is termed an imbalanced dataset. The instances of such a sparsely represented class are known as rare cases.
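As a rough illustration of this definition, the sketch below computes each class's share of the instances and flags classes that fall under a cut-off. The function names and the 10% threshold are assumptions made for the example, not values taken from the paper, and the labels in the usage lines are made-up toy data.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def class_distribution(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return the fraction of instances that belongs to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def rare_classes(labels: Iterable[Hashable], threshold: float = 0.10) -> List[Hashable]:
    """List the classes whose share is below `threshold` (a hypothetical cut-off)."""
    shares = class_distribution(labels)
    return sorted(label for label, share in shares.items() if share < threshold)


if __name__ == "__main__":
    toy_labels = [0] * 2 + [1] * 48 + [2] * 50  # toy data, not the Swiss dataset
    print(class_distribution(toy_labels))
    print(rare_classes(toy_labels))
```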

Appendix B: Rule generation by PRISM Learner

The PRISM learner generates ‘If–Then’ rules over the training dataset, and a sample rule set is shown below. In particular, the PRISM learner works as follows:

The attribute–value pair A = v with maximum accuracy a = p/t is selected as a pre-condition for the current rule, where p is the number of instances in the training dataset with A = v that belong to the target class, and t is the total number of instances in the dataset with A = v (irrespective of class value). Next, the examples covered by A = v are discarded from the training set (Fig. 6).
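The selection step can be sketched as follows. This is a minimal illustration of the p/t criterion as PRISM is commonly described, not the paper's implementation: the dictionary-based instance format, the key "C" for the class, the tie-break on larger p, and the toy data are all assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Instance = Dict[str, int]  # attribute name -> discretized value; "C" holds the class


def best_condition(instances: List[Instance], target: int,
                   used: Set[str]) -> Optional[Tuple[str, int]]:
    """Return the (attribute, value) pair with the highest accuracy p/t for `target`."""
    pairs = sorted({(a, v) for inst in instances for a, v in inst.items()
                    if a != "C" and a not in used})
    best, best_key = None, (-1.0, -1)
    for attr, value in pairs:
        covered = [inst for inst in instances if inst[attr] == value]
        t = len(covered)                                       # instances with A = v
        p = sum(1 for inst in covered if inst["C"] == target)  # ...of the target class
        key = (p / t, p)  # assumed tie-break: prefer the pair covering more targets
        if key > best_key:
            best, best_key = (attr, value), key
    return best


def learn_rule(instances: List[Instance], target: int) -> Dict[str, int]:
    """Grow one rule for `target`, adding conditions until it covers only that class."""
    conditions: Dict[str, int] = {}
    covered = list(instances)
    while any(inst["C"] != target for inst in covered):
        choice = best_condition(covered, target, set(conditions))
        if choice is None:  # no unused attributes left
            break
        attr, value = choice
        conditions[attr] = value
        covered = [inst for inst in covered if inst[attr] == value]
    return conditions


if __name__ == "__main__":
    toy = [  # made-up discretized instances
        {"A9": 1, "A13": 1, "C": 0},
        {"A9": 1, "A13": 1, "C": 0},
        {"A9": 1, "A13": 2, "C": 1},
        {"A9": 2, "A13": 1, "C": 1},
        {"A9": 2, "A13": 2, "C": 1},
    ]
    print(learn_rule(toy, target=0))
```

A full learner would repeat `learn_rule` for each class, removing the instances each finished rule covers, until no instances of that class remain.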

2.1 A copy of ‘IF–Then’ rule set (R) generated by PRISM learner

1. If (A₉ = 1) and (A₁₃ = 1), then (C = 0).

   [It may be read as: if (Exang (A₉) = 1 (no)) and (Thalassemia (A₁₃) = 1 (normal)), then heart disease is not present. Actually, the values of A₉ and A₁₃ here are the discretized/mapped values due to application of MIL-discretizer. For example, value 0 (before discretization) for A₉ stands for no, but it is now 1. Likewise, earlier value 3 for A₁₃ stands for normal, but it is now 1.]

2. If (A₁ = 1) and (A₂ = 1) and (A₄ = 1), then (C = 1).
3. If (A₂ = 2) and (A₁₃ = 2), then (C = 1).

   [If (sex (A₂) = Male) and (Thalassemia (A₁₃) = 2 (fixed defect)), then (heart disease is present with level-1)].

4. If (A₅ = 1) and (A₉ = 2) and (A₁₀ = 5), then (C = 2).
5. If (A₁ = 6) and (A₉ = 2), then (C = 2).
6. If (A₅ = 2) and (A₉ = 2) and (A₁₀ = 5), then (C = 3).
7. If (A₉ = 2) and (A₁₀ = 6), then (C = 4).
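For illustration, the seven rules above can be held as plain data and applied to a discretized instance. The representation and the first-match policy in `classify` are assumptions for this sketch; the source does not state how conflicts between rules are resolved.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Dict[str, int], int]  # (conditions on discretized attributes, class C)

# The sample rule set R, transcribed from the list above.
RULES: List[Rule] = [
    ({"A9": 1, "A13": 1}, 0),
    ({"A1": 1, "A2": 1, "A4": 1}, 1),
    ({"A2": 2, "A13": 2}, 1),
    ({"A5": 1, "A9": 2, "A10": 5}, 2),
    ({"A1": 6, "A9": 2}, 2),
    ({"A5": 2, "A9": 2, "A10": 5}, 3),
    ({"A9": 2, "A10": 6}, 4),
]


def classify(instance: Dict[str, int], rules: List[Rule] = RULES) -> Optional[int]:
    """Return the class of the first rule whose conditions all hold, else None."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return None  # no rule fires; a real system would need a default class


if __name__ == "__main__":
    print(classify({"A9": 1, "A13": 1}))   # matches rule 1
    print(classify({"A9": 2, "A10": 6}))   # matches rule 7
```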

Appendix C: Interface method and its role

Role of the interface software

Let us take the above ‘If–Then’ rules as the input to the interface method, which finally gives the rules in the tabular structure shown in Table 10.

Table 10 Rule set (R) in I(R) form using interface software

With respect to the proposed GA, Rule-1 and Rule-2 are shown in encoded format as follows:

Rule-1: 000000000000000000000000001000000000011000.
Rule-2: 001001000010000000000000000000000000000001.

Here, each binary block consists of 3 bits and represents one attribute (in the sequence shown above). The last block represents the target attribute, whereas the remaining blocks represent the non-target attributes. ‘*’ (the don’t-care value) = 000.
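The block layout can be sketched in code: 14 blocks of 3 bits (13 non-target attributes plus the target), with 000 read as don't care for non-target attributes. The mapping from an attribute value to its bit pattern is an assumption here (plain binary of the discretized value); the paper's own scheme is given in Sect. 3.2.1, which is not reproduced in this text, so the printed Rule-1 and Rule-2 strings need not decode to the listed rules under this assumption.

```python
from typing import Dict, List, Tuple

BLOCK_BITS = 3   # bits per attribute block, as stated in the text
N_BLOCKS = 14    # 13 non-target attributes followed by the target attribute


def split_blocks(bits: str) -> List[str]:
    """Cut a rule string into its consecutive 3-bit blocks."""
    if len(bits) != BLOCK_BITS * N_BLOCKS or set(bits) - {"0", "1"}:
        raise ValueError("expected a 42-character string of 0s and 1s")
    return [bits[i:i + BLOCK_BITS] for i in range(0, len(bits), BLOCK_BITS)]


def decode(bits: str) -> Tuple[Dict[str, int], int]:
    """Assumed decoding: each block is the binary value; 000 means don't care."""
    values = [int(block, 2) for block in split_blocks(bits)]
    conditions = {f"A{i + 1}": v for i, v in enumerate(values[:-1]) if v != 0}
    return conditions, values[-1]


def encode(conditions: Dict[str, int], label: int) -> str:
    """Inverse of `decode` under the same assumed value-to-bits mapping."""
    values = [conditions.get(f"A{i}", 0) for i in range(1, N_BLOCKS)] + [label]
    if any(not 0 <= v < 2 ** BLOCK_BITS for v in values):
        raise ValueError("each value must fit in 3 bits")
    return "".join(format(v, "03b") for v in values)


if __name__ == "__main__":
    chromosome = encode({"A9": 2, "A10": 6}, 4)
    print(chromosome)
    print(decode(chromosome))
```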

Two-point crossover and then mutation are performed on the encoded rules to generate new binary offspring, which are decoded according to the suggested decoding scheme (discussed in Sect. 3.2.1) to give the decoded rules.
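A generic version of these two operators on bit strings might look like the sketch below. It shows textbook two-point crossover and bit-flip mutation only; the cut-point selection, the mutation rate and the function names are assumptions, and the paper's parallel GA may configure them differently.

```python
import random
from typing import Tuple


def two_point_crossover(parent_a: str, parent_b: str,
                        rng: random.Random) -> Tuple[str, str]:
    """Swap the segment between two random cut points of equal-length parents."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 3:
        raise ValueError("parents must have the same length (at least 3)")
    i, j = sorted(rng.sample(range(1, len(parent_a)), 2))  # 1 <= i < j < length
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in bits
    )


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    a, b = "0" * 42, "1" * 42
    child_a, child_b = two_point_crossover(a, b, rng)
    print(child_a)
    print(mutate(child_a, rate=0.05, rng=rng))
```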

About this article

Cite this article

Sarkar, B.K. Hybrid model for prediction of heart disease. Soft Comput 24, 1903–1925 (2020). https://doi.org/10.1007/s00500-019-04022-2

Published: 15 May 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s00500-019-04022-2
