A new machine learning technique for an accurate diagnosis of coronary artery disease
Introduction
Coronary artery disease (CAD) is the most common type of cardiovascular disease (CVD) and the leading cause of death worldwide [1]. Statistics published by the World Health Organization (WHO) reveal that about 17.7 million people died of CVD in 2015, accounting for 31% of all deaths [2], [3]. It is likely that accurate CAD detection and timely intervention could have averted many of these deaths. This research presents a new methodology, combining several classical machine learning algorithms with evolutionary algorithms, to detect CAD.
Data mining (sometimes also called knowledge discovery) is a computer-based process for extracting useful information from large amounts of raw data. Well-known data mining algorithms include Support Vector Machines (SVMs), Neural Networks (NNs), Decision Trees (DTs), Genetic Algorithms (GAs), and Bayesian Networks (BNs). Each of these methods has advantages and disadvantages and should be used with caution. Depending on the type of data and the conditions of the research (e.g., the variables included and the parameters of each algorithm), some of these methods predict a disease well, while others may not perform optimally. Performance thus depends not only on the nature of the algorithm but also on the type of data and the conditions of implementation. Therefore, tailoring the approach to the type of data being mined and the outcome sought can improve the performance of the method.
Nowadays, the use of different machine learning algorithms for diverse applications is increasingly common. Because the data involved in these applications are large and complex, extracting hidden and useful information from them is essential. This information can be used to enhance the quality of various services. By analyzing it, organizations, governments, researchers and others can provide improved services and add value to their interactions with their users, customers, patients, etc. This study concentrates on the application of machine learning and data mining methods to healthcare data [4], [5], [6].
In the healthcare domain, different machine learning techniques have been applied in the past few decades to study various diseases, such as Parkinson's disease [7], [8], liver disease [9], [10], heart disease [11], [12], [13], [14], breast cancer [15], [16], lung cancer [17], etc. In general, the applied methods provided good results for predicting the respective diseases. Timely diagnosis of these diseases, including cancers, with the least possible error is the foremost expectation of patients. Not surprisingly, the detection process demands special knowledge and experience. We believe that machine learning as well as data mining can be applied to improve the accuracy of the approach, decrease the number of diagnostic errors, and ultimately deliver quality services to the patients.
The main objective of the current paper is to design a novel, effective model using several machine learning and data mining methods to detect CAD. In this study, we used a Diagnosis Support System (DSS) [18] as a clinical decision support system (CDSS), in line with its growing popularity and ubiquitous application in diverse information systems. We tested ten well-established traditional machine learning techniques, namely the Gaussian Naive Bayes classifier (gaussNB), linear discriminant analysis (LDA), K-nearest neighbors (KNN), quadratic discriminant analysis (QDA), nu-SVC (a type of SVM, called nuSVM), MLP1 (a multilayer perceptron with one hidden layer of at most 200 neurons), linear SVM (LinSVM), Random Forest (with 1000 trees), logistic regression (called reglog), and C-SVC (a type of SVM, called SVC). In the end, the three best-performing algorithms (SVC, nuSVM, and LinSVM) were selected. The clinical CAD dataset used in our study included 303 records with 54 features.
The first step in our experiments was pre-processing of both categorical and numerical attributes using a normalization approach. For feature selection, redundant features were eliminated using a GA and particle swarm optimization (PSO). We then optimized the classifier parameters in combination with feature selection and cross-validation techniques. Our findings showed that the performance of the traditional algorithms was significantly enhanced by the proposed methodology (named New 2level Genetic optimizer: N2Genetic optimizer). The proposed methodology can potentially be applied for early diagnosis of CAD as a powerful computer-aided diagnosis system.
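To make the idea of GA-based feature selection concrete, the following is a minimal sketch, not the authors' N2Genetic optimizer. Each individual is a binary mask over the features, and the GA evolves masks toward a higher fitness. The function `evolve_mask`, its parameters, and the toy fitness are all hypothetical; in a real experiment the fitness would be something like the cross-validated accuracy or F1-score of a classifier trained on the selected features.

```python
import random


def evolve_mask(fitness, n_features, pop_size=20, generations=30,
                mutation_rate=0.05, seed=0):
    """Evolve a binary feature mask (1 = keep feature) that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            # Tournament selection: the fitter of 3 random individuals is a parent.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Toy fitness (an assumption for illustration only): features 0, 3 and 5 are
# "useful", and every other selected feature costs a small penalty.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in USEFUL)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


best_mask = evolve_mask(toy_fitness, n_features=8)
print(best_mask, toy_fitness(best_mask))
```

Because of elitism, the best fitness never decreases from one generation to the next. Swapping `toy_fitness` for a cross-validated classifier score is the step that would connect this sketch to the setting described above.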
Several reasons motivated us to apply data mining and machine learning techniques to CAD data. First, CAD is a leading cause of mortality and morbidity worldwide. Early diagnosis and effective management can prevent these adverse outcomes and improve quality of life. Second, the diagnostic accuracies of conventional methods are not satisfactory. We propose a new methodology for classifying CAD that we believe would surpass the current methods. Third, we aim to design a new DSS, a computer-aided diagnosis system for early detection of CAD with minimal clinical error. Finally, we envisage that the proposed methodology can be useful in triage and reduce the need for specialist input, ultimately leading to both cost and time savings during CAD diagnosis. This should appeal to physicians, patients and.
The three main contributions of our work are as follows: (1) investigating pre-processing and testing of well-known machine learning methods applied to CAD diagnosis; (2) studying the use of feature selection based on GA or PSO; and (3) designing a new genetic training technique, the N2Genetic optimizer, based on the fusion of cross-validation with GA or PSO for parameter optimization and feature selection.
The rest of the paper is organized as follows. The literature review is presented in Section 2, where we review studies that used data mining and machine learning techniques for the detection of heart disease, especially CAD. Section 3 reports the proposed methodology. Experimental outcomes and discussion are presented in Section 4. The conclusion and future work are presented in Section 5.
An overview of related work
Remarkable progress has been made in applying different machine learning algorithms to medical datasets for detecting different diseases, e.g., various types of cancer. The main clinical methods for assessment of heart disease include electrocardiogram (ECG) [19], echocardiogram [20], cardiac computerized tomography (CT) scan [21], Holter monitoring [22], cardiac magnetic resonance imaging (MRI) [23], blood tests [24], and cardiac catheterization [25]. In this section,
Methodology
Several studies on heart disease detection have been conducted using machine learning algorithms and different datasets [67], [68], [69], [70], [71], [72]. The current research investigates CAD detection using a novel approach. This section provides more details about the proposed methodology, which is described below:
1. Pre-processing (data manipulation and normalization):
   - Categorical attributes -> change according to one hot encoding. Coding 1 with N, for
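The pre-processing step can be illustrated with a small sketch. It shows one-hot (1-of-N) encoding of a categorical attribute and rescaling of a numerical attribute. The source does not say which normalization was used, so min-max scaling to [0, 1] is an assumption here, and the helper names and toy columns are hypothetical.

```python
def one_hot(values):
    """Encode categorical values as 1-of-N binary vectors.

    Returns the sorted category list and one binary row per input value.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def min_max(values):
    """Rescale numerical values to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy columns, not taken from the dataset used in the paper.
sex = ["male", "female", "female", "male"]
age = [53, 67, 41, 60]

categories, sex_encoded = one_hot(sex)
age_scaled = min_max(age)
print(categories, sex_encoded)
print(age_scaled)
```

With one-hot encoding, a categorical attribute with N distinct values becomes N binary columns, so no artificial ordering is imposed on the categories.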
Experimental results
In this section, feature selection is added to the previous methods. GA was run twice in parallel (parameter optimization + feature selection), with:
(1) Fitness function equal to accuracy; and
(2) Fitness function equal to F1-score.
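For reference, the two fitness measures named above can be computed from true and predicted labels as in the sketch below. It uses the standard definitions of accuracy and F1-score for a binary problem; the function names, the choice of `1` as the positive (CAD) label, and the toy labels are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = CAD, 0 = normal.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

print(accuracy(y_true, y_pred))  # 4 of 6 correct
print(f1_score(y_true, y_pred))  # precision = recall = 3/4
```

Accuracy treats all errors alike, whereas the F1-score focuses on the positive class, which can matter when the classes are imbalanced.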
As stated earlier, in the preliminary tests, 10 algorithms (GaussNB, LDA, KNN, QDA, reglog, nuSVM, MLP, LinSVM, Random Forest, and SVC) were compared. However, only the best three classifiers (SVC, nuSVM, and LinSVM) were selected for the rest of our study.
Conclusion and future work
CAD is a common and deadly heart disease. We presented a novel approach for early diagnosis of CAD, which may improve the clinical decision-making process. The principal contribution of this study is a machine-based detection system that predicts CAD with better performance than classical machine learning techniques. We first applied ten traditional machine learning algorithms to the Z-Alizadeh Sani heart disease dataset. In our experiments, we carried out data pre-processing
Declaration of Competing Interest
None.
References (88)
A comparison of multiple classification methods for diagnosis of Parkinson disease
Expert Syst. Appl.
(2010)
Diagnosis of valvular heart disease through neural networks ensembles
Comput. Methods Programs Biomed.
(2009)
Effective diagnosis of heart disease through neural networks ensembles
Expert Syst. Appl.
(2009)
Classification of ECG heartbeats using nonlinear decomposition methods and support vector machine
Comput. Biol. Med.
(2017)
Classification of imbalanced ECG beats using re-sampling techniques and Adaboost ensemble classifier
Biomed. Signal Process. Control
(2018)
Breast cancer data analysis for survivability studies and prediction
Comput. Methods Programs Biomed.
(2018)
A knowledge-based system for breast cancer classification using fuzzy logic method
Telemat. Inf.
(2017)
Data mining in lung cancer pathologic staging diagnosis: correlation between clinical and pathology information
Expert Syst. Appl.
(2015)
Thromboembolic risks of left atrial thrombus detected by transesophageal echocardiogram
Am. J. Cardiol.
(1997)
Prediction of serious arrhythmic events after myocardial infarction: signal-averaged electrocardiogram, holter monitoring and radionuclide ventriculography
J. Am. Coll. Cardiol.
(1987)
Diagnostic performance of stress cardiac magnetic resonance imaging in the detection of coronary artery disease: a meta-analysis
J. Am. Coll. Cardiol.
Current complications of diagnostic and therapeutic cardiac catheterization
J. Am. Coll. Cardiol.
Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system
Expert Syst. Appl.
A data mining approach for diagnosis of coronary artery disease
Comput. Methods Programs Biomed.
Coronary artery disease detection using computational intelligence methods
Knowl. Based. Syst.
Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm
Comput. Methods Programs Biomed.
Incremental role of resting myocardial computed tomography perfusion for predicting physiologically significant coronary artery disease: a machine learning approach
J. Nuclear Cardiol.
A novel ensemble algorithm for biomedical classification based on ant colony optimization
Appl. Soft. Comput.
Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries
Comput. Methods Programs Biomed.
A novel differential particle swarm optimization for parameter selection of support vector machines for monitoring metal-oxide surge arrester conditions
Swarm Evol. Comput.
An efficient compression of ECG signals using deep convolutional autoencoders
Cogn. Syst. Res.
A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification
Comput. Biol. Med.
Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network
Inf. Sci.
Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals
Inf. Sci.
Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals
Swarm Evol. Comput.
Classification of myocardial infarction with multi-lead ECG signals and deep CNN
Pattern Recognit. Lett.
Cascaded LSTM recurrent neural network for automated sleep stage classification using single-channel EEG signals
Comput. Biol. Med.
Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats
Comput. Biol. Med.
Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network
Knowl. Based. Syst.
Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images
Inf. Sci.
Pławiak
Automated diagnosis of coronary artery disease: a review and workflow
Cardiol. Res. Pract.
The inevitable application of big data to health care
JAMA
Integrating predictive analytics into high-value care: the dawn of precision delivery
JAMA
Machine learning and the profession of medicine
JAMA
Impact of Patients’ gender on Parkinson's disease using classification algorithms
J. AI Data Mining
Rule optimization of boosted C5.0 classification using genetic algorithm for liver disease prediction
Improving the diagnosis of liver disease using multilayer perceptron neural network and boosted decision trees
J. Med. Biol. Eng.
Cognitive computing in intelligent medical pattern recognition systems
Intelligent Control and Automation
Value of the electrocardiogram in identifying heart failure due to left ventricular systolic dysfunction
BMJ
Computed tomography—an increasing source of radiation exposure
N. Engl. J. Med.