A machine learning-based approach for predicting the outbreak of cardiovascular diseases in patients on dialysis

doi:10.1016/j.cmpb.2019.05.005

Computer Methods and Programs in Biomedicine

Volume 177, August 2019, Pages 9-15

https://doi.org/10.1016/j.cmpb.2019.05.005

Abstract

Background and Objective: Patients with End-Stage Kidney Disease (ESKD) have a distinctive cardiovascular risk profile. This study aims to predict, with a quantifiable degree of precision, death and cardiovascular diseases in dialysis patients.

Methods: We applied machine learning techniques to two datasets: an Italian dataset obtained from the Istituto di Fisiologia Clinica of the Consiglio Nazionale delle Ricerche in Reggio Calabria, and an American dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) repository. From each, we derived 5 datasets, one per outcome of interest. We tested several linear and non-linear algorithms and ultimately chose the Support Vector Machine. The best performance was obtained with the non-linear Support Vector Classifier (SVC) with a Radial Basis Function (RBF) kernel, optimized with GridSearch. GridSearch is a procedure that searches for the best combination of hyperparameters (in our case, the best pair (C, γ)) in order to improve the accuracy of the algorithm.

Results: The non-linear SVC with RBF kernel, optimized with GridSearch, achieved an accuracy of 95.25% on the Italian dataset and 92.15% on the American dataset in predicting Ischaemic Heart Disease over a timeframe of 2.5 years. Performance was worse for the other outcomes.

Conclusions: The machine learning-based approach applied in our study predicts the outbreak of cardiovascular diseases in patients on dialysis with high accuracy.

Introduction

Patients with end-stage kidney disease (ESKD) have a very high risk of death and cardiovascular disease, a risk that is strongly associated with the level of renal function both in community-based studies and in selected populations with established cardiovascular disease [1].

Moderate renal insufficiency carries a 19% excess risk of cardiovascular complications [2], and the risk is even higher in the elderly [3] and in patients with pre-existing cardiovascular disease [4], [5], [6], [7]. In the general population, the so-called Framingham risk factors predict mortality and cardiovascular events [8], [9], [10]. Similarly, the Body Mass Index (BMI) has been found to interact with the Framingham score in predicting incident cardiovascular (CV) diseases in the general population [11] and in comorbid conditions [12], while educational status [13] and marital status [14] have been widely associated with mortality in the general population. In contrast to the general population, where as much as 75% of the excess risk of coronary heart disease can be explained by Framingham risk factors [15], the excess risk of cardiovascular disease (CVD) in chronic kidney disease (CKD) patients is not easily explained. Other factors, the so-called “uremic risk factors”, might contribute to the increased cardiovascular risk in patients with ESKD [1], [16].

Machine learning has been used extensively to predict clinical outcomes. The literature on this technique is growing so quickly that a comprehensive review is beyond the scope of a single paper. Here we limit our description to some of the most recent studies in various clinical fields and to some surveys that may guide the interested reader, such as Jothi et al. [17], which focuses on healthcare data mining. More recently, Shishvan et al. [18] provide a comprehensive survey of the most widely used computational models and algorithms. Kavakiotis et al. [19] and Durgadevi and Kalpana [20] give a thorough description of machine learning methods applied to diabetes mellitus, whereas Delen et al. [21] successfully predicted the survival time of patients after thoracic transplantation. Applications of classifiers to the estimation of heart failure can be found in Tripoliti et al. [22], [23], while Sartakhti et al. [24] applied a support vector machine optimized by simulated annealing to the diagnosis of hepatitis. López-Martínez et al. [25] applied logistic regression to a large dataset to study the risk factors responsible for the emergence of hypertension. The identification of high-risk patients is the main focus of Panicacci et al. [26], who predict the risk of hospitalization of elderly citizens from socio-economic and administrative data.

Recent approaches combine multiple algorithms in order to reduce variance and increase diagnostic accuracy. Wang et al. [27] use an SVM ensemble for breast cancer diagnosis, whereas Zheng et al. [28] present a hybrid K-means and SVM algorithm for the same task. Closer to the clinical field of our study, chronic kidney disease has been studied by Abdelaziz et al. [29], and acute kidney injury in cancer patients by Park et al. [30]. Chen et al. [31] applied logistic regression to a large dataset in order to predict the emergence of kidney stone disease. Our study addresses a supervised binary classification problem: for every patient, both the inputs (the full set of features) and the target to be predicted (the outcome) are known. To predict mortality due to cardiovascular events and the outbreak of cardiovascular diseases in dialysis patients, we used several machine learning algorithms, both linear and non-linear [32]:

• Logistic Regression (LR);
• K-Nearest Neighbor (KNN);
• Classification Decision Tree (CART);
• Naïve Bayes (NB);
• LinearSVC (SVCL);
• Support Vector Classifier with Radial Basis Function kernel (SVCR);
• SVC with Polynomial kernel (SVCP).

Of these, LR and SVC with RBF kernel had the greatest predictive power. We chose the latter because it is a powerful and widely used algorithm that has already been applied in similar studies.
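The model comparison described above can be sketched as a cross-validated benchmark. This is a minimal sketch assuming scikit-learn: its `LogisticRegression`, `KNeighborsClassifier`, `DecisionTreeClassifier`, `GaussianNB`, `LinearSVC` and `SVC` classes match the abbreviations in the list, but the paper does not state which library or settings were used. The synthetic data from `make_classification`, the stratified 10-fold cross-validation, the feature scaling and all default hyperparameters are illustrative assumptions, not the authors' protocol.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, n_splits=10, seed=7):
    """Return the mean cross-validated accuracy of each candidate classifier."""
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "CART": DecisionTreeClassifier(random_state=seed),
        "NB": GaussianNB(),
        "SVCL": LinearSVC(max_iter=10000),
        "SVCR": SVC(kernel="rbf"),
        "SVCP": SVC(kernel="poly"),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        # Scaling inside the pipeline is fitted on each training fold only.
        pipe = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for one outcome dataset (hypothetical data).
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: kv[1], reverse=True)
    for name, acc in ranked:
        print(f"{name}: {acc:.3f}")
```

Keeping `StandardScaler` inside the pipeline means the scaling statistics never see the validation fold, which matters most for distance- and kernel-based models such as KNN and the SVC variants.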

Finally, the GridSearch optimization algorithm [33] was used to tune the hyperparameters and achieve greater accuracy.
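A grid search over the RBF-SVC pair (C, γ) can be sketched with scikit-learn's `GridSearchCV`, again assuming scikit-learn is the library in use. The log-spaced grid ranges, the 5-fold stratified cross-validation, the 75/25 train/test split and the synthetic data are illustrative assumptions; the paper does not report its grid. `C` sets the penalty on misclassified training points, and `gamma` (γ) sets how quickly the RBF kernel's similarity decays with distance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_rbf_svc(X_train, y_train, seed=0):
    """Exhaustively search (C, gamma) for an RBF-kernel SVC by CV accuracy."""
    pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
    # Illustrative log-spaced grid (assumed, not taken from the paper).
    param_grid = {
        "svc__C": np.logspace(-2, 3, 6),      # 0.01 ... 1000
        "svc__gamma": np.logspace(-4, 1, 6),  # 0.0001 ... 10
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=cv)
    search.fit(X_train, y_train)  # refits the best (C, gamma) on all training data
    return search


if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=15, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=1
    )
    search = tune_rbf_svc(X_tr, y_tr)
    best = search.best_params_
    print("best (C, gamma):", best["svc__C"], best["svc__gamma"])
    print(f"held-out accuracy: {search.score(X_te, y_te):.3f}")
```

The held-out split is kept out of the search so that the reported accuracy is not inflated by the tuning itself.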


Datasets

This section describes the three datasets adopted in this work. Their general characteristics are summarised in Supplemental Table 1; Supplemental Table 2 reports the number of samples in each class, and Supplemental Table 3 lists the features. For the purpose of the study, we considered 5 cardiovascular outcomes, namely cardiovascular death, heart failure, ischemia, arrhythmia and other cardiovascular events (65 events), and consequently built 5 datasets.
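One way to derive the five per-outcome datasets from a single patient table is sketched below with pandas. The outcome column names in `OUTCOMES` and the toy table are hypothetical; the real variable names are listed in Supplemental Table 3 and are not reproduced here. Each resulting dataset shares the same features and differs only in its 0/1 target.

```python
import pandas as pd

# Hypothetical outcome column names, one per cardiovascular outcome.
OUTCOMES = ["cv_death", "heart_failure", "ischemia", "arrhythmia", "other_cv"]


def split_by_outcome(df, outcomes=OUTCOMES):
    """Return {outcome: (X, y)}, one binary-classification dataset per outcome."""
    features = df.drop(columns=outcomes)
    datasets = {}
    for outcome in outcomes:
        # Same feature matrix every time; only the 0/1 target changes.
        datasets[outcome] = (features.copy(), df[outcome].astype(int))
    return datasets


if __name__ == "__main__":
    toy = pd.DataFrame({
        "age": [61, 70, 55], "calcium": [9.1, 8.7, 9.4], "glucose": [98, 130, 110],
        "cv_death": [0, 1, 0], "heart_failure": [1, 0, 0], "ischemia": [0, 1, 1],
        "arrhythmia": [0, 0, 1], "other_cv": [0, 0, 0],
    })
    for name, (X, y) in split_by_outcome(toy).items():
        print(name, X.shape, y.tolist())
```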

Data preprocessing

As a first step, categorical variables were checked and converted into numerical form. For example, the outcome was encoded as class 0 when the event did not occur and class 1 when it did. Missing values also had to be identified and handled: the Italian dataset had none, whereas the American dataset did.
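The preprocessing steps above can be sketched in pandas as follows. This is a minimal sketch under assumptions not stated in the paper: the raw outcome labels are the hypothetical strings `"yes"`/`"no"` (mapped to class 1/0), every non-numeric predictor is one-hot encoded with `pd.get_dummies`, and records with a missing outcome are dropped. Missing predictor values are only counted and returned, because the imputation strategy is not described here.

```python
import pandas as pd

# Hypothetical raw outcome labels; the real coding is not given in the paper.
OUTCOME_MAP = {"no": 0, "yes": 1}


def preprocess(df, outcome):
    """Drop unlabeled rows, then encode the outcome and categoricals as numbers.

    Returns the encoded frame and a Series counting the missing values left in
    each predictor (counted only, not imputed).
    """
    # Records without an outcome cannot be used for supervised training.
    df = df.dropna(subset=[outcome]).copy()
    # Class 0 = event did not occur, class 1 = event occurred.
    df[outcome] = df[outcome].map(OUTCOME_MAP).astype(int)
    missing = df.isna().sum()
    missing = missing[missing > 0]
    # One-hot encode every remaining non-numeric column.
    cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    df = pd.get_dummies(df, columns=cat_cols, dtype=int)
    return df, missing


if __name__ == "__main__":
    raw = pd.DataFrame({
        "sex": ["M", "F", "F", "M"],
        "calcium": [9.1, None, 8.8, 9.5],
        "event": ["yes", "no", None, "no"],
    })
    clean, missing = preprocess(raw, "event")
    print(clean)
    print(missing)
```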

In some records the outcome variable was missing; in these cases, we excluded the patients

Model selection and evaluation

To select the model, we first needed to determine whether we were dealing with a linear or a non-linear problem.

The degree of correlation (i.e., of linear dependence) between two variables is usually assessed visually with scatter plots.

Given the large number of variables, this approach was not adopted.
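When there are too many variables to inspect scatter plots one by one, pairwise Pearson correlations can be screened automatically instead. The sketch below uses pandas' `DataFrame.corr` and is an illustration, not the authors' procedure; the 0.7 default threshold and the toy values are arbitrary assumptions, and the input is assumed to contain numeric columns only. Pearson's r measures only pairwise linear dependence, so a screen like this cannot by itself show whether the classes are linearly separable.

```python
from itertools import combinations

import pandas as pd


def strongly_correlated_pairs(X, threshold=0.7):
    """Return (feature_a, feature_b, r) for pairs with |Pearson r| > threshold."""
    corr = X.corr(method="pearson")
    pairs = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)  # each unordered pair once
        if abs(corr.loc[a, b]) > threshold
    ]
    # Strongest linear dependence first, regardless of sign.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy values, not patient data.
    toy = pd.DataFrame({
        "calcium": [9.1, 8.7, 9.4, 8.9, 9.8],
        "glucose": [98, 130, 110, 121, 95],
        "age": [61, 70, 55, 66, 58],
    })
    for a, b, r in strongly_correlated_pairs(toy, threshold=0.5):
        print(f"{a} ~ {b}: r = {r:+.2f}")
```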

Supplemental Fig. 1 shows the two- and three-dimensional scatter plots of calcium as a function of glucose, in order to ascertain whether the two data clouds are

Discussion

The final results show that the model used (SVC with RBF kernel and the GridSearch algorithm) yields strong results in predicting mortality and the onset of cardiovascular diseases in dialysis patients. Similarly to the Framingham risk score, developed and validated in the general population, and to the other risk scores validated in the field of nephrology [38], [39], our model, once validated in a wider context, will allow prediction of the individual risk of mortality and/or CV

Acknowledgments

The HEMO study was conducted by the HEMO Investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The data from the HEMO study reported here were supplied by the NIDDK Central Repositories. This manuscript was not prepared in collaboration with Investigators of the HEMO study and does not necessarily reflect the opinions or views of the HEMO study, the NIDDK Central Repositories, or the NIDDK.

References (41)

C. Zoccali. Traditional and emerging cardiovascular and renal risk factors: an epidemiologic perspective. Kidney Int. (2006)
B.F. Culleton et al. Cardiovascular disease and mortality in a community-based cohort with mild renal insufficiency. Kidney Int. (1999)
S.S. Mahmood et al. The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. Lancet (2014)
R.B. Schnabel et al. 50 year trends in atrial fibrillation prevalence, incidence, risk factors, and mortality in the Framingham Heart Study: a cohort study. Lancet (2015)
J. Robards et al. Marital status, health and mortality. Maturitas (2012)
C. Zoccali et al. Traditional and emerging cardiovascular risk factors in end-stage renal disease. Kidney Int. (2003)
N. Jothi et al. Data mining in healthcare - a review. Proc. Comput. Sci. (2015)
I. Kavakiotis et al. Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. (2017)
E.E. Tripoliti et al. Heart failure: diagnosis, severity estimation and prediction of adverse events through machine learning techniques. Comput. Struct. Biotechnol. J. (2017)
J.S. Sartakhti et al. Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA). Comput. Methods Programs Biomed. (2012)
H. Wang et al. A support vector machine-based ensemble algorithm for breast cancer diagnosis. Eur. J. Oper. Res. (2018)
B. Zheng et al. Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Syst. Appl. (2014)
A. Abdelaziz et al. A machine learning model for improving healthcare services on cloud computing environment. Measurement (2018)
Z. Chen et al. Development of a personalized diagnostic model for kidney stone disease tailored to acute care by integrating large clinical, demographics and laboratory data: the diagnostic acute care algorithm - kidney stones (DACA-KS). BMC Med. Inform. Decis. Making (2018)
J. Floege et al. Development and validation of a predictive mortality risk score from a European hemodialysis cohort. Kidney Int. (2015)
S.D. Anker et al. Development and validation of cardiovascular risk scores for haemodialysis patients. Int. J. Cardiol. (2016)
T. Greene et al. Design and statistical issues of the hemodialysis (HEMO) study. Control. Clin. Trials (2000)
D.E. Weiner et al. Chronic kidney disease as a risk factor for cardiovascular disease and all-cause mortality: a pooled analysis of community-based studies. J. Am. Soc. Nephrol. (2004)
P.W. De Leeuw et al. Prognostic significance of renal function in elderly patients with isolated systolic hypertension: results from the Syst-Eur trial. J. Am. Soc. Nephrol. (2002)
G. Schillaci et al. High-normal serum creatinine concentration is a predictor of cardiovascular risk in essential hypertension. Arch. Intern. Med. (2001)

