A machine learning-based approach for predicting the outbreak of cardiovascular diseases in patients on dialysis
Introduction
Patients with end-stage kidney disease (ESKD) have a very high risk of death and cardiovascular disease. This risk is strongly associated with the level of renal function, both in community-based studies and in selected populations with established cardiovascular disease [1].
Moderate renal insufficiency carries a 19% excess risk for cardiovascular complications [2], and the risk is even higher in the elderly [3] and in patients with pre-existing cardiovascular disease [4], [5], [6], [7]. In the general population, the so-called Framingham risk factors are able to predict mortality and cardiovascular events [8], [9], [10]. Similarly, the Body Mass Index (BMI) was found to interact with the Framingham score in predicting incident cardiovascular (CV) diseases in the general population [11] and in comorbid conditions [12], while educational status [13] and marital status [14] have been widely associated with mortality in the general population. In contrast to the general population, where as much as 75% of the excess risk for coronary heart disease can be explained by Framingham risk factors [15], the excess risk of cardiovascular disease (CVD) in patients with chronic kidney disease (CKD) is not so easy to explain. Other factors, the so-called “uremic risk factors”, might contribute to the increased cardiovascular risk in patients with end-stage renal disease (ESRD) [1], [16].
Machine Learning has been extensively used for predicting clinical outcomes. The literature on this technique is growing rapidly, so covering it fully in a single paper is a hard task. Here we limit our description to some of the most recent studies in various clinical fields and to some surveys that may guide the interested reader, for example Jothi et al. [17], which focuses on healthcare data mining. More recently, a comprehensive survey of the most widely used computational models and algorithms was given by Shishvan et al. [18]. Kavakiotis et al. [19] and Durgadevi and Kalpana [20] present a thorough description of Machine Learning methods applied to diabetes mellitus, whereas Delen et al. [21] successfully predicted the survival time of patients after thoracic transplantation. An application of classifiers to the estimation of heart failure can be found in Tripoliti et al. [22], [23], while Sartakhti et al. [24] applied a support vector machine optimized by means of simulated annealing to the diagnosis of hepatitis. López-Martínez et al. [25] applied logistic regression to a large dataset to study the risk factors responsible for the emergence of hypertension. The identification of high-risk patients is the main focus of Panicacci et al. [26], who predict the risk of hospitalization from socio-economic and administrative data on elderly citizens.
Recent approaches employ multiple algorithms in order to reduce variance and increase the accuracy of diagnosis. Wang et al. [27] used an SVM ensemble to predict the outbreak of breast cancer, whereas Zheng et al. [28] presented a hybrid of K-means and SVM applied to the diagnosis of breast cancer. In the clinical field of our study, chronic kidney disease has been studied by Abdelaziz et al. [29], while Park et al. [30] analyzed acute kidney injury in cancer patients. Chen et al. [31] applied logistic regression to a large dataset in order to predict the emergence of kidney stone disease. Our study addresses a binary classification problem with supervised learning, since we know both the set of inputs, made up of all the features, and the objective to be achieved, represented by the outcome. To predict mortality due to cardiovascular events and the outbreak of cardiovascular diseases in dialysis patients, several machine learning algorithms (both linear and non-linear) have been used [32]:
- Logistic Regression (LR);
- K-Nearest Neighbor (KNN);
- Classification Decision Tree (CART);
- Naïve Bayes (NB);
- LinearSVC (SVCL);
- Support Vector Classifier with Radial Basis Function kernel (SVCR);
- SVC with Polynomial kernel (SVCP).
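A comparison of the seven candidates listed above can be sketched with scikit-learn. The use of scikit-learn is itself an assumption, suggested only by the names LinearSVC and GridSearch. The synthetic data, the default hyperparameters, GaussianNB as the Naïve Bayes variant, the scaling step and the 5-fold accuracy score are all illustrative choices and not the protocol of the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (assumption: stands in for the
# patient tables, which are not reproduced here).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# One estimator per abbreviation used in the text; hyperparameters are
# library defaults except for iteration limits (assumption).
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "SVCL": LinearSVC(max_iter=10000),
    "SVCR": SVC(kernel="rbf"),
    "SVCP": SVC(kernel="poly"),
}

# Mean 5-fold cross-validated accuracy, with features standardised
# inside each fold so that no information leaks from the test folds.
scores = {}
for name, clf in candidates.items():
    model = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Print the candidates from best to worst.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the dialysis cohorts; the sketch only shows the mechanics of scoring all candidates under the same resampling scheme.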
Among these, the algorithms with the greatest predictive power are LR and SVC with RBF kernel. We decided to use the latter because it is a powerful and widely used algorithm that has already been applied in similar studies.
Finally, the GridSearch optimization algorithm [33] was used in order to achieve greater accuracy.
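A minimal sketch of a grid search over an RBF-kernel SVC follows, assuming scikit-learn's GridSearchCV is what "GridSearch" refers to. The grid values for C and gamma, the synthetic data, the scaling step and the train/test split are assumptions made for illustration; the text does not state the grid actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data standing in for one of the outcome datasets (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refitted on every CV fold.
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Hypothetical grid: every combination of C and gamma is evaluated.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

Keeping a held-out test set separate from the cross-validation folds gives an accuracy estimate that is not biased by the hyperparameter selection itself.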
Datasets
This section describes the three datasets adopted in this work. General characteristics are summarised in Supplemental Table 1. Supplemental Table 2 shows how many samples belong to each class, while Supplemental Table 3 lists the features. For the purpose of the study we considered 5 cardiovascular outcomes, namely cardiovascular death, heart failure, ischemia, arrhythmia, and other cardiovascular events (65 events), and consequently 5 datasets.
Data preprocessing
Before starting, categorical variables were checked and converted into numerical form. For example, for the outcome, class 0 indicated the non-occurrence of the event, while class 1 indicated its occurrence. Moreover, it was necessary to identify and manage missing values (if any). The Italian dataset had no missing values, unlike the American one.
In some of the existing records the outcome variable was missing; in these cases we excluded the patients
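The two preprocessing steps described above (numeric encoding of categorical variables and exclusion of records with a missing outcome) can be sketched in plain Python. The column names, category labels and records below are invented for illustration and do not come from the study's datasets; only the 0/1 coding of the outcome follows the text.

```python
# Hypothetical records; None marks a missing outcome.
records = [
    {"sex": "F", "diabetes": "yes", "outcome": "event"},
    {"sex": "M", "diabetes": "no", "outcome": "no event"},
    {"sex": "M", "diabetes": "yes", "outcome": None},
    {"sex": "F", "diabetes": "no", "outcome": "no event"},
]

# Category-to-number maps (assumption). For the outcome, class 0 is the
# non-occurrence of the event and class 1 its occurrence, as in the text.
ENCODINGS = {
    "sex": {"F": 0, "M": 1},
    "diabetes": {"no": 0, "yes": 1},
    "outcome": {"no event": 0, "event": 1},
}


def preprocess(rows):
    """Drop rows with a missing outcome, then encode every column."""
    cleaned = []
    for row in rows:
        if row["outcome"] is None:
            continue  # patient excluded: outcome unknown
        cleaned.append(
            {column: ENCODINGS[column][value] for column, value in row.items()}
        )
    return cleaned


encoded = preprocess(records)
print(encoded)
```

Missing values in the input features, as opposed to the outcome, would need a separate strategy that this sketch does not cover.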
Model selection and evaluation
To select the model, we first needed to understand whether we were dealing with a linear or a non-linear problem.
Usually, in order to measure the degree of correlation (i.e. of linear dependence) between two variables, we proceed with visual analysis using scatter plots.
Due to the high number of variables, this approach was not adopted.
In Supplemental Fig. 1 the two- and three-dimensional scatter plots of calcium as a function of glucose, in order to ascertain if the two data clouds are
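The degree of linear dependence between two variables, which the text approaches visually, can also be expressed as a single number. The sketch below computes the Pearson correlation coefficient in plain Python. It is an illustration of the concept only, not a step the authors report, and the calcium and glucose values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products and squares; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values, for illustration only (mg/dL).
calcium = [8.9, 9.1, 9.4, 9.0, 9.6]
glucose = [102.0, 95.0, 130.0, 88.0, 110.0]

print(f"r(calcium, glucose) = {pearson_r(calcium, glucose):.3f}")
```

A coefficient near +1 or -1 indicates a nearly linear relationship, while a value near 0 indicates the absence of a linear one; it does not rule out a non-linear dependence.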
Discussion
The final results show that the model used (SVC with RBF kernel and GridSearch algorithm) yields important results in the prediction of mortality and of the onset of cardiovascular diseases in dialysis patients. Similarly to the Framingham risk score, developed and validated in the general population, and to the other risk scores validated in the field of nephrology [38], [39], our model, once validated in a wider context, will make it possible to predict the individual risk of mortality and/or CV
Acknowledgments
The HEMO study was conducted by the HEMO Investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The data from the HEMO study reported here were supplied by the NIDDK Central Repositories. This manuscript was not prepared in collaboration with Investigators of the HEMO study and does not necessarily reflect the opinions or views of the HEMO study, the NIDDK Central Repositories, or the NIDDK.
References (41)
Traditional and emerging cardiovascular and renal risk factors: an epidemiologic perspective. Kidney Int. (2006)
Cardiovascular disease and mortality in a community-based cohort with mild renal insufficiency. Kidney Int. (1999)
The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. Lancet (2014)
50 year trends in atrial fibrillation prevalence, incidence, risk factors, and mortality in the Framingham Heart Study: a cohort study. Lancet (2015)
Marital status, health and mortality. Maturitas (2012)
Traditional and emerging cardiovascular risk factors in end-stage renal disease. Kidney Int. (2003)
Data mining in healthcare - a review. Proc. Comput. Sci. (2015)
Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. (2017)
Heart failure: diagnosis, severity estimation and prediction of adverse events through machine learning techniques. Comput. Struct. Biotechnol. J. (2017)
Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA). Comput. Methods Programs Biomed. (2012)
A support vector machine-based ensemble algorithm for breast cancer diagnosis. Eur. J. Oper. Res.
Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms. Expert Syst. Appl.
A machine learning model for improving healthcare services on cloud computing environment. Measurement
Development of a personalized diagnostic model for kidney stone disease tailored to acute care by integrating large clinical, demographics and laboratory data: the diagnostic acute care algorithm - kidney stones (DACA-KS). BMC Med. Inform. Decis. Making
Development and validation of a predictive mortality risk score from a European hemodialysis cohort. Kidney Int.
Development and validation of cardiovascular risk scores for haemodialysis patients. Int. J. Cardiol.
Design and statistical issues of the hemodialysis (HEMO) study. Controll. Clin. Trials
Chronic kidney disease as a risk factor for cardiovascular disease and all-cause mortality: a pooled analysis of community-based studies. J. Am. Soc. Nephrol.
Prognostic significance of renal function in elderly patients with isolated systolic hypertension: results from the Syst-Eur trial. J. Am. Soc. Nephrol.
High-normal serum creatinine concentration is a predictor of cardiovascular risk in essential hypertension. Arch. Intern. Med.