Machine learning-based risk prediction model for cardiovascular disease using a hybrid dataset
Introduction
Heart disease (HD) is a prevalent condition that afflicts many people in middle or old age, and it frequently results in fatal complications. According to 2008 government health statistics, stroke accounted for about one in every 18 deaths in the United States (US) [1]. In the US, 655,000 people die of HD each year. Cardiovascular diseases (CVDs) affect the cardiovascular system. CVD is managed through lifestyle changes or medication prescribed by a healthcare provider. The earlier CVD is detected, the easier it is to treat. Common symptoms of CVD include chest pain, an irregular heartbeat, and nausea. BMI remained the most frequently identified potential risk factor for CVD, with high cholesterol and high blood pressure the second and third most common. According to a 2011 survey, men were 1.64 times more likely than women to have CVD [2]. In the face of a global viral pandemic such as COVID-19 [3], we must emphasize international objectives to reduce the early mortality caused by CVD, which limits healthy and sustainable development in all countries. An abundance of research data and hospital patient records is available. Many open resources provide access to healthcare information, and research can determine how various information and communication technologies can be used to predict or diagnose HD before it turns fatal. Machine learning (ML)-based techniques are becoming more common in business and society, and they are now being applied to healthcare [4]. ML is a scientific discipline that studies how machines acquire knowledge from data and improve themselves. It is primarily based on statistics and probability [5]; however, when it comes to decision making, it outperforms standard statistical methodologies. The information gathered from a dataset and fed into the algorithm is referred to as features.
The quality of the features offered to the algorithm determines the model’s prediction accuracy.
The job of the ML developer is to identify the subset of attributes that best fits the objective, thereby boosting the model’s accuracy. Developing an ML prediction model involves three basic steps: training, testing, and validation [6]. Training is essential because the accuracy of the prediction or classification model depends on the training data. The algorithm’s performance is evaluated on the test dataset, and k-fold cross validation is used to assess the stability of the model [7]. The primary aim of this research is to build the best early-stage CVD prediction model based on the most suitable attributes. The sub-goals are: reviewing existing approaches for detecting CVD; creating a hybrid dataset with no missing values; determining the best features using Pearson’s correlation coefficient (r) as the feature selection technique; building prediction models on the “Sathvi” dataset using different ML algorithms; and evaluating the performance of the best ML algorithm using k-fold cross validation.
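As a rough illustration of correlation-based feature selection, the sketch below ranks candidate features by the absolute value of their Pearson correlation with the target and keeps those above a cut-off. The function names (`pearson_r`, `select_features`), the feature names, the 0.5 threshold, and the toy values are all assumptions made for illustration; they are not taken from the paper or from the “Sathvi” dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold):
    """Return names of features with |r| >= threshold, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: abs(scores[name]), reverse=True)


# Toy data: hypothetical feature names and values, for illustration only.
columns = {
    "age": [40, 50, 60, 70, 45, 65],
    "noise": [1, 2, 1, 2, 1, 2],
    "max_heart_rate": [180, 160, 140, 120, 175, 130],
}
target = [0, 0, 1, 1, 0, 1]

# The 0.5 cut-off is an assumed value, not one reported by the authors.
selected = select_features(columns, target, threshold=0.5)
```

In this toy example the weakly correlated `noise` column is dropped, while the two columns that track the target closely are kept. A real study would also check correlations between features, since two strongly inter-correlated features add little beyond one of them.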
Related work
ML is the ability of an algorithm to learn from data and experience, and it is regarded as a component of artificial intelligence. It has a wide range of applications in fields such as electrical engineering [8], health care [9], agriculture [10], and meteorology [11]. Shah et al. [12] developed an HD risk prediction model using 14 essential attributes. They used naive Bayes (NB), decision tree, k-nearest neighbour (k-NN), and random forest classifiers for data mining classification. They discovered that the k-NN classifier has
Materials and methods
The following stages are involved in the development of a CVD risk prediction model. It begins with the creation of the “Sathvi” dataset, followed by pre-processing the data, feature selection, application of ML classification algorithms, identification of the best ML algorithm, and k-fold cross validation of the selected model.
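The final stage, k-fold cross validation, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the helper names (`k_fold_indices`, `cross_validate`), the placeholder majority-class model, and the toy data are hypothetical and stand in for the classifiers and dataset used in the paper.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class_model(train_X, train_y, test_X):
    """Placeholder model: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label for _ in test_X]


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Return the accuracy obtained on each of the k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        test_X = [X[i] for i in held_out]
        predictions = fit_predict(train_X, train_y, test_X)
        correct = sum(p == y[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return accuracies


# Toy data (hypothetical): 50 samples, 30 of class 1 and 20 of class 0.
X = [[i] for i in range(50)]
y = [1] * 30 + [0] * 20
fold_accuracies = cross_validate(X, y, majority_class_model, k=10)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Each sample is held out exactly once, so the mean of the fold accuracies summarises performance on data the model did not see during training, and the spread across folds gives a rough indication of stability.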
Proposed machine learning classifiers
The NB, XGBoost, k-NN, support vector machine (SVM), multilayer perceptron (MLP), and CatBoost ML classifiers have been applied for prediction. They are described in this section.
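To make one of the listed classifiers concrete, the following is a minimal k-NN sketch in plain Python. It is not the authors' implementation: the function name `knn_predict`, the choice of Euclidean distance, k = 3, and the toy points are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters in two dimensions.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = [0, 0, 0, 1, 1, 1]

low_risk = knn_predict(train_X, train_y, [1.1, 1.0])
high_risk = knn_predict(train_X, train_y, [5.0, 5.05])
```

Because k-NN relies on distances, features measured on very different scales would normally be standardised first so that no single attribute dominates the vote.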
Training and test dataset
The modelling step infers a representative model from the data. Training datasets are collections of data used to construct models, and they contain known features as well as a known target. Validating the created model also requires comparing its predictions against another dataset with known outcomes, referred to as the test dataset or validation dataset. To facilitate this, the entire known dataset can be partitioned into a training set and a test set [32]. The “Sathvi” dataset has an 80:20 split between
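A hold-out partition of this kind can be sketched as follows. The function name `holdout_split`, the fixed seed, and the rounding rule are assumptions for illustration; the placeholder rows simply reuse the instance count of 531 mentioned in the Conclusion and do not reproduce the authors' actual partition.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows: integers standing in for dataset records.
rows = list(range(531))
train_rows, test_rows = holdout_split(rows)
```

With this rounding choice, 531 rows yield 425 training rows and 106 test rows. Shuffling before splitting avoids any ordering in the source files leaking into the partition, and fixing the seed keeps the split reproducible.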
Conclusion
In this research, the “Sathvi” dataset has been created from four existing CVD datasets, with 531 instances. It does not have any missing values. The “hybrid” and “Sathvi” datasets are available as supplementary files for public use. The risk prediction model was developed with 10 attributes and six ML classifiers; of these, the CatBoost classifier performed best, with a mean accuracy of 94.34% under 10-fold cross validation.
CRediT authorship contribution statement
Karthick Kanagarathinam: Conceptualization, Data curation, Writing – original draft, Investigation, Methodology. Durairaj Sankaran: Supervision, Validation, Writing – review & editing. R. Manikandan: Formal analysis, Software, Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (32)
- et al. The potential for artificial intelligence in healthcare. Future Healthc. J. (2019)
- et al. Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous modified artificial neural network. Future Gener. Comput. Syst. (2020)
- et al. Machine learning approach to predict leaf colour change in Fagus sylvatica L. (Spain). Agricult. Forest Meteorol. (2021)
- et al. An improved ensemble learning approach for the prediction of heart disease risk. Inform. Med. Unlocked (2020)
- et al. Deep learning approach for active classification of electrocardiogram signals. Inform. Sci. (2016)
- et al. Machine learning for the evaluation of the presence of heart disease. Procedia Comput. Sci. (2020)
- et al. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. (2019)
- et al. Heart disease and stroke statistics–2012 update: a report from the American Heart Association. Circulation (2012)
- et al. Risk factors associated with cardiovascular disease among adult Nevadans. PLoS One (2021)
- et al. Analysis of ‘earlyR’ epidemic model and time series model for prediction of COVID-19 registered cases. Mater. Today: Proc. (2020)
- Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS One
- Machine learning algorithm validation with a limited sample size. PLoS One
- Data classification with k-fold cross validation and holdout accuracy estimation methods with 5 different machine learning techniques
- Power quality disturbance detection using machine learning algorithm
- Machine learning applications for precision agriculture: A comprehensive review. IEEE Access
- Heart disease prediction using machine learning techniques. SN Comput. Sci.
Dr. K. Karthick is working as an Associate Professor in the Department of Electrical and Electronics Engineering, GMR Institute of Technology, Rajam, India. He received his B.E. degree in Electrical and Electronics Engineering from Periyar University, Salem, India and an M.E. degree in Power Electronics and Drives from Anna University, Chennai, India. He completed his Doctorate in Electrical Engineering at Anna University, Chennai. He has more than 16 years of experience in teaching. He is a member of ISTE. His research interests include data analytics, machine learning, text detection and recognition, image processing, and electrical drives.
Dr. S. Durairaj is working as an Assistant Professor in the Department of Mechatronics Engineering at K S Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India. He completed his Doctorate in May 2017 at Anna University, Chennai, India. He completed his M.E. degree in Power Electronics and Drives in 2009. He has more than 12 years of experience in teaching. His research interests include green energy, power electronics and drives, and machine learning, among others.
Dr. R. Manikandan received his B.E. degree in Electronics and Instrumentation Engineering from Annamalai University, Chidambaram in 2002. He obtained his M.E. degree in Applied Electronics from Anna University, Chennai in 2008 and his Ph.D. degree in Image/Video Processing from the Department of Advanced Sports Training and Technology at Tamil Nadu Physical Education and Sports University, Chennai in 2014. His main research interests include automation, computer vision, and image/video processing. He is now a Professor in the Department of Electronics and Communication Engineering at Panimalar Engineering College, Chennai.