Abstract
Coronary heart disease (CHD) is a major public health problem that affects a nation's economic and social development. Timely risk assessment of CHD helps to stop, reverse, and reduce the spread of many chronic diseases and health hazards. This paper proposes a cloud-random forest (C-RF) model that combines the cloud model with random forest to assess CHD risk. Building on traditional classification and regression trees (CART), the model applies a weight-determination algorithm based on the cloud model and the decision-making trial and evaluation laboratory (DEMATEL) method to obtain the weights of the evaluation attributes. For each attribute, its weight and the gain value corresponding to its smallest Gini coefficient are combined in a weighted sum, which then replaces the original gain value. This rule serves as a new CART node-splitting criterion for constructing decision trees, which together form a new random forest, the C-RF. The Framingham dataset from the Kaggle platform is used as the sample for the empirical analysis. The C-RF model is compared with CART, support vector machine (SVM), convolutional neural network (CNN), and random forest (RF) models using standard performance evaluation indexes such as accuracy, error rates, the ROC curve, and the AUC value. The results show that the classification accuracy of the C-RF model is 85%, an improvement of 8, 9, 4, and 3% over CART, SVM, CNN, and RF, respectively. Its type I error rate is 13.99%, which is 6.99, 7.44, 4.47, and 3.02% lower than that of CART, SVM, CNN, and RF, respectively. Its AUC value is 0.85, also higher than those of the comparison models. Thus, the C-RF model offers superior classification performance in the risk assessment of CHD.
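The modified node-splitting rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the attribute weights are supplied by hand (in the paper they come from the cloud model and DEMATEL, which are not reproduced here), and because the abstract does not state the weighting coefficients, a hypothetical parameter `alpha` mixes weight and gain as `alpha * weight + (1 - alpha) * gain`. The function names `gini`, `best_gini_gain`, and `choose_split` are likewise illustrative.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_gain(values: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, Optional[float]]:
    """Find the binary split x <= t with the smallest weighted Gini index.

    Returns (gain, threshold), where gain = Gini(parent) - weighted Gini of
    the two children. Returns (0.0, None) if the attribute is constant.
    """
    n = len(labels)
    parent = gini(labels)
    best_gain, best_t = 0.0, None
    # Candidate thresholds: every distinct value except the largest,
    # so both children are non-empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        split_gini = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - split_gini
        if best_t is None or gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def choose_split(X: Dict[str, Sequence[float]], y: Sequence[int],
                 weights: Dict[str, float], alpha: float = 0.5):
    """Pick the attribute maximizing alpha * weight + (1 - alpha) * gain.

    `alpha` is a hypothetical mixing coefficient. Returns
    (attribute, score, threshold), or None if no attribute can be split.
    """
    best = None
    for name, column in X.items():
        gain, t = best_gini_gain(column, y)
        if t is None:
            continue  # constant attribute: no valid split
        score = alpha * weights[name] + (1 - alpha) * gain
        if best is None or score > best[1]:
            best = (name, score, t)
    return best


if __name__ == "__main__":
    # Toy data with hypothetical attribute weights.
    X = {"age": [40, 45, 60, 65], "bmi": [22, 30, 23, 31]}
    y = [0, 0, 1, 1]
    w = {"age": 0.3, "bmi": 0.7}
    print(choose_split(X, y, w, alpha=0.0)[0])  # age (plain Gini gain)
    print(choose_split(X, y, w, alpha=0.5)[0])  # bmi (weight tips the choice)
```

With `alpha=0.0` the rule reduces to the ordinary CART choice (largest Gini gain); raising `alpha` lets a heavily weighted attribute win the split even when its gain is smaller. Because attribute weights and Gini gains may lie on different scales, some normalization before summing may be needed in practice; that step is also an assumption here. In a full C-RF, a criterion of this kind would be applied inside each tree of the forest.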
Data availability
The authors confirm that the data supporting the findings of this study are available within the article.
Acknowledgements
We would like to thank the editors and the anonymous reviewers for their helpful comments.
Funding
This work is supported by the National Natural Science Foundation of China (No. 72071150, 71671135, 71871174).
Author information
Contributions
JW: software, writing—original draft. CR: conceptualization, methodology, data curation. MG: formal analysis, supervision, writing—review & editing. XX: visualization, investigation.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethical approval and consent to participate
No ethical approval or patient consent to participate was required for this study.
Consent for publication
The authors confirm that the final version of the manuscript has been reviewed, approved, and consented for publication by all authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, J., Rao, C., Goh, M. et al. Risk assessment of coronary heart disease based on cloud-random forest. Artif Intell Rev 56, 203–232 (2023). https://doi.org/10.1007/s10462-022-10170-z
DOI: https://doi.org/10.1007/s10462-022-10170-z