A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases
Introduction
There is a body of literature in healthcare analytics that examines patterns in electronic medical records to predict medical outcomes [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]. Recent studies have shown that chronic diseases cause several medical complications [11]. For example, type 2 diabetes (T2D) increases the risk of medical complications such as numbness in the feet, kidney conditions, high blood pressure, loss of vision and stroke. Advances in predictive analytics are now being used to mine electronic medical records in order to identify patients at risk of developing the complications caused by a chronic disease [[12], [13], [14], [15], [16], [17], [18], [19], [20], [21]]. Most of the literature in this area predicts multiple medical complications of chronic diseases by predicting each complication independently. For instance, Choi et al. [13] use machine learning methods to predict four medical complications caused by type 2 diabetes. In that study, each of the four complications was predicted separately, regardless of their interrelationships. Similarly, Sangi et al. [22] use different predictive models to predict the development of T2D complications among diabetes patients and assume that the complications have no interrelationships.
Although the healthcare predictive analytics literature largely ignores the relationships among multiple complications caused by chronic diseases, the literature has noted dependencies among these complications [11]. For example, Maron [23] systematically reviews the complications of hypertrophic cardiomyopathy and finds that many of them are related to each other. Some studies account for such relationships through the predictors: in order to identify diabetes patients at risk of vision loss, Piri et al. [8] consider the co-existence of other diabetes-related complications as predictors and implement single-task learning. In practice, however, the complications of chronic diseases may develop concurrently, and at the time of the prediction analysis we may not have information about a particular complication that would allow it to be used as a predictor. This paper aims to predict multiple complications caused by chronic diseases when they are interrelated. It extends the literature in this area by treating all complications of a chronic disease that are within the scope of analysis as predicted variables rather than predictors.
Unlike hypothesis-testing methods, which investigate the effect of a variable or a group of variables on the development of complications of chronic diseases, machine learning algorithms extract non-trivial patterns from medical datasets or electronic medical records. Furthermore, machine learning algorithms classify observations [24]. Machine learning therefore mitigates a limitation of hypothesis testing, which requires a hypothesis grounded in previous literature [5]. Machine learning algorithms such as logistic regression (LR), neural networks, support vector machines (SVM), and decision trees (DT) have been used extensively in healthcare [[19], [25], [26], [27], [28], [29], [30], [31], [32]]. The applications of machine learning in healthcare include, but are not limited to, predicting hospital readmissions [2,33], predicting survival after a medical procedure [5], early detection of chronic diseases [7,13,14,34], and predicting survival of chronic disease [35,36]. Another growing application of machine learning algorithms is predicting the complications caused by chronic diseases. For instance, machine learning is used to predict vision loss caused by diabetes [8,37]; Kothari et al. [38] predict heart disease and stroke in patients with type 2 diabetes; and Pahl et al. [39] use machine learning to predict heart failure resulting in death among children with cardiomyopathy.
As mentioned earlier, the literature on using machine learning to predict complications of chronic diseases focuses on one specific complication at a time and develops predictive models that capture the occurrence of that complication. This contrasts with the fact that multiple complications are commonly observed among patients with chronic diseases and that these complications have been shown to be interrelated [11]. The studies in this area predict multiple clinical variables independently and assume that there is no relationship among them (e.g., [13]). However, the complications of chronic diseases are often related. For example, heart failure, problems with heart valves, cardiac arrest and sudden death are complications associated with hypertrophic cardiomyopathy [40], and they have been shown to be interrelated [23]. Recent work by Piri et al. [8] tried to address this issue when predicting vision loss among diabetes patients by setting related complications as predictors. However, the complications may develop concurrently, and a patient may not yet have these complications but may be at risk of developing them. Therefore, the complications must be treated as prediction outcomes.
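To make the notion of interrelated complications concrete, the following minimal sketch computes the phi coefficient (the Pearson correlation for two binary variables) between pairs of complication indicators. It is only an illustration: the function name `phi_coefficient` and the toy labels are hypothetical and are not taken from the dataset or the analysis used in this study.

```python
from math import sqrt

def phi_coefficient(a, b):
    """Phi coefficient between two equal-length sequences of 0/1 labels."""
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Phi is undefined when one label is constant; 0.0 is returned
        # here purely as a convention of this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical toy indicators for eight patients (1 = complication present).
heart_failure  = [1, 1, 0, 0, 1, 0, 1, 0]
cardiac_arrest = [1, 1, 0, 0, 1, 0, 0, 0]
valve_problems = [0, 1, 0, 1, 0, 1, 0, 1]

phi_hf_ca = phi_coefficient(heart_failure, cardiac_arrest)
phi_hf_vp = phi_coefficient(heart_failure, valve_problems)
```

A phi value far from zero for a pair of complications suggests that predicting them independently discards information that a joint model could, in principle, exploit.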
Unlike the literature that uses single-task learning (STL) to independently predict patients at risk of developing multiple medical complications of a chronic disease, this study uses multi-task learning (MTL) [41] to take the dependencies among complications into account. MTL has been widely used in the healthcare analytics literature [42,43] to identify patients at risk of developing a condition when the risk factors are related. For instance, MTL has been used to predict the progression of Alzheimer’s disease based on interacting clinical variables [44,45], and Razavian et al. [46] use MTL to predict chronic kidney disease progression. However, these studies use MTL to predict one particular medical condition, such as Alzheimer’s disease or chronic kidney disease. To the best of our knowledge, the development of MTL-based models to predict multiple medical complications of a chronic condition has not yet been addressed. As such, this paper is an attempt to answer the following research question (RQ):
Does concurrent modeling of medical complications through the utilization of their intrinsic correlation improve the prediction of said complications when compared with the independent prediction of multiple complications for a chronic disease?
The main difference between the current work and the literature in healthcare analytics is the use of MTL for the prediction of multiple complications of chronic diseases. Unlike existing research, which considers only a single medical complication or does not consider the intrinsic correlations among complications, the current study extends the literature by providing an MTL-based model that predicts multiple complications of chronic diseases by exploiting the dependencies among them. The proposed models are compared when predicting heart failure, problems with heart valves, cardiac arrest and sudden death as commonly experienced complications of hypertrophic cardiomyopathy.
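The contrast between STL and MTL described above can be sketched in code. The example below is an assumption-laden illustration and not the model proposed in this paper: it uses hard parameter sharing, in which one shared hidden layer feeds a separate sigmoid output per complication, as one common way to realize MTL. All names (`init`, `forward`, `loss`, `train`) and the toy data are hypothetical. STL is obtained from the same code by training one single-output model per complication.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(n_in, n_hidden, n_tasks, seed=0):
    """Random weights: W is the shared layer, V holds one output head per task."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_tasks)]
    return W, V

def forward(W, V, x):
    """Shared tanh hidden layer followed by one sigmoid output per task."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    p = [sigmoid(sum(v * hj for v, hj in zip(head, h))) for head in V]
    return h, p

def loss(W, V, X, Y):
    """Mean over patients of the summed binary cross-entropy across tasks."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(X, Y):
        _, p = forward(W, V, x)
        for pk, yk in zip(p, y):
            total -= yk * math.log(pk + eps) + (1 - yk) * math.log(1 - pk + eps)
    return total / len(X)

def train(X, Y, n_hidden=3, epochs=200, lr=0.1, seed=0):
    """Stochastic gradient descent on the summed cross-entropy of all tasks."""
    W, V = init(len(X[0]), n_hidden, len(Y[0]), seed)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            h, p = forward(W, V, x)
            err = [pk - yk for pk, yk in zip(p, y)]  # gradient at each output
            # Every task contributes to the gradient of the shared layer;
            # this is where information is shared across complications.
            grad_h = [
                sum(err[k] * V[k][j] for k in range(len(V))) * (1 - h[j] ** 2)
                for j in range(n_hidden)
            ]
            for k in range(len(V)):
                for j in range(n_hidden):
                    V[k][j] -= lr * err[k] * h[j]
            for j in range(n_hidden):
                for i in range(len(x)):
                    W[j][i] -= lr * grad_h[j] * x[i]
    return W, V

# Hypothetical toy data: each row is [bias, feature 1, feature 2];
# each label row holds two related complication indicators.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1], [1, 0.2, 0.1], [1, 0.9, 0.8]]
Y = [[0, 0], [0, 0], [1, 0], [1, 1], [0, 0], [1, 1]]

# MTL: one model with a shared layer and one output per complication.
mtl_model = train(X, Y)

# STL: one independent single-output model per complication.
stl_models = [train(X, [[y[k]] for y in Y]) for k in range(len(Y[0]))]
```

In the MTL variant the shared weights `W` are updated with gradients from all complications at once, whereas each STL model learns its own hidden layer in isolation. Whether sharing helps on real data is an empirical question, which is what the research question above asks.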
The remainder of this paper is organized as follows. Section 2 presents the method suggested in this study; section 3 explains the evaluation of the proposed method; section 4 presents the analysis of the evaluation results; and section 5 discusses the findings and concludes the paper.
Section snippets
Proposed methods
Our objective is to identify patients with a particular chronic disease, hereafter to be referred to as “patients”, who are at risk of developing multiple medical complications related to their chronic disease. In this section, we first set our definition and build our analysis framework, then we propose two methods: (1) independent prediction of multiple complications that assumes no correlation among different complications and uses STL, and (2) concurrent prediction of multiple complications
Evaluation
This section explains how IPMC and CPMC were evaluated and compared to each other in terms of their performance for predicting medical complications of chronic diseases. IPMC and CPMC are general methods independent of what chronic disease is under study. However, for the purpose of evaluation, the present study compares them in the context of hypertrophic cardiomyopathy and its common complications: heart failure, problems with heart valves, and cardiac arrest [40]. Fig. 1 presents the
CPMC performs better than IPMC
As previously discussed, the first objective of the evaluation was to compare IPMC and CPMC in order to examine the performance of STL and MTL. IPMC and CPMC were trained to predict three complications of hypertrophic cardiomyopathy using four different predictive models: LR, DT, SVM and ANN. The evaluations were repeated for 1-year, 3-year, and 8-year predictions. Table 4 and Table 5 show that CPMC outperforms the alternative IPMC predictive models. While a better discrimination
Discussion and conclusion
Policy makers and healthcare professionals have sought to reconstruct daily generated EMRs and to integrate them into large clinical data warehouses for use in auditing, continuous quality improvement, health service planning, epidemiological studies, and evaluation research [[80], [81], [82], [83]]. Although managing the increasing amount of daily generated EMRs is essential [84], the potential benefits of EMRs are not limited to storage and retrieval of patients’ records. EMRs stored over
Acknowledgment
The authors would like to thank the anonymous reviewers and the editor for their insightful comments and suggestions. Dr. Madjid Tavana is grateful for the partial support he received from the Czech Science Foundation (GAČR 19-13946S) for this research.
References (91)
- et al. Personal health indexing based on medical examinations: a data mining approach. Decis Support Syst (2016)
- et al. Predicting heart transplantation outcomes through data analytics. Decis Support Syst (2017)
- et al. An analytic approach to better understanding and management of coronary surgeries. Decis Support Syst (2012)
- et al. A data analytics approach to building a clinical decision support system for diabetic retinopathy: developing and deploying a model ensemble. Decis Support Syst (2017)
- et al. Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence. Artif Intell Med (2017)
- et al. Special section on artificial intelligence for diabetes. Artif Intell Med (2018)
- et al. Robust predictive model for evaluating breast cancer survivability. Eng Appl Artif Intell (2013)
- et al. Evolutionary-driven support vector machines for determining the degree of liver fibrosis in chronic hepatitis C. Artif Intell Med (2011)
- et al. Using data mining techniques to predict hospitalization of hemodialysis patients. Decis Support Syst (2011)
- et al. Lung sounds classification using convolutional neural networks. Artif Intell Med (2018)
- Personalized prediction of drug efficacy for diabetes treatment via patient-level sequential modeling with neural networks. Artif Intell Med
- Bayesian averaging over Decision Tree models for trauma severity scoring. Artif Intell Med
- Sequential decision tree using the analytic hierarchy process for decision support in rectal cancer. Artif Intell Med
- A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform
- An evaluation of artificial neural networks in predicting pancreatic cancer survival. J Gastrointest Surg
- Predicting overall survivability in comorbidity of cancers: a data mining approach. Decis Support Syst
- Incidence of and risk factors for sudden cardiac death in children with dilated cardiomyopathy: a report from the pediatric cardiomyopathy registry. J Am Coll Cardiol
- Prediction of anti-cancer drug response by kernelized multi-task learning. Artif Intell Med
- Position-aware deep multi-task learning for drug–drug interaction extraction. Artif Intell Med
- Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease. NeuroImage
- Mathematical modeling and Bayesian estimation for error-prone retail shelf audits. Decis Support Syst
- Assessing data quality – a probability-based metric for semantic consistency. Decis Support Syst
- Using contextual features and multi-view ensemble learning in product defect identification from online discussion forums. Decis Support Syst
- Matrices with high completely positive semidefinite rank. Linear Algebra Its Appl.
- Generating random correlation matrices based on vines and extended onion method. J Multivar Anal
- Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components? J Econom
- Face detection using discriminating feature analysis and support vector machine. Pattern Recognit
- Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values. Comput Biol Med
- Breast cancer data analysis for survivability studies and prediction. Comput Methods Programs Biomed
- Predicting survival time for kidney dialysis patients: a data mining approach. Comput Biol Med
- An experimental comparison of performance measures for classification. Pattern Recognit Lett
- A probabilistic data-driven framework for scoring the preoperative recipient-donor heart transplant survival. Decis Support Syst
- An overview of ontologies and data resources in medical domains. Expert Syst Appl
- A bi-level interactive decision support framework to identify data mining-oriented electronic health record architectures. Appl Soft Comput
- Towards an ontology for data quality in integrated chronic disease management: a realist review of the literature. Int J Med Inform
- Healthcare predictive analytics for risk profiling in chronic care: a Bayesian multitask learning approach. MIS Q
- Implementing electronic health care predictive analytics: considerations and challenges. Health Aff. (Millwood)
- Predictive analytics for readmission of patients with congestive heart failure. Inf. Syst. Res.
- Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff. (Millwood)
- A machine learning approach to improving dynamic decision making. Inf. Syst. Res.
- An interoperable clinical decision-support system for early detection of SIRS in pediatric intensive care using openEHR. Artif Intell Med
- Significant morbidity and mortality among hospitalized end-stage liver disease patients in Medicare. J Pain Symptom Manage
- Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci
- Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc
- Machine learning methods to predict diabetes complications. J Diabetes Sci Technol