A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases

https://doi.org/10.1016/j.artmed.2019.101750

Highlights

  • Patients with chronic diseases are often at risk for multiple correlated complications.

  • Single-task learning predicts these complications but ignores their correlations.

  • We use single- and multi-task learning with different predictive models.

  • We compare prediction performance of hypertrophic cardiomyopathy complications.

  • We show multi-task learning implemented by logistic regression has the best performance.

Abstract

Chronic diseases often cause several medical complications. This paper aims to predict multiple complications among patients with a chronic disease. The literature uses single-task learning algorithms to predict complications independently and assumes no correlation among complications of chronic diseases. We propose two methods (independent prediction of complications with single-task learning and concurrent prediction of complications with multi-task learning) and show that medical complications of chronic diseases can be correlated. We use a case study and compare the performance of these two methods by predicting complications of hypertrophic cardiomyopathy on 106 predictors in 1078 electronic medical records from April 2009-April 2017, inclusive. The methods are implemented using logistic regression, artificial neural networks, decision trees, and support vector machines. The results show multi-task learning with logistic regression improves the performance of predictions in terms of both discrimination and calibration.

Introduction

A body of literature in healthcare analytics examines patterns in electronic medical records to predict medical outcomes [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]. Recent work has shown that chronic diseases cause several medical complications [11]. For example, type 2 diabetes increases the risk of complications such as numbness in the feet, kidney conditions, high blood pressure, loss of vision, and stroke. Advances in predictive analytics are now being used to mine electronic medical records in order to identify patients at risk of developing the complications caused by a chronic disease [[12], [13], [14], [15], [16], [17], [18], [19], [20], [21]]. Most of the literature in this area predicts multiple medical complications of chronic diseases by predicting each complication independently. For instance, Choi et al. [13] use machine learning methods to predict four medical complications caused by type 2 diabetes. In that study, each of the four complications was predicted separately, regardless of their interrelationships. Similarly, Sangi et al. [22] use different predictive models to predict the development of type 2 diabetes (T2D) complications among diabetes patients, and they assume the complications have no interrelationships.

Although the healthcare predictive analytics literature generally ignores the relationships among multiple complications caused by chronic diseases, dependencies among these complications have been noted [11]. For example, Maron [23] systematically reviews the complications of hypertrophic cardiomyopathy and finds that many of them are related to each other. Some studies account for such relationships through the choice of predictors: to identify diabetes patients at risk of vision loss, Piri et al. [8] consider the co-existence of other diabetes-related complications as predictors and implement single-task learning. In practice, however, the complications of chronic diseases may develop concurrently, and at the time of the prediction analysis we may not have the information about a particular complication that is needed to use it as a predictor. This paper aims to predict multiple complications caused by chronic diseases when they are interrelated. This study extends the literature in this area by treating all complications of a chronic disease that are within the scope of the analysis as predicted variables rather than predictors.

Unlike hypothesis-testing methods, which investigate the effect of a variable or a group of variables on the development of complications of chronic diseases, machine learning algorithms extract non-trivial patterns from medical datasets or electronic medical records. Furthermore, machine learning algorithms classify observations [24]. The use of machine learning therefore mitigates a limitation of hypothesis testing, which requires a hypothesis based on previous literature [5]. Machine learning algorithms such as logistic regression (LR), artificial neural networks (ANN), support vector machines (SVM), and decision trees (DT) have been used extensively in healthcare [19,[25], [26], [27], [28], [29], [30], [31], [32]]. The applications of machine learning in healthcare include, but are not limited to, predicting hospital readmissions [2,33], predicting survival of a medical procedure [5], early detection of chronic diseases [7,13,14,34], and predicting survival of chronic disease [35,36]. Another growing application of machine learning algorithms is predicting the complications caused by chronic diseases. For instance, machine learning is used to predict vision loss caused by diabetes [8,37]; Kothari et al. [38] predict heart disease and stroke in patients with type 2 diabetes; and Pahl et al. [39] use machine learning to predict heart failure resulting in death among children with cardiomyopathy.

As mentioned earlier, the literature on using machine learning to predict complications of chronic diseases focuses on one specific complication at a time and develops predictive models that capture the occurrence of that complication. This contrasts with the fact that multiple complications are commonly observed among patients with chronic diseases and that these complications have been shown to be interrelated [11]. The studies in this area predict multiple clinical variables independently and assume that there is no relationship among them (e.g., [13]). However, the complications of chronic diseases are often related. For example, heart failure, problems with heart valves, cardiac arrest, and sudden death are complications associated with hypertrophic cardiomyopathy [40], and they have been shown to be interrelated [23]. Recent work by Piri et al. [8] tried to address this issue when predicting vision loss among diabetes patients by setting related complications as predictors. However, the complications may develop concurrently, and a patient may not yet have these complications but may be at risk of developing them. Therefore, the complications must be treated as prediction outcomes.
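As a rough illustration of what "interrelated" complications can mean in data terms, the sketch below computes the phi coefficient (the Pearson correlation for two binary indicators) between two complication flags. The function name and the toy labels are hypothetical and are not taken from the study's dataset; the excerpt does not state which dependence measure, if any, the authors use.

```python
import math


def phi_coefficient(a, b):
    """Phi coefficient between two equal-length sequences of 0/1 labels."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one indicator is constant, so correlation is undefined
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical toy labels: 1 = patient developed the complication.
heart_failure = [1, 1, 0, 0, 1, 0, 0, 1]
valve_problem = [1, 1, 0, 0, 0, 0, 0, 1]

phi = phi_coefficient(heart_failure, valve_problem)
print(f"phi = {phi:.3f}")
```

A value near zero would be consistent with the independence assumption behind single-task prediction, whereas a clearly non-zero value suggests that joint modeling may have something to exploit.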

Unlike the literature that uses single-task learning (STL) to independently predict patients at risk of developing multiple medical complications of a chronic disease, this study uses multi-task learning (MTL) [41] and takes the dependencies among complications into account. MTL has been widely used in the healthcare analytics literature [42,43] to identify patients at risk of developing a condition when the risk factors are related. For instance, MTL has been used to predict the progression of Alzheimer’s disease based on interacting clinical variables [44,45], and Razavian et al. [46] use MTL to predict chronic kidney disease progression. However, these studies use MTL to predict one particular medical condition, such as Alzheimer’s disease or chronic kidney disease. To the best of our knowledge, the development of MTL-based models to predict multiple medical complications of a chronic condition has not yet been addressed. As such, this paper is an attempt to answer the following research question (RQ):

Does concurrent modeling of medical complications through the utilization of their intrinsic correlation improve the prediction of said complications when compared with the independent prediction of multiple complications for a chronic disease?

The main difference between the current work and the literature in healthcare analytics lies in the use of MTL for the prediction of multiple complications of chronic diseases. Unlike existing research, which considers only a single medical complication or does not consider intrinsic correlations among complications, the current study extends that literature by providing an MTL-based model that predicts multiple complications of chronic diseases by exploiting the dependencies among the complications. The proposed models are compared when predicting heart failure, problems with heart valves, cardiac arrest, and sudden death as commonly experienced complications of hypertrophic cardiomyopathy.
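To make the contrast between independent and concurrent prediction concrete, the following is a minimal sketch, not the authors' formulation (which this excerpt does not specify). It fits one logistic regression per complication by gradient descent and optionally adds a mean-regularization penalty, one common way to couple tasks in MTL. The function names, the penalty form, the hyperparameters, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_tasks(X, Y, lam=0.0, lr=0.5, epochs=300):
    """Fit one logistic regression per task (column of Y) by gradient descent.

    lam = 0 yields independent single-task models. lam > 0 adds the penalty
    (lam / 2) * sum_k ||w_k - mean(w)||^2, which pulls the task weight
    vectors toward their average and so couples the tasks.
    """
    n, d, k = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * d for _ in range(k)]  # one weight vector per task
    for _ in range(epochs):
        w_bar = [sum(W[t][j] for t in range(k)) / k for j in range(d)]
        new_W = []
        for t in range(k):
            grad = [0.0] * d
            for i in range(n):
                z = sum(W[t][j] * X[i][j] for j in range(d))
                err = sigmoid(z) - Y[i][t]
                for j in range(d):
                    grad[j] += err * X[i][j] / n
            new_W.append([
                W[t][j] - lr * (grad[j] + lam * (W[t][j] - w_bar[j]))
                for j in range(d)
            ])
        W = new_W
    return W


def task_spread(W):
    """Sum of squared deviations of task weights from their mean."""
    k, d = len(W), len(W[0])
    w_bar = [sum(W[t][j] for t in range(k)) / k for j in range(d)]
    return sum((W[t][j] - w_bar[j]) ** 2 for t in range(k) for j in range(d))


# Synthetic data (assumption): a bias term plus two risk factors, and two
# hypothetical complications that share one risk factor's effect.
rng = random.Random(0)
X = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
Y = [[int(rng.random() < sigmoid(2 * x[1] + x[2])),
      int(rng.random() < sigmoid(2 * x[1] - x[2]))] for x in X]

W_stl = fit_logistic_tasks(X, Y, lam=0.0)  # independent prediction
W_mtl = fit_logistic_tasks(X, Y, lam=1.0)  # coupled (multi-task) prediction
print("spread of task weights, STL:", round(task_spread(W_stl), 3))
print("spread of task weights, MTL:", round(task_spread(W_mtl), 3))
```

Setting `lam=0` recovers the single-task baseline, so the same routine can serve both arms of a comparison. Whether coupling helps on real records depends on how strongly the complications actually share risk factors.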

The remainder of this paper is organized as follows. Section 2 presents the method suggested in this study; section 3 explains the evaluation of the proposed method; section 4 presents the analysis of the evaluation results; and section 5 discusses the findings and concludes the paper.

Section snippets

Proposed methods

Our objective is to identify patients with a particular chronic disease, hereafter to be referred to as “patients”, who are at risk of developing multiple medical complications related to their chronic disease. In this section, we first set our definition and build our analysis framework, then we propose two methods: (1) independent prediction of multiple complications that assumes no correlation among different complications and uses STL, and (2) concurrent prediction of multiple complications

Evaluation

This section explains how IPMC and CPMC were evaluated and compared to each other in terms of their performance for predicting medical complications of chronic diseases. IPMC and CPMC are general methods independent of what chronic disease is under study. However, for the purpose of evaluation, the present study compares them in the context of hypertrophic cardiomyopathy and its common complications: heart failure, problems with heart valves, and cardiac arrest [40]. Fig. 1 presents the

CPMC performs better than IPMC

As previously discussed, the first objective of the evaluation was to compare IPMC and CPMC in order to examine the performance of STL and MTL. IPMC and CPMC were trained to predict three complications of hypertrophic cardiomyopathy patients using four different predictive models: LR, DT, SVM and ANN. The evaluations were repeated for 1-year, 3-year, and 8-year predictions. Table 4 and Table 5 show that CPMC outperforms the alternative IPMC predictive models. While a better discrimination
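Discrimination and calibration, the two criteria named in the abstract, are often quantified with the area under the ROC curve (AUC) and the Brier score, respectively. The sketch below shows both on hypothetical toy predictions. It is an assumption that these particular metrics match the ones reported in Table 4 and Table 5, which are not reproduced in this excerpt.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = complication occurred) and predicted risks.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print("AUC:", auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
```

Higher AUC indicates better ranking of at-risk patients, while a lower Brier score indicates predicted probabilities that sit closer to observed outcomes. A model can improve on one without improving on the other, which is why both are usually reported.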

Discussion and conclusion

Policy makers and healthcare professionals have sought to reconstruct daily generated EMRs and to integrate them into large clinical data warehouses for use in auditing, continuous quality improvement, health service planning, epidemiological studies, and evaluation research [[80], [81], [82], [83]]. Although managing the increasing amount of daily generated EMRs is essential [84], the potential benefits of EMRs are not limited to storage and retrieval of patients’ records. EMRs stored over

Acknowledgment

The authors would like to thank the anonymous reviewers and the editor for their insightful comments and suggestions. Dr. Madjid Tavana is grateful for the partial support he received from the Czech Science Foundation (GAČR 19-13946S) for this research.

References (91)

  • S. Kang

    Personalized prediction of drug efficacy for diabetes treatment via patient-level sequential modeling with neural networks

    Artif Intell Med

    (2018)
  • V. Schetinin et al.

    Bayesian averaging over Decision Tree models for trauma severity scoring

    Artif Intell Med

    (2018)
  • A. Suner et al.

    Sequential decision tree using the analytic hierarchy process for decision support in rectal cancer

    Artif Intell Med

    (2012)
  • T. Zheng et al.

    A machine learning-based framework to identify type 2 diabetes through electronic health records

    Int J Med Inform

    (2017)
  • S. Walczak et al.

    An evaluation of artificial neural networks in predicting pancreatic cancer survival

    J Gastrointest Surg

    (2017)
  • H.M. Zolbanin et al.

    Predicting overall survivability in comorbidity of cancers: a data mining approach

    Decis Support Syst

    (2015)
  • E. Pahl et al.

    Incidence of and risk factors for sudden cardiac death in children with dilated cardiomyopathy: a report from the pediatric cardiomyopathy registry

    J Am Coll Cardiol

    (2012)
  • M. Tan

    Prediction of anti-cancer drug response by kernelized multi-task learning

    Artif Intell Med

    (2016)
  • D. Zhou et al.

    Position-aware deep multi-task learning for drug–drug interaction extraction

    Artif Intell Med

    (2018)
  • D. Zhang et al.

    Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease

    NeuroImage

    (2012)
  • H.H.-C. Chuang

    Mathematical modeling and Bayesian estimation for error-prone retail shelf audits

    Decis Support Syst

    (2015)
  • B. Heinrich et al.

    Assessing data quality – a probability-based metric for semantic consistency

    Decis Support Syst

    (2018)
  • Y. Liu et al.

    Using contextual features and multi-view ensemble learning in product defect identification from online discussion forums

    Decis Support Syst

    (2018)
  • S. Gribling et al.

    Matrices with high completely positive semidefinite rank

    Linear Algebra Its Appl.

    (2017)
  • D. Lewandowski et al.

    Generating random correlation matrices based on vines and extended onion method

    J Multivar Anal

    (2009)
  • C. De Mol et al.

    Forecasting using a large number of predictors: Is Bayesian shrinkage a valid alternative to principal components?

    J Econom

    (2008)
  • P. Shih et al.

    Face detection using discriminating feature analysis and support vector machine

    Pattern Recognit

    (2006)
  • P.J. García-Laencina et al.

    Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values

    Comput Biol Med

    (2015)
  • N. Shukla et al.

    Breast cancer data analysis for survivability studies and prediction

    Comput Methods Programs Biomed

    (2018)
  • A. Kusiak et al.

    Predicting survival time for kidney dialysis patients: a data mining approach

    Comput Biol Med

    (2005)
  • C. Ferri et al.

    An experimental comparison of performance measures for classification

    Pattern Recognit Lett

    (2009)
  • A. Dag et al.

    A probabilistic data-driven framework for scoring the preoperative recipient-donor heart transplant survival

    Decis Support Syst

    (2016)
  • M. Ivanović et al.

    An overview of ontologies and data resources in medical domains

    Expert Syst Appl

    (2014)
  • F. Zandi

    A bi-level interactive decision support framework to identify data mining-oriented electronic health record architectures

    Appl Soft Comput

    (2014)
  • S.-T. Liaw et al.

    Towards an ontology for data quality in integrated chronic disease management: a realist review of the literature

    Int J Med Inform

    (2013)
  • Y.-K. Lin et al.

    Healthcare predictive analytics for risk profiling in chronic care: a Bayesian multitask learning approach

    MIS Q

    (2017)
  • R. Amarasingham et al.

    Implementing electronic health care predictive analytics: considerations and challenges

    Health Aff. (Millwood)

    (2014)
  • I. Bardhan et al.

    Predictive analytics for readmission of patients with congestive heart failure

    Inf. Syst. Res.

    (2014)
  • D.W. Bates et al.

    Big data in health care: using analytics to identify and manage high-risk and high-cost patients

    Health Aff. (Millwood)

    (2014)
  • G. Meyer et al.

    A machine learning approach to improving dynamic decision making

    Inf. Syst. Res.

    (2014)
  • A. Wulff et al.

    An interoperable clinical decision-support system for early detection of SIRS in pediatric intensive care using openEHR

    Artif Intell Med

    (2018)
  • C.L. Brown et al.

    Significant morbidity and mortality among hospitalized end-stage liver disease patients in Medicare

    J Pain Symptom Manage

    (2016)
  • M.D. Abràmoff et al.

    Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning

    Invest Ophthalmol Vis Sci

    (2016)
  • E. Choi et al.

    Using recurrent neural network models for early detection of heart failure onset

    J Am Med Inform Assoc

    (2016)
  • A. Dagliati et al.

    Machine learning methods to predict diabetes complications

    J Diabetes Sci Technol

    (2017)