Predictive models for hospital readmission risk: A systematic review of methods

https://doi.org/10.1016/j.cmpb.2018.06.006

Highlights

  • We provide a comprehensive review of state-of-the-art machine learning methods for readmission risk prediction.

  • From 265 publications reviewed, 77 studies were selected through the systematic literature review process.

  • Recent comparative studies suggest that machine learning techniques can improve prediction ability over traditional statistical approaches.

Abstract

Objectives

Hospital readmission risk prediction facilitates the identification of patients potentially at high risk so that resources can be used more efficiently in terms of cost-benefit. In this context, several models for readmission risk prediction have been proposed in recent years. The goal of this review is to give an overview of prediction models for hospital readmission, describe the data analysis methods and algorithms used for building the models, and synthesize their results.

Methods

Studies that reported the predictive performance of a model for hospital readmission risk were included. We defined the scope of the review and accordingly built a search query to select the candidate papers. This query string was used as input for the chosen search engines, namely PubMed and Google Scholar. For each study, we recorded the population, feature selection method, classification algorithm, sample size, readmission threshold, readmission rate and predictive performance of the model.

Results

We identified 77 studies that met the inclusion criteria, out of 265 citations. In 68% of the studies (n = 52), logistic regression or other regression techniques were used as the main method. Ten studies (13%) used survival analysis for model construction, while 14 (18%) used machine learning techniques for classification, among which decision tree-based methods and support vector machines (SVM) were the most common algorithms. Among these, only four studies reported using any technique to address class imbalance, with resampling being the most frequent (75%). Model performance varied considerably among studies, with Area Under the ROC Curve (AUC) values ranging from 0.54 to 0.92.

Conclusion

Logistic regression and survival analysis have traditionally been the most widely used techniques for model building. Nevertheless, machine learning techniques have become increasingly popular in recent years. Recent comparative studies suggest that machine learning techniques can improve prediction ability over traditional statistical approaches. However, the lack of an appropriate benchmark dataset of hospital readmissions makes it difficult to compare model performance across studies.

Introduction

Hospital readmissions are defined as admissions to a hospital within a (usually short) time span after discharge. Readmissions are frequent and costly events that impose a tremendous burden on patients and healthcare systems [1], [2].

Hospital readmissions are a growing concern for hospitals and policy makers as a measure of the quality of care, and many organizations have adopted them as a quality indicator [3]. The Centers for Medicare and Medicaid Services (CMS) in the USA [4] and policy makers in the UK [5] have introduced financial penalties for hospitals with high readmission rates by reducing the payment for patients readmitted within 30 days of discharge. Consequently, there is growing interest within the research community in addressing this problem from a data analysis perspective.

Some authors have performed bibliographic review studies with the aim of synthesizing the literature on prediction models to estimate the readmission risk. In 2011 Kansagara et al. [6] presented what is probably the most referenced systematic review paper in the field. This thorough review was mainly focused on model performance description and comparison to assess the suitability of the models for clinical or administrative use. The authors of the paper conclude that most readmission risk prediction models perform poorly and efforts to improve their performance are needed. The study also concludes that readmission risk prediction is a complex problem by nature, with many inherent limitations.

In 2015 Swain and Kharrazi [7] conducted a semi-systematic review of readmission predictive factors published prior to March 2013. This review was, to some degree, based on the work by Kansagara et al. and can be considered an extension of it. This work was focused on identifying the most significant predictive factors from previous readmission prediction models.

Other studies place the focus on a certain subpopulation rather than covering all the published risk prediction models. Ross et al. [8] conducted a review of statistical models for the readmission of heart failure (HF) patients. This work included the identification of analytic models, in addition to identifying patient characteristics associated with readmission. A more recent study by Leppin et al. [9] reviewed randomized trials that assessed the effect of interventions intended to prevent 30-day hospital readmissions.

Most previous review studies have focused on measuring the discrimination ability of the models and identifying predictive characteristics associated with readmission. In different but related fields, there are review studies that examine the data analysis approaches themselves. For instance, a systematic literature review on data mining techniques applied in cardiology was recently presented in [10].

Nevertheless, to our knowledge, no review of machine learning techniques, including feature selection and the handling of class imbalance, has been presented in the field of readmission prediction. These are areas that may provide improvements over previous methods, and experiments we have already carried out support this expectation [11], [12].

The objective of this study is to systematically review the prediction models for hospital readmission by describing the data analysis methods and algorithms used for building the models.

This paper is organized as follows: Section 2 summarizes the research methodology of our study. In Section 3 we present the review results. In Section 4 we discuss the results and findings of this study. Lastly, Section 5 presents the conclusions and future work.

Materials and methods

In this work, we conduct a systematic review following the three stages proposed by Tranfield et al. [13], namely planning, conducting and reporting. According to this methodology, we first define the research questions. Then we define the search strategy by identifying the source databases and the inclusion and exclusion criteria. Next, we present the data extraction procedure and then present the results.

Results

Table 3 describes in detail the 76 studies included for review, after the selection process. In this section, we present the results of our systematic review study. First, we present an overview of the results and then we discuss the results regarding specific research questions.

Prediction performance of the models

The Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), or c-statistic, is the de facto standard metric for measuring the discrimination ability of readmission risk prediction models. Because the main goal of some studies is to identify predictors associated with readmission, such studies often do not report the c-statistic as the overall performance metric. In addition, a minority of studies report sensitivity and specificity scores along with positive predictive value (PPV) instead of AUC [51], [25], [84].
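To make the metrics named above concrete, the following is a minimal Python sketch, not taken from any of the reviewed studies. It computes the c-statistic as the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted patient (ties counted as one half), and derives sensitivity, specificity and PPV at a fixed cut-off. The function names, the toy labels and scores, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Dict, Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]  # readmitted
    neg = [s for y, s in zip(y_true, y_score) if y == 0]  # not readmitted
    if not pos or not neg:
        raise ValueError("both classes must be present")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # readmitted patient ranked higher
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity and PPV when scores >= threshold are flagged."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical toy data: 1 = readmitted, 0 = not readmitted.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

auc = c_statistic(y_true, y_score)
metrics = threshold_metrics(y_true, y_score, threshold=0.5)
print(f"c-statistic: {auc:.2f}")
print({name: round(value, 2) for name, value in metrics.items()})
```

The c-statistic summarizes ranking quality over all possible cut-offs, whereas sensitivity, specificity and PPV depend on the chosen threshold (and PPV also on the readmission rate), which is one reason results reported with different metrics are hard to compare directly.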

Conclusions

Our literature review included 77 studies that described prediction models for hospital readmission risk. Although statistical modelling techniques have prevailed and remain popular, machine learning approaches have emerged in recent years as a promising alternative that can improve the predictive ability of readmission risk prediction models.

Clinical trials tend to follow established procedures when it comes to statistical data analysis. Logistic regression and Cox regression (or

References (98)

  • D.D. McManus et al.

    Reliability of predicting early hospital readmission after discharge for an acute coronary syndrome using claims-based data

    Am. J. Cardiol.

    (2016)
  • L. Turgeman et al.

    A mixed-ensemble model for hospital readmission

    Artif. Intell. Med.

    (2016)
  • C. Walsh et al.

    The effects of data sources, cohort selection, and outcome definition on a predictive model of risk of thirty-day hospital readmissions

    J. Biomed. Inform.

    (2014)
  • L. Wang et al.

    Predicting risk of hospitalization or death among patients with heart failure in the veterans health administration

    Am. J. Cardiol.

    (2012)
  • A.J. Watson et al.

    Linking electronic health record-extracted psychosocial data in real-time to risk of readmission for heart failure

    Psychosomatics

    (2011)
  • S. Yu et al.

    Predicting readmission risk with institution-specific prediction models

    Artif. Intell. Med.

    (2015)
  • A. Zapatero et al.

    Predictive model of readmission to internal medicine wards

    Eur. J. Intern. Med.

    (2012)
  • B. Zheng et al.

    Predictive modeling of hospital readmissions using metaheuristics and data mining

    Expert Syst. Appl.

    (2015)
  • H.M. Krumholz et al.

    Do non-clinical factors improve prediction of readmission risk?: Results from the Tele-Hf study

    JACC: Heart Fail.

    (2016)
  • S.N. Vigod et al.

    READMIT: a clinical risk index to predict 30-day readmission after discharge from acute psychiatric units

    J. Psychiatr. Res.

    (2015)
  • L. Fernandez-Gasso et al.

    Trends, causes and timing of 30-day readmissions after hospitalization for heart failure: 11-year population-based analysis with linked data

    Int. J. Cardiol.

    (2017)
  • J. Futoma et al.

    A comparison of models for predicting early hospital readmissions

    J. Biomed. Inform.

    (2015)
  • S.F. Crone et al.

    Instance sampling in credit scoring: An empirical study of sample size and balancing

    Int. J. Forecast.

    (2012)
  • A.P. Bradley

    The use of the area under the ROC curve in the evaluation of machine learning algorithms

    Pattern Recognit.

    (1997)
  • T. Fawcett

    An introduction to ROC analysis

    Pattern Recognit. Lett.

    (2006)
  • W. Ouwerkerk et al.

    Factors influencing the predictive power of models for predicting mortality and/or heart failure hospitalization in patients with heart failure

    JACC: Heart Fail.

    (2014)
  • M. Mesgarpour et al.

    Ensemble risk model of emergency admissions (ERMER)

    Int. J. Med. Inf.

    (2017)
  • K. Dharmarajan et al.

    Diagnoses and timing of 30-day readmissions after hospitalization for heart failure, acute myocardial infarction, or pneumonia

    JAMA

    (2013)
  • C.A. Baillie et al.

    The readmission risk flag: using the electronic health record to automatically identify patients at risk for 30‐day readmission

    J. Hosp. Med.

    (2013)
  • HHS

    Medicare program; hospital inpatient prospective payment systems for acute care hospitals and the long-term care hospital prospective payment system and Fiscal Year 2014 rates; quality reporting requirements for specific providers; hospital conditions of participation; payment policies related to patient status

    Final rules. Fed. Regist.

    (2013)
  • Z. Kmietowicz

    Hospitals will be fined for emergency readmissions, says Lansley

    BMJ: Br. Med. J. (Online)

    (2010)
  • D. Kansagara et al.

    Risk prediction models for hospital readmission: a systematic review

    JAMA

    (2011)
  • J.S. Ross et al.

    Statistical models and patient predictors of readmission for heart failure: a systematic review

    Arch. Intern. Med.

    (2008)
  • A.L. Leppin et al.

    Preventing 30-day hospital readmissions: a systematic review and meta-analysis of randomized trials

    JAMA Intern. Med.

    (2014)
  • A. Artetxe et al.

    Emergency department readmission risk prediction: a case study in Chile

  • D. Tranfield et al.

    Towards a methodology for developing evidence‐informed management knowledge by means of systematic review

    Br. J. Manag.

    (2003)
  • D. Moher et al.

    Preferred reporting items for systematic reviews and meta-analyses: the PRISMA Statement

    PLoS Med.

    (2009)
  • S. Greenland

    Modeling and variable selection in epidemiologic analysis

    Am. J. Public Health

    (1989)
  • S.E. AbdelRahman et al.

    A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study

    BMC Med. Inf. Decis. Making

    (2014)
  • A. Alassaad et al.

    A tool for prediction of risk of rehospitalisation and mortality in the hospitalised elderly: secondary analysis of clinical trial data

    BMJ Open

    (2015)
  • N. Allaudeen et al.

    Redefining readmission risk factors for general medicine patients

    J. Hosp. Med.

    (2011)
  • L.A. Allen et al.

    Rates and predictors of 30-day readmission among commercially insured and medicaid-enrolled patients hospitalized with systolic heart failure

    Circ: Heart Fail.

    (2012)
  • G.M. Allison et al.

    Prediction model for 30-day hospital readmissions among patients discharged receiving outpatient parenteral antibiotic therapy

    Clin. Infect. Dis.

    (2013)
  • B. Amalakuhan et al.

    A prediction model for COPD readmissions: catching up, catching our breath, and improving a national problem

    J. Community Hosp. Intern. Med. Perspect.

    (2012)
  • R. Amarasingham et al.

    An automated model to identify heart failure patients at risk for 30-day readmission or death using electronic medical record data

    Med. Care

    (2010)
  • P.A. Baltodano et al.

    A validated, risk assessment tool for predicting readmission after open ventral hernia repair

    Hernia

    (2016)
  • I. Bergese et al.

    An innovative model to predict pediatric emergency department return visits

    Pediatr. Emerg. Care

    (2017)
  • J. Billings et al.

    Development of a predictive model to identify inpatients at risk of re-admission within 30 days of discharge (PARR-30)

    BMJ Open

    (2012)
  • X. Cai et al.

    Real-time prediction of mortality, readmission, and length of stay using electronic health record data

    J. Am. Med. Inform. Assoc.

    (2015)