Abstract
In recent years, prediction of 30-day hospital readmission risk has received increased interest in Healthcare Predictive Analytics because of its high human and financial impact. However, lack of data, high class and feature imbalance, and data sparsity make this task so challenging that most efforts to produce accurate data-driven readmission prediction models have failed. We address these problems by proposing a novel method for generating virtual examples that exploits the synergistic effect of data-driven models and domain knowledge by integrating qualitative knowledge and available data as complementary information sources. Domain knowledge, represented by the ICD-9 hierarchy of diagnoses, is used to characterize rare or unseen co-morbidities, which are presumed to have similar outcomes to related diagnoses in the ICD-9 hierarchy. We evaluate the proposed method on 66,994 pediatric hospital discharge records from the California State Inpatient Databases (SID) of the Healthcare Cost and Utilization Project (HCUP), covering 2009 to 2011, and show improved accuracy of 30-day hospital readmission prediction compared to state-of-the-art alternative methods. We attribute this improvement to the fact that rare diseases have a high percentage of readmissions, and models based entirely on data usually fail to detect this qualitative information.
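The abstract does not spell out the generation procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a rare diagnosis code can lend its outcome label to unseen sibling codes that share the same three-character ICD-9 category. The function names, the `rare_threshold` parameter, the `known_codes` vocabulary, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict


def icd9_category(code):
    """Return the three-character ICD-9 category (the part before the dot)."""
    return code.split(".")[0]


def virtual_examples(records, known_codes, rare_threshold=2):
    """Create virtual (code, label) pairs for unseen siblings of rare codes.

    records        -- list of (icd9_code, readmitted) pairs from real data
    known_codes    -- hypothetical vocabulary of valid ICD-9 codes
    rare_threshold -- codes seen fewer times than this count as rare (assumption)
    """
    counts = Counter(code for code, _ in records)

    # Group the vocabulary by category so siblings can be looked up quickly.
    siblings = defaultdict(set)
    for code in known_codes:
        siblings[icd9_category(code)].add(code)

    virtual = []
    for code, readmitted in records:
        if counts[code] >= rare_threshold:
            continue  # frequent codes are left to the data-driven model
        for sib in sorted(siblings[icd9_category(code)]):
            # Only fill in codes that never occur in the real data.
            if sib != code and counts[sib] == 0:
                virtual.append((sib, readmitted))
    return virtual


# Toy data for illustration only; not taken from the HCUP records.
records = [("277.00", 1), ("486", 0), ("486", 0), ("486", 1)]
known_codes = ["277.00", "277.01", "277.02", "486"]
virtual = virtual_examples(records, known_codes)
print(virtual)
```

In this toy run the single record for `277.00` is treated as rare, so its label is copied to the unseen siblings `277.01` and `277.02`, while the frequent code `486` produces no virtual examples. A real implementation would need to handle multiple diagnoses per record and decide how many virtual examples to add relative to the real data.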
Acknowledgments
This research was supported by DARPA Grant FA9550-12-1-0406 negotiated by AFOSR, by the National Science Foundation through major research instrumentation grant CNS-09-58854, and by the SNSF Joint Research project (SCOPES), ID: IZ73Z0_152415.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vukicevic, M., Radovanovic, S., Kovacevic, A., Stiglic, G., Obradovic, Z. (2015). Improving Hospital Readmission Prediction Using Domain Knowledge Based Virtual Examples. In: Uden, L., Heričko, M., Ting, IH. (eds) Knowledge Management in Organizations. KMO 2015. Lecture Notes in Business Information Processing, vol 224. Springer, Cham. https://doi.org/10.1007/978-3-319-21009-4_51
DOI: https://doi.org/10.1007/978-3-319-21009-4_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21008-7
Online ISBN: 978-3-319-21009-4
eBook Packages: Computer Science, Computer Science (R0)