Abstract
Administrative data, including medical claims, are frequently used to train machine learning models that predict patient outcomes. Despite this widespread use, little systematic work has examined how the codes in such data should be represented before model construction. Traditionally, the presence or absence of diagnosis or procedure codes over a fixed period, typically one year, is used (Binary representation). More recently, some studies have incorporated temporal information into the representation, for example by counting code occurrences, calculating time from diagnosis, or using multiple time windows. This paper investigates methods for representing administrative data, specifically diagnoses extracted from claims data, before machine learning algorithms are applied. It then compares two representations (Binary and Temporal Min-Max) on two classification problems: prediction of one-year mortality and prediction of high utilization of medical services. Temporal Min-Max outperformed Binary representation in both predictive models. The results also show that the optimal representation is problem-dependent, so optimization of representation parameters is required as part of the modeling.
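To make the contrast between the two representations concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not define Temporal Min-Max; the sketch assumes it stores, for each diagnosis code, the number of days from the prediction (index) date back to the most recent occurrence (min) and to the earliest occurrence (max). The claim records, the ICD-10 codes `I10` and `E11`, the function names, and the `window_days` parameter are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical toy claims: (patient_id, diagnosis_code, service_date).
CLAIMS = [
    ("p1", "I10", date(2020, 3, 1)),
    ("p1", "I10", date(2020, 11, 15)),
    ("p1", "E11", date(2020, 6, 10)),
    ("p2", "E11", date(2019, 5, 1)),   # older than one year at the index date
    ("p2", "I10", date(2020, 12, 1)),
]
CODES = ["I10", "E11"]
INDEX_DATE = date(2021, 1, 1)  # assumed prediction time


def binary_representation(claims, codes, index_date, window_days=365):
    """1 if the code appears within the look-back window, else 0."""
    rows = {}
    for pid, code, when in claims:
        row = rows.setdefault(pid, {c: 0 for c in codes})
        age = (index_date - when).days  # days before the index date
        if code in row and 0 <= age <= window_days:
            row[code] = 1
    return rows


def temporal_min_max(claims, codes, index_date):
    """Assumed Temporal Min-Max: days since the latest (min) and the
    earliest (max) occurrence of each code; None if never observed."""
    rows = {}
    for pid, code, when in claims:
        row = rows.setdefault(
            pid, {f"{c}_{s}": None for c in codes for s in ("min", "max")}
        )
        age = (index_date - when).days
        if code not in codes or age < 0:
            continue  # ignore unknown codes and claims after the index date
        lo, hi = f"{code}_min", f"{code}_max"
        row[lo] = age if row[lo] is None else min(row[lo], age)
        row[hi] = age if row[hi] is None else max(row[hi], age)
    return rows


if __name__ == "__main__":
    print(binary_representation(CLAIMS, CODES, INDEX_DATE))
    print(temporal_min_max(CLAIMS, CODES, INDEX_DATE))
```

In this toy data, the Binary representation records `E11` as absent for patient `p2` because the only occurrence falls outside the one-year window, while the assumed temporal encoding keeps it as an occurrence 611 days before the index date. The look-back length (`window_days`) and the handling of never-observed codes (`None` here) are examples of representation parameters that would need tuning per problem.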
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Asadzadehzanjani, N., Wojtusiak, J. (2021). Administrative Health Data Representation for Mortality and High Utilization Prediction. In: Rezig, E.K., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH Poly 2021. Lecture Notes in Computer Science, vol 12921. Springer, Cham. https://doi.org/10.1007/978-3-030-93663-1_11
DOI: https://doi.org/10.1007/978-3-030-93663-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93662-4
Online ISBN: 978-3-030-93663-1
eBook Packages: Computer Science (R0)