Administrative Health Data Representation for Mortality and High Utilization Prediction

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2021, Poly 2021)

Abstract

Administrative data, including medical claims, are frequently used to train machine learning models that predict patient outcomes. Despite this widespread use, little systematic work has examined how the codes in such data should be represented before model construction. Traditionally, the presence or absence of codes representing diagnoses or procedures over a fixed period, typically one year, is used (Binary representation). More recently, some studies have incorporated temporal information into the data representation, for example by counting code occurrences, calculating time from diagnosis, or using multiple time windows. This paper investigates methods for representing administrative data, specifically diagnoses extracted from claims data, before machine learning algorithms are applied. The study then compares two representations (Binary and Temporal Min-Max) on two classification problems: one-year mortality prediction and prediction of high utilization of medical services. The results indicate that the Temporal Min-Max representation outperforms the Binary representation in both predictive models. The optimal representation is shown to be problem-dependent, so optimizing the representation parameters is required as part of modeling.
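
To make the contrast between the two representations concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define Temporal Min-Max, so the sketch assumes it records, for each diagnosis code, the number of days from the prediction date back to the most recent occurrence (min) and to the earliest occurrence (max) within the lookback window. The toy claims, the code list, `INDEX_DATE`, `WINDOW_DAYS`, and all function names are hypothetical.

```python
from datetime import date

# Hypothetical toy claims: (patient_id, diagnosis_code, service_date).
CLAIMS = [
    ("p1", "I10", date(2020, 1, 15)),
    ("p1", "I10", date(2020, 11, 2)),
    ("p1", "E11", date(2020, 6, 30)),
    ("p2", "E11", date(2020, 3, 10)),
]
CODES = ["E11", "I10"]
INDEX_DATE = date(2021, 1, 1)  # assumed prediction time
WINDOW_DAYS = 365              # assumed one-year lookback


def days_before_index(service_date):
    """Number of days between a claim and the prediction date."""
    return (INDEX_DATE - service_date).days


def in_window(service_date):
    """True if the claim falls inside the lookback window."""
    return 0 < days_before_index(service_date) <= WINDOW_DAYS


def binary_representation(claims, codes):
    """One column per code: 1 if the code occurs in the window, else 0."""
    rows = {}
    for patient, code, service_date in claims:
        row = rows.setdefault(patient, {c: 0 for c in codes})
        if code in row and in_window(service_date):
            row[code] = 1
    return rows


def temporal_min_max(claims, codes):
    """Two columns per code (assumed definition): days since the most
    recent occurrence (min) and since the earliest occurrence (max).
    Codes that never occur in the window stay None."""
    rows = {}
    for patient, code, service_date in claims:
        row = rows.setdefault(
            patient, {f"{c}_{s}": None for c in codes for s in ("min", "max")}
        )
        if code not in codes or not in_window(service_date):
            continue
        days = days_before_index(service_date)
        lo, hi = row[f"{code}_min"], row[f"{code}_max"]
        row[f"{code}_min"] = days if lo is None else min(lo, days)
        row[f"{code}_max"] = days if hi is None else max(hi, days)
    return rows


binary = binary_representation(CLAIMS, CODES)
temporal = temporal_min_max(CLAIMS, CODES)
```

In this sketch, absent codes are left as `None` in the temporal table; how such missing values are encoded before training (for example, with a sentinel value larger than the window) is a modeling choice that the abstract does not specify.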

Author information

Corresponding author

Correspondence to Negin Asadzadehzanjani.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Asadzadehzanjani, N., Wojtusiak, J. (2021). Administrative Health Data Representation for Mortality and High Utilization Prediction. In: Rezig, E.K., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2021, Poly 2021. Lecture Notes in Computer Science, vol 12921. Springer, Cham. https://doi.org/10.1007/978-3-030-93663-1_11

  • DOI: https://doi.org/10.1007/978-3-030-93663-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93662-4

  • Online ISBN: 978-3-030-93663-1

  • eBook Packages: Computer Science, Computer Science (R0)
