Leveraging hierarchy in medical codes for predictive modeling

Research article, published 20 September 2014
DOI: 10.1145/2649387.2649407

Abstract

ICD-9 codes are among the most important pieces of patient information recorded in electronic health records. They have been shown to be useful for predictive modeling of adverse patient outcomes, including diabetes and heart failure. An important characteristic of ICD-9 codes is the hierarchical relationship among codes. Nevertheless, the most common feature representation used to incorporate ICD-9 codes into predictive models disregards these structural relationships.
In this paper, we explore different methods to leverage the hierarchical structure of ICD-9 codes with the goal of improving the performance of predictive models. We compare methods that leverage hierarchy by 1) incorporating the information during feature construction, 2) using a learning algorithm that accounts for the structure of the ICD-9 codes when building a model, or 3) doing both. We propose and evaluate a novel feature engineering approach that leverages hierarchy while simultaneously reducing feature dimensionality.
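As a rough illustration of the first option, incorporating hierarchy during feature construction, the sketch below contrasts a flat count representation with one that also credits each code's ancestors. It is not the feature engineering approach proposed in the paper, which the abstract does not detail. The function names (`icd9_ancestors`, `flat_features`, `hierarchical_features`) and the sample codes are hypothetical, and the roll-up assumes the usual ICD-9-CM layout in which a code such as 428.21 sits under 428.2 and the three-digit category 428.

```python
from collections import Counter


def icd9_ancestors(code: str) -> list[str]:
    """Return the code followed by its ancestors, most specific first.

    Assumes the dotted ICD-9-CM form, e.g. "428.21" -> 428.21, 428.2, 428.
    """
    category, _, detail = code.partition(".")
    # Drop one trailing digit at a time from the part after the dot.
    levels = [f"{category}.{detail[:i]}" for i in range(len(detail), 0, -1)]
    levels.append(category)
    return levels


def flat_features(codes: list[str]) -> Counter:
    """Conventional representation: one count per exact code, no hierarchy."""
    return Counter(codes)


def hierarchical_features(codes: list[str]) -> Counter:
    """Hierarchy-aware representation: each code also counts toward its ancestors."""
    counts = Counter()
    for code in codes:
        counts.update(icd9_ancestors(code))
    return counts


# Hypothetical patient record with three diagnosis codes.
patient_codes = ["428.21", "428.0", "585.3"]
flat = flat_features(patient_codes)
hier = hierarchical_features(patient_codes)

# The flat view never sees category 428, while the hierarchical view
# pools both heart-failure codes under it.
print(flat["428"], hier["428"])
```

In the flat view the two codes under 428 are unrelated features. In the hierarchical view they share a parent feature, which is one simple way related codes can reinforce each other.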
Our experiments indicate that properly exploiting the ICD-9 hierarchy can significantly improve predictive performance. On two clinical tasks, predicting chronic kidney disease progression (Task-CKD) and predicting incident heart failure (Task-HF), we show that methods that use hierarchy outperform the conventional approach in F-score (0.44 vs 0.36 for Task-HF and 0.40 vs 0.37 for Task-CKD) and relative risk (4.6 vs 3.3 for Task-HF and 5.9 vs 3.8 for Task-CKD).
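For readers unfamiliar with the two reported metrics, the sketch below computes them from confusion-matrix counts. The abstract does not define either metric, so this assumes the standard definitions: F-score as the harmonic mean of precision and recall, and relative risk as the outcome rate among patients flagged high-risk divided by the rate among those not flagged. The counts are invented for illustration and do not come from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1). Assumes tp > 0."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_risk(tp: int, fp: int, fn: int, tn: int) -> float:
    """Outcome rate in the flagged group divided by the rate in the unflagged group.

    Assumes both groups are non-empty and the unflagged group has at least one case.
    """
    risk_flagged = tp / (tp + fp)
    risk_unflagged = fn / (fn + tn)
    return risk_flagged / risk_unflagged


# Hypothetical counts for 1000 patients: 100 flagged high-risk, 80 true cases.
tp, fp, fn, tn = 40, 60, 40, 860
print(round(f_score(tp, fp, fn), 2))            # 0.44
print(round(relative_risk(tp, fp, fn, tn), 2))  # 9.0
```

Under this reading, a relative risk of 4.6 would mean that patients the model flags develop the outcome at 4.6 times the rate of patients it does not flag.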



Published In

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ICD-9 codes
  2. feature hierarchy
  3. machine learning in healthcare and medicine
  4. predictive modeling

Qualifiers

  • Research-article

Funding Sources

  • Quanta Computers Inc.

Conference

BCB '14: ACM-BCB '14
September 20-23, 2014
Newport Beach, California

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Cited By

  • (2019) Performance Evaluation of Ensemble-Based Machine Learning Techniques for Prediction of Chronic Kidney Disease. Energy Transfer and Dissipation in Plasma Turbulence (415-426). DOI: 10.1007/978-981-13-5953-8_34. Online publication date: 3-May-2019
  • (2018) Predicting acute kidney injury at hospital re-entry using high-dimensional electronic health record data. PLOS ONE 13:11 (e0204920). DOI: 10.1371/journal.pone.0204920. Online publication date: 20-Nov-2018
  • (2018) A System for Automated Determination of Perioperative Patient Acuity. Journal of Medical Systems 42:7 (1-11). DOI: 10.1007/s10916-018-0977-7. Online publication date: 1-Jul-2018
  • (2017) Semi-Supervised Prediction of Comorbid Rare Conditions Using Medical Claims Data. 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (478-485). DOI: 10.1109/ICDMW.2017.68. Online publication date: Nov-2017
  • (2016) Predictive analytics for chronic kidney disease using machine learning techniques. 2016 Management and Innovation Technology International Conference (MITicon) (MIT-80-MIT-83). DOI: 10.1109/MITICON.2016.8025242. Online publication date: Oct-2016
  • (2015) Improving Hospital Readmission Prediction Using Domain Knowledge Based Virtual Examples. Knowledge Management in Organizations (695-706). DOI: 10.1007/978-3-319-21009-4_51. Online publication date: 4-Aug-2015
  • (2015) Mining Hierarchical Pathology Data Using Inductive Logic Programming. Artificial Intelligence in Medicine (76-85). DOI: 10.1007/978-3-319-19551-3_9. Online publication date: 2015
