Skip to main content

Automated Coding of Medical Diagnostics from Free-Text: The Role of Parameters Optimization and Imbalanced Classes

  • Conference paper
  • First Online:
Data Integration in the Life Sciences (DILS 2018)

Abstract

The extraction of codes from Electronic Health Records (EHR) data is an important task because extracted codes can be used for different purposes such as billing and reimbursement, quality control, epidemiological studies, and cohort identification for clinical trials. The codes are based on standardized vocabularies. Diagnostics, for example, are frequently coded using the International Classification of Diseases (ICD), which is a taxonomy of diagnosis codes organized in a hierarchical structure. Extracting codes from free-text medical notes in EHR such as the discharge summary requires the review of patient data searching for information that can be coded in a standardized manner. The manual human coding assignment is a complex and time-consuming process. The use of machine learning and natural language processing approaches have been receiving an increasing attention to automate the process of ICD coding. In this article, we investigate the use of Support Vector Machines (SVM) and the binary relevance method for multi-label classification in the task of automatic ICD coding from free-text discharge summaries. In particular, we explored the role of SVM parameters optimization and class weighting for addressing imbalanced class. Experiments conducted with the Medical Information Mart for Intensive Care III (MIMIC III) database reached 49.86% of f1-macro for the 100 most frequent diagnostics. Our findings indicated that optimization of SVM parameters and the use of class weighting can improve the effectiveness of the classifier.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    http://www.who.int/classifications/icd/en/.

  2. 2.

    https://www.nlm.nih.gov/research/umls/about_umls.html.

  3. 3.

    https://metamap.nlm.nih.gov/.

  4. 4.

    https://mimic.physionet.org/.

  5. 5.

    http://scikit-learn.org/stable/.

  6. 6.

    https://www.nltk.org/.

  7. 7.

    The opinions expressed in this work do not necessarily reflect those of the funding agencies.

References

  1. Chaudhry, B.: Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann. Intern. Med. 144(10), 742 (2006)

    Article  Google Scholar 

  2. Navas, H., Osornio, A.L., Baum, A., Gomez, A., Luna, D., de Quiros, F.G.B.: Creation and evaluation of a terminology server for the interactive coding of discharge summaries. Stud. Health Technol. Inform. 129, 650–654 (2007)

    Google Scholar 

  3. Rios, A., Kavuluru, R.: Supervised extraction of diagnosis codes from EMRs: role of feature selection, data selection, and probabilistic thresholding. In: 2013 IEEE International Conference on Healthcare Informatics, pp. 66–73 (2013)

    Google Scholar 

  4. Scheurwegs, E., Luyckx, K., Luyten, L., Daelemans, W., Van den Bulcke, T.: Data integration of structured and unstructured sources for assigning clinical codes to patient stays. J. Am. Med. Inform. Assoc. 23(e1), 11–19 (2016)

    Article  Google Scholar 

  5. Perotte, A., Pivovarov, R., Natarajan, K., Weiskopf, N., Wood, F., Elhadad, N.: Diagnosis code assignment: models and evaluation metrics. J. Am. Med. Inform. Assoc. 21(2), 231–237 (2014)

    Article  Google Scholar 

  6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  7. Kavuluru, R., Rios, A., Lu, Y.: An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records. Artif. Intell. Med. 65(2), 155–166 (2015)

    Article  Google Scholar 

  8. Dougherty, M., Seabold, S., White, S.: Study Reveals hard facts on CAC. J. AHIMA 84(7), 54–56 (2013)

    Google Scholar 

  9. Helwe, C., Elbassuoni, S., Geha, M., Hitti, E., Makhlouf Obermeyer, C.: CCS coding of discharge diagnoses via deep neural networks. In: Proceedings of the 2017 International Conference on Digital Health, DH 2017, pp. 175–179 (2017)

    Google Scholar 

  10. Wang, S., Chang, X., Li, X., Long, G., Yao, L., Sheng, Q.: Diagnosis code assignment using sparsity-based disease correlation embedding. IEEE Trans. Knowl. Data Eng. 28(12), 3191–3202 (2016)

    Article  Google Scholar 

  11. Rizzo, S.G., Montesi, D., Fabbri, A., Marchesini, G.: ICD code retrieval: novel approach for assisted disease classification. In: Ashish, N., Ambite, J.-L. (eds.) DILS 2015. LNCS, vol. 9162, pp. 147–161. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21843-4_12

    Chapter  Google Scholar 

  12. Farkas, R., Szarvas, G.: Automatic construction of rule-based ICD-9-CM coding systems. BMC Bioinf. 9(Suppl. 3), S10 (2008)

    Article  Google Scholar 

  13. Stanfill, M.H., Williams, M., Fenton, S.H., Jenders, R.A., Hersh, W.R.: A systematic literature review of automated clinical coding and classification systems. J. Am. Med. Inform. Assoc. 17(6), 646–651 (2010)

    Article  Google Scholar 

  14. Zhang, Y.: A hierarchical approach to encoding medical concepts for clinical notes. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Student Research Workshop, HLT 2008, p. 67 (2008)

    Google Scholar 

  15. Subotin, M., Davis, A.R.: A method for modeling co-occurrence propensity of clinical codes with application to ICD-10-PCS auto-coding. J. Am. Med. Inform. Assoc. 23(5), 866–871 (2016)

    Article  Google Scholar 

  16. Berndorfer, S., Henriksson, A.: Automated diagnosis coding with combined text representations. Stud. Health Technol. Inform. 235, 201–205 (2017)

    Google Scholar 

  17. Shi, H., Xie, P., Hu, Z., Zhang, M., Xing, E.P.: Towards automated ICD coding using deep learning, pp. 1–11 (2017)

    Google Scholar 

  18. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

    Article  Google Scholar 

  19. Haykin, S.: Neural Networks and Learning Machines, vol. 3. Pearson, Upper Saddle River (2009)

    Google Scholar 

Download references

Acknowledgements

This work is supported by the São Paulo Research Foundation (FAPESP) (Grant #2017/02325-5)Footnote 7.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Luiz Virginio .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Virginio, L., dos Reis, J.C. (2019). Automated Coding of Medical Diagnostics from Free-Text: The Role of Parameters Optimization and Imbalanced Classes. In: Auer, S., Vidal, ME. (eds) Data Integration in the Life Sciences. DILS 2018. Lecture Notes in Computer Science(), vol 11371. Springer, Cham. https://doi.org/10.1007/978-3-030-06016-9_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-06016-9_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-06015-2

  • Online ISBN: 978-3-030-06016-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics