Abstract
Extracting codes from Electronic Health Record (EHR) data is an important task because the extracted codes serve many purposes, such as billing and reimbursement, quality control, epidemiological studies, and cohort identification for clinical trials. The codes are based on standardized vocabularies. Diagnoses, for example, are frequently coded using the International Classification of Diseases (ICD), a taxonomy of diagnosis codes organized in a hierarchical structure. Extracting codes from free-text medical notes in the EHR, such as discharge summaries, requires reviewing patient data in search of information that can be coded in a standardized manner. Manual code assignment is a complex and time-consuming process. Machine learning and natural language processing approaches have received increasing attention as a means of automating ICD coding. In this article, we investigate the use of Support Vector Machines (SVM) and the binary relevance method for multi-label classification in the task of automatic ICD coding from free-text discharge summaries. In particular, we explore the role of SVM parameter optimization and class weighting in addressing class imbalance. Experiments conducted with the Medical Information Mart for Intensive Care III (MIMIC-III) database reached a macro-averaged F1 score of 49.86% for the 100 most frequent diagnoses. Our findings indicate that optimizing SVM parameters and using class weighting can improve the effectiveness of the classifier.
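The three ingredients named in the abstract (binary relevance, class weighting, and macro-averaged F1) can be illustrated with a minimal plain-Python sketch. It does not reproduce the SVM training or the experimental setup of the paper. The function names, the toy code labels, and the weighting formula n_samples / (n_classes * class_count) are assumptions chosen for illustration (the formula is a common "balanced" heuristic); the abstract does not state which weighting scheme was used.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set


def binary_relevance_targets(
    label_sets: Sequence[Set[str]], codes: Sequence[str]
) -> Dict[str, List[int]]:
    """Binary relevance: turn one multi-label problem into one 0/1 target
    vector per code, so an independent binary classifier can be trained
    for each code."""
    return {c: [1 if c in labels else 0 for labels in label_sets] for c in codes}


def balanced_class_weights(y: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Assumed weighting heuristic: n_samples / (n_classes * class_count).
    Rare classes receive larger weights than frequent ones."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class; defined as 0.0 when there are no
    true positives, false positives, or false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(
    true_by_code: Dict[str, List[int]], pred_by_code: Dict[str, List[int]]
) -> float:
    """Macro-averaged F1: the unweighted mean of per-code F1 scores, so a
    rare code counts as much as a frequent one."""
    scores = [f1_binary(true_by_code[c], pred_by_code[c]) for c in true_by_code]
    return sum(scores) / len(scores)


# Toy data (hypothetical): four discharge summaries and two code labels.
label_sets = [{"4019"}, {"4019", "4280"}, {"4019"}, set()]
codes = ["4019", "4280"]

targets = binary_relevance_targets(label_sets, codes)
weights_rare = balanced_class_weights(targets["4280"])

# Hypothetical predictions: the frequent code is mostly found, the rare
# code is never predicted.
predictions = {"4019": [1, 1, 0, 0], "4280": [0, 0, 0, 0]}
score = macro_f1(targets, predictions)
```

In this toy case the classifier that never predicts the rare code contributes an F1 of zero for that code, which pulls the macro average down regardless of how well the frequent code is handled. That sensitivity is one reason class weighting matters when the reported metric is macro-averaged.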
Notes
The opinions expressed in this work do not necessarily reflect those of the funding agencies.
Acknowledgements
This work is supported by the São Paulo Research Foundation (FAPESP) (Grant #2017/02325-5).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Virginio, L., dos Reis, J.C. (2019). Automated Coding of Medical Diagnostics from Free-Text: The Role of Parameters Optimization and Imbalanced Classes. In: Auer, S., Vidal, ME. (eds) Data Integration in the Life Sciences. DILS 2018. Lecture Notes in Computer Science, vol 11371. Springer, Cham. https://doi.org/10.1007/978-3-030-06016-9_12
DOI: https://doi.org/10.1007/978-3-030-06016-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06015-2
Online ISBN: 978-3-030-06016-9
eBook Packages: Computer Science, Computer Science (R0)