Abstract
ICD coding is typically framed as a multi-label prediction task that assigns multiple ICD codes to a clinical text. In this paper, we present a novel way of injecting prior knowledge of hierarchical structure into BERT (HieBERT) to predict ICD codes automatically. The hierarchical structure consists of code tree positions and code tree sequence LSTM embeddings, which we generate as hierarchical representations of the ICD codes. In addition, we train a clinical BERT model on millions of clinical texts to capture contextual and co-occurrence information, and we propose an alignment method that maps the hierarchical representations into the BERT vector space. We evaluate HieBERT on the widely used MIMIC-III dataset, and the experimental results show that our model achieves strong performance compared with previous work.
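The "code tree position" idea from the abstract can be sketched in a few lines: every ICD code sits in a hierarchy, and its root-to-node path of sibling indices can serve as a positional feature that is then projected into the encoder's vector space. The toy hierarchy, the padding scheme, and the random linear projection below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import random

# Toy fragment of an ICD-9-style hierarchy: parent -> ordered children.
TREE = {
    "ROOT": ["390-459", "460-519"],   # chapter ranges
    "390-459": ["428"],               # heart failure category
    "428": ["428.0", "428.1"],        # specific billable codes
    "460-519": ["486"],               # pneumonia
}

def tree_position(code, max_depth=4):
    """Return the root-to-node path of sibling indices, padded with -1."""
    # Invert the tree so we can walk from the code up to the root.
    parent = {c: p for p, kids in TREE.items() for c in kids}
    path, node = [], code
    while node != "ROOT":
        p = parent[node]
        path.append(TREE[p].index(node))  # index among siblings
        node = p
    path.reverse()
    return path + [-1] * (max_depth - len(path))

def align(position, dim=8, seed=0):
    """Project a tree position into a dim-sized vector -- a stand-in for
    mapping hierarchical features into the BERT embedding space."""
    rng = random.Random(seed)
    # Fixed random projection matrix: dim rows, one column per path slot.
    W = [[rng.uniform(-1, 1) for _ in position] for _ in range(dim)]
    return [sum(w * p for w, p in zip(row, position)) for row in W]

pos = tree_position("428.0")
print(pos)               # [0, 0, 0, -1]: sibling indices along the path
print(len(align(pos)))   # 8
```

In the full model, such position vectors would be produced for every code in the label space and aligned with the clinical-text representation, rather than generated from a hand-built toy tree as here.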
S. Wang and D. Tang contributed equally to this work.
Acknowledgements
We thank the reviewers and our colleagues for their valuable feedback. This research was supported by the Chinese National 242 Information Security Program (2021A008), the Beijing NOVA Program (Cross-discipline, Z191100001119014), the National Key Research and Development Program of China (2017YFB1002300, 2017YFC1700300), and the National Natural Science Foundation of China (61702234).
© 2021 Springer Nature Switzerland AG
Cite this paper
Wang, S., Tang, D., Zhang, L. (2021). A Large-Scale Hierarchical Structure Knowledge Enhanced Pre-training Framework for Automatic ICD Coding. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_57
Print ISBN: 978-3-030-92309-9
Online ISBN: 978-3-030-92310-5