Abstract
ICD coding is typically framed as a multi-label prediction task that assigns multiple ICD codes to a clinical text. In this paper, we present a novel way of injecting prior knowledge of hierarchical structure into BERT (HieBERT) to predict ICD codes automatically. The hierarchical structure consists of code tree positions and code tree sequence LSTM embeddings, which we generate as hierarchical representations of the ICD codes. In addition, we train a clinical BERT model on millions of clinical texts to capture contextual and co-occurrence information, and we propose an alignment method that maps the hierarchical representations into the BERT vector space. We evaluate HieBERT on the widely used MIMIC-III dataset, and the experimental results show that our model achieves strong performance compared with previous work.
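The "code tree position" idea from the abstract can be sketched in a few lines: every ICD code sits in a hierarchy, and its root-to-node path of sibling indices can serve as a positional feature that is then projected into the encoder's vector space. The toy hierarchy, the padding scheme, and the random linear projection below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import random

# Toy fragment of an ICD-9-style hierarchy: parent -> ordered children.
TREE = {
    "ROOT": ["390-459", "460-519"],   # chapter ranges
    "390-459": ["428"],               # heart failure category
    "428": ["428.0", "428.1"],        # specific billable codes
    "460-519": ["486"],               # pneumonia
}

def tree_position(code, max_depth=4):
    """Return the root-to-node path of sibling indices, padded with -1."""
    # Invert the tree so we can walk from the code up to the root.
    parent = {c: p for p, kids in TREE.items() for c in kids}
    path, node = [], code
    while node != "ROOT":
        p = parent[node]
        path.append(TREE[p].index(node))  # index among siblings
        node = p
    path.reverse()
    return path + [-1] * (max_depth - len(path))

def align(position, dim=8, seed=0):
    """Project a tree position into a dim-sized vector -- a stand-in for
    mapping hierarchical features into the BERT embedding space."""
    rng = random.Random(seed)
    # Fixed random projection matrix: dim rows, one column per path slot.
    W = [[rng.uniform(-1, 1) for _ in position] for _ in range(dim)]
    return [sum(w * p for w, p in zip(row, position)) for row in W]

pos = tree_position("428.0")
print(pos)               # [0, 0, 0, -1]: sibling indices along the path
print(len(align(pos)))   # 8
```

In the full model, such position vectors would be produced for every code in the label space and aligned with the clinical-text representation, rather than generated from a hand-built toy tree as here.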
S. Wang and D. Tang contributed equally to this work.
Acknowledgements
We thank the reviewers and our colleagues for their valuable feedback. This research was supported by the Chinese National 242 Information Security Program (2021A008), the Beijing NOVA Program (Cross-discipline, Z191100001119014), the National Key Research and Development Program of China (2017YFB1002300, 2017YFC1700300), and the National Natural Science Foundation of China (61702234).
© 2021 Springer Nature Switzerland AG
Cite this paper
Wang, S., Tang, D., Zhang, L. (2021). A Large-Scale Hierarchical Structure Knowledge Enhanced Pre-training Framework for Automatic ICD Coding. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_57
Print ISBN: 978-3-030-92309-9
Online ISBN: 978-3-030-92310-5