A Large-Scale Hierarchical Structure Knowledge Enhanced Pre-training Framework for Automatic ICD Coding

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1517)


Abstract

ICD coding is usually formulated as a multi-label prediction task that assigns multiple ICD codes to a clinical text. In this paper, we present a novel way of injecting prior knowledge of hierarchical structures into BERT (HieBERT) to predict ICD codes automatically. The hierarchical structures consist of code tree positions and code tree sequence LSTM embeddings, which we generate as hierarchical representations of ICD codes. In addition, we pre-train a clinical BERT model on millions of clinical texts to capture contextual and co-occurrence information, and we propose an aligning method that maps the hierarchical representations into the BERT vector space. We evaluate HieBERT on the widely used MIMIC-III dataset, and the experimental results indicate that our model achieves strong performance compared with previous work.
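
To make the abstract's pipeline concrete, below is a minimal sketch of how its three ingredients might fit together: code-tree representations summarized by an LSTM, a clinical BERT text encoder, and an alignment layer that projects code vectors into the BERT space. This is not the authors' implementation; the module structure, the node-id vocabulary size, the root-to-leaf path encoding of "tree position", and the dot-product multi-label head are all assumptions made for illustration, and `bert-base-uncased` stands in for the clinical BERT the paper pre-trains itself.

```python
# Minimal sketch of the HieBERT idea from the abstract -- NOT the authors'
# implementation. Module names, dimensions, the node-id vocabulary, and the
# dot-product scoring head are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel

class HieBERTSketch(nn.Module):
    def __init__(self, bert_name="bert-base-uncased",  # placeholder; the paper
                 code_dim=128, node_vocab=1000):        # pre-trains its own BERT
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # "Code tree position": embed the node ids on each code's
        # root-to-leaf path in the ICD hierarchy (assumed encoding).
        self.node_embed = nn.Embedding(node_vocab, code_dim)
        # "Code tree sequence LSTM embedding": summarize each path with an LSTM.
        self.path_lstm = nn.LSTM(code_dim, code_dim, batch_first=True)
        # Aligning method: map hierarchical code vectors into BERT's vector space.
        self.align = nn.Linear(code_dim, hidden)

    def code_repr(self, code_paths):
        # code_paths: (num_codes, depth) node ids along each code's tree path.
        x = self.node_embed(code_paths)       # (C, depth, code_dim)
        _, (h, _) = self.path_lstm(x)         # final hidden state per code
        return self.align(h[-1])              # (C, hidden), aligned to BERT space

    def forward(self, input_ids, attention_mask, code_paths):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        doc = out.last_hidden_state[:, 0]     # [CLS] vector per document
        codes = self.code_repr(code_paths)    # (C, hidden)
        return doc @ codes.T                  # (B, C): one logit per ICD code
```

Training would then apply a per-code sigmoid with binary cross-entropy (e.g. `torch.nn.BCEWithLogitsLoss` on the raw logits), the standard setup for the multi-label formulation the abstract describes.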

S. Wang and D. Tang contributed equally to this work.



Acknowledgements

We thank the reviewers and our colleagues for their valuable feedback. This research was supported by the Chinese National 242 Information Security Program (2021A008), the Beijing NOVA Program (Cross-discipline, Z191100001119014), the National Key Research and Development Program of China (2017YFB1002300, 2017YFC1700300), and the National Natural Science Foundation of China (61702234).

Author information

Corresponding author

Correspondence to Shi Wang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, S., Tang, D., Zhang, L. (2021). A Large-Scale Hierarchical Structure Knowledge Enhanced Pre-training Framework for Automatic ICD Coding. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_57

  • DOI: https://doi.org/10.1007/978-3-030-92310-5_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92309-9

  • Online ISBN: 978-3-030-92310-5

  • eBook Packages: Computer Science, Computer Science (R0)
