
SqueezeBioBERT: BioBERT Distillation for Healthcare Natural Language Processing

  • Conference paper
Computational Data and Social Networks (CSoNet 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12575)


Abstract

Healthcare text mining attracts increasing research interest as electronic health record (EHR) and healthcare claim data have skyrocketed over the past decade. Recently, deep pre-trained language models have significantly improved many natural language processing tasks. However, directly applying them to healthcare text mining does not generate satisfactory results, because those models are trained on generic-domain corpora, whose word distribution is shifted away from healthcare corpora. Moreover, deep pre-trained language models are generally computationally expensive and memory intensive, which makes them very difficult to use on resource-restricted devices. In this work, we designed a novel knowledge distillation method that is very effective for Transformer-based models. We applied this knowledge distillation method to BioBERT [5], and experiments show that the knowledge encoded in the large BioBERT can be effectively transferred to a compressed version, SqueezeBioBERT. We evaluated SqueezeBioBERT on three healthcare text mining tasks: named entity recognition, relation extraction and question answering. The results show that SqueezeBioBERT achieves more than 95% of the performance of the teacher BioBERT on these three tasks, while being 4.2X smaller.
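The abstract does not spell out the distillation objective itself, so the following is only a minimal, generic sketch of soft-target knowledge distillation in the style of Hinton et al. [10] (which the paper cites), written in PyTorch. The temperature T, the loss weight alpha, and the toy linear teacher/student heads are illustrative assumptions, not the authors' actual SqueezeBioBERT recipe.

```python
# Minimal sketch of soft-target knowledge distillation (Hinton et al. [10]).
# Illustrative only: T, alpha, and the toy models are assumptions, not the
# authors' SqueezeBioBERT method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and the usual hard-label CE term."""
    # Soften both distributions with temperature T; scale by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy stand-ins for a large teacher and a compressed student head.
teacher = nn.Linear(768, 3)   # e.g. a frozen BioBERT-sized classification head
student = nn.Linear(128, 3)   # a much smaller student head

features_t = torch.randn(4, 768)  # hypothetical teacher inputs
features_s = torch.randn(4, 128)  # hypothetical student inputs
labels = torch.randint(0, 3, (4,))

with torch.no_grad():
    t_logits = teacher(features_t)   # teacher provides soft targets only
s_logits = student(features_s)

loss = distillation_loss(s_logits, t_logits, labels)
loss.backward()  # gradients flow only into the student
```

For Transformer models such as BioBERT, this soft-target idea is typically extended with layer-to-layer losses on hidden states and attention maps, as in TinyBERT [7] and MobileBERT [8].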


References

  1. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 5998–6008 (2017)


  2. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pretraining of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  3. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)


  4. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237 (2019)

  5. Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2019)


  6. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)

  7. Jiao, X., et al.: TinyBERT: distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351 (2019)

  8. Sun, Z., Yu, H., Song, X., Liu, R., Yang, Y., Zhou, D.: MobileBERT: a compact task-agnostic BERT for resource-limited devices. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020)


  9. Bucilua, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, New York, NY, USA, pp. 535–541. ACM (2006)


  10. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  11. Urban, G., et al.: Do deep convolutional nets really need to be deep (or even convolutional)? In: Proceedings of the International Conference on Learning Representations (2016)


  12. Ba, J., Caruana, R.: Do deep nets really need to be deep? In: Proceedings of the Advances in Neural Information Processing Systems, pp. 2654–2662 (2014)


  13. Dogan, R.I., et al.: NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)


  14. Uzuner, O., et al.: 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18, 552–556 (2011)


  15. Li, J., et al.: BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database 2016, 1–10 (2016)


  16. Krallinger, M., et al.: The CHEMDNER corpus of chemicals and drugs and its annotation principles. J. Cheminform. 7, 1–17 (2015)


  17. Smith, L., et al.: Overview of BioCreative II gene mention recognition. Genome Biol. 9, 1–19 (2008). https://doi.org/10.1186/gb-2008-9-s2-s2


  18. Kim, J.-D., et al.: Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP), Geneva, Switzerland, pp. 73–78 (2004). COLING. https://www.aclweb.org/anthology/W04-1213

  19. Gerner, M., et al.: LINNAEUS: a species name identification system for biomedical literature. BMC Bioinform. 11, 85 (2010). https://doi.org/10.1186/1471-2105-11-85


  20. Pafilis, E., et al.: The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One 8, e65390 (2013)


  21. Bravo, A., et al.: Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinform. 16, 55 (2015). https://doi.org/10.1186/s12859-015-0472-9


  22. Van Mulligen, E.M., et al.: The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. J. Biomed. Inform. 45, 879–884 (2012)


  23. Krallinger, M., et al.: Overview of the BioCreative VI chemical-protein interaction track. In: Proceedings of the BioCreative VI Workshop, Bethesda, MD, USA, pp. 141–146 (2017). https://doi.org/10.1093/database/bay073

  24. Tsatsaronis, G., et al.: An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinform. 16, 138 (2015). https://doi.org/10.1186/s12859-015-0564-6


  25. https://github.com/naver/biobert-pretrained


Acknowledgement

This work was supported by Humana.

Author information


Corresponding author

Correspondence to Yanke Hu.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Du, H.G., Hu, Y. (2020). SqueezeBioBERT: BioBERT Distillation for Healthcare Natural Language Processing. In: Chellappan, S., Choo, K.-K.R., Phan, N. (eds) Computational Data and Social Networks. CSoNet 2020. Lecture Notes in Computer Science, vol 12575. Springer, Cham. https://doi.org/10.1007/978-3-030-66046-8_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-66046-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66045-1

  • Online ISBN: 978-3-030-66046-8

  • eBook Packages: Computer Science, Computer Science (R0)
