External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition

  • Conference paper
Semantic Technology (JIST 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12032)

Abstract

Automatic extraction of clinical named entities, such as body parts, drugs and surgeries, is of great significance for understanding clinical texts. Deep neural network approaches have recently achieved remarkable success in named entity recognition. However, most of these approaches train models on large, high-quality labeled datasets that are labor-intensive to produce. To reduce labeling costs, we propose a weakly supervised learning method for clinical named entity recognition (CNER). We use a small amount of labeled data as a seed corpus and propose a bootstrapping method that integrates external knowledge to iteratively generate labels for unlabeled data. The external knowledge consists of domain-specific dictionaries and a set of handcrafted rules. We conduct experiments on the CCKS-2018 CNER task dataset, and our approach achieves results competitive with the supervised approach trained on fully labeled data.
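
The bootstrapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model, the dictionaries, or the rules, so everything below is an assumption made for illustration. The "model" is a toy that only memorises entity strings from its training data, and the names `dictionary_spans`, `merge`, `train`, and `bootstrap`, the sample lexicon, and the length rule are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]        # (start, end, entity_type), end exclusive
Example = Tuple[str, List[Span]]   # (sentence, gold or pseudo-label spans)


def dictionary_spans(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Greedy longest-match lookup of lexicon terms in text."""
    terms = sorted(lexicon, key=len, reverse=True)
    spans, i = [], 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                spans.append((i, i + len(term), lexicon[term]))
                i += len(term)
                break
        else:  # no term starts at position i
            i += 1
    return spans


def merge(primary: List[Span], secondary: List[Span]) -> List[Span]:
    """Keep all primary spans; add secondary spans that do not overlap them."""
    chosen = list(primary)
    for s, e, t in secondary:
        if all(e <= cs or s >= ce for cs, ce, _ in chosen):
            chosen.append((s, e, t))
    return sorted(chosen)


def train(examples: List[Example]) -> Dict[str, str]:
    """Toy stand-in for a tagger: memorise entity strings seen in training."""
    return {text[s:e]: t for text, spans in examples for s, e, t in spans}


def bootstrap(seed: List[Example], unlabeled: List[str],
              lexicon: Dict[str, str],
              rule: Callable[[str, Span], bool],
              max_rounds: int = 5) -> Tuple[List[Example], List[str]]:
    """Iteratively pseudo-label sentences, using a dictionary and a rule filter."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        accepted, remaining = [], []
        for text in pool:
            predicted = dictionary_spans(text, model)
            if not predicted:            # model has no evidence for this sentence yet
                remaining.append(text)
                continue
            merged = merge(predicted, dictionary_spans(text, lexicon))
            accepted.append((text, [sp for sp in merged if rule(text, sp)]))
        if not accepted:                 # nothing new was labeled, so stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Hypothetical toy data (English for readability).
seed = [("pain in the stomach", [(12, 19, "BODY")])]
unlabeled = ["stomach treated with aspirin", "aspirin after surgery", "no findings"]
lexicon = {"aspirin": "DRUG", "surgery": "OPERATION"}
min_length_rule = lambda text, span: span[1] - span[0] >= 2  # drop 1-char spans

labeled, pool = bootstrap(seed, unlabeled, lexicon, min_length_rule)
```

In this toy run the second sentence is only labeled in the second round, after the first round has added "aspirin" to the training data, which is the iterative effect the abstract describes. A real system would replace `train` with a statistical or neural tagger and accept pseudo-labels based on model confidence.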

Acknowledgments

We sincerely thank the reviewers for their insightful comments and valuable suggestions. This work is supported by the National Natural Science Foundation of China under Grant No. 61772505 and the National Key R&D Program of China under Grant 2018YFB1005100.

Author information

Corresponding author

Correspondence to Long-Long Ma.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Duan, Y., Ma, LL., Han, X., Sun, L., Dong, B., Jiang, S. (2020). External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition. In: Wang, X., Lisi, F., Xiao, G., Botoeva, E. (eds) Semantic Technology. JIST 2019. Lecture Notes in Computer Science, vol 12032. Springer, Cham. https://doi.org/10.1007/978-3-030-41407-8_22

  • DOI: https://doi.org/10.1007/978-3-030-41407-8_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41406-1

  • Online ISBN: 978-3-030-41407-8

  • eBook Packages: Computer Science, Computer Science (R0)
