Abstract
Automatic extraction of clinical named entities, such as body parts, drugs and surgeries, is of great significance for understanding clinical texts. Deep neural network approaches have recently achieved remarkable success in named entity recognition. However, most of these approaches train models on large, high-quality labeled datasets that are labor-intensive to produce. To reduce labeling costs, we propose a weakly supervised learning method for clinical named entity recognition (CNER). We use a small amount of labeled data as a seed corpus and propose a bootstrapping method that integrates external knowledge to iteratively generate labels for unlabeled data. The external knowledge consists of domain-specific dictionaries and a set of handcrafted rules. We conduct experiments on the CCKS-2018 CNER task dataset, and our approach achieves results competitive with a supervised approach trained on fully labeled data.
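The bootstrapping loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the neural tagger is replaced by a toy model that memorises known entities and the token preceding each one, the text is whitespace-tokenised English rather than Chinese clinical text, and all names (`train`, `predict`, `apply_knowledge`, `bootstrap`, the dictionary entries and the stopword rule) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]                 # (token, entity type)
Labeled = Tuple[List[str], Set[Entity]]  # (tokens, entities found in them)
Model = Tuple[Dict[str, str], Dict[str, str]]  # (lexicon, trigger words)


def train(labeled: List[Labeled]) -> Model:
    """Toy stand-in for a tagger: memorise entities and the token before each."""
    lexicon: Dict[str, str] = {}
    triggers: Dict[str, str] = {}
    for tokens, entities in labeled:
        for term, etype in entities:
            lexicon[term] = etype
            idx = tokens.index(term)
            if idx > 0:
                triggers[tokens[idx - 1]] = etype
    return lexicon, triggers


def predict(model: Model, tokens: List[str]) -> Set[Entity]:
    """Tag known entities, plus any token that follows a learned trigger word."""
    lexicon, triggers = model
    found: Set[Entity] = set()
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            found.add((tok, lexicon[tok]))
        elif i > 0 and tokens[i - 1] in triggers:
            found.add((tok, triggers[tokens[i - 1]]))
    return found


def apply_knowledge(tokens: List[str], candidates: Set[Entity],
                    dictionary: Dict[str, str], stopwords: Set[str]) -> Set[Entity]:
    """Merge in dictionary hits (they override the model), then apply a rule."""
    merged = dict(candidates)
    merged.update({t: dictionary[t] for t in tokens if t in dictionary})
    # Assumed handcrafted rule: stopwords and single characters are never entities.
    return {(t, e) for t, e in merged.items() if t not in stopwords and len(t) > 1}


def bootstrap(seed: List[Labeled], unlabeled: List[List[str]],
              dictionary: Dict[str, str], stopwords: Set[str],
              max_rounds: int = 10) -> Tuple[List[Labeled], List[List[str]]]:
    """Retrain on the growing labeled set until no new sentence gets labeled."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        still_unlabeled = []
        for tokens in pool:
            entities = apply_knowledge(tokens, predict(model, tokens),
                                       dictionary, stopwords)
            if entities:
                labeled.append((tokens, entities))
            else:
                still_unlabeled.append(tokens)
        if len(still_unlabeled) == len(pool):
            break  # nothing new was labeled in this round
        pool = still_unlabeled
    return labeled, pool


seed = [("patient took aspirin".split(), {("aspirin", "drug")})]
unlabeled = [s.split() for s in (
    "he underwent colectomy",
    "she took ibuprofen daily",
    "pain in the abdomen",
    "he underwent appendectomy",
)]
dictionary = {"abdomen": "body", "appendectomy": "surgery"}
stopwords = {"the", "in", "a"}

labeled, remaining = bootstrap(seed, unlabeled, dictionary, stopwords)
```

In this toy run, the sentence containing "colectomy" cannot be labeled in the first round because neither the seed-trained model nor the dictionary covers it. It is labeled in the second round, after the dictionary hit on "appendectomy" has taught the model that "underwent" tends to precede a surgery. This is the sense in which external knowledge feeds the iteration; a real tagger would generalise far more broadly and would need a stricter filter against noisy labels.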
Acknowledgments
We sincerely thank the reviewers for their insightful comments and valuable suggestions. This work is supported by the National Natural Science Foundation of China under Grant No. 61772505 and the National Key R&D Program of China under Grant 2018YFB1005100.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Duan, Y., Ma, LL., Han, X., Sun, L., Dong, B., Jiang, S. (2020). External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition. In: Wang, X., Lisi, F., Xiao, G., Botoeva, E. (eds) Semantic Technology. JIST 2019. Lecture Notes in Computer Science, vol 12032. Springer, Cham. https://doi.org/10.1007/978-3-030-41407-8_22
DOI: https://doi.org/10.1007/978-3-030-41407-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41406-1
Online ISBN: 978-3-030-41407-8
eBook Packages: Computer Science, Computer Science (R0)