External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition

Duan, Yeheng; Ma, Long-Long; Han, Xianpei; Sun, Le; Dong, Bin; Jiang, Shanshan

doi:10.1007/978-3-030-41407-8_22


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12032)

Included in the following conference series:

Joint International Semantic Technology Conference


Abstract

Automatic extraction of clinical named entities, such as body parts, drugs and surgeries, is of great importance for understanding clinical texts. Deep neural network approaches have recently achieved remarkable success in named entity recognition. However, most of these approaches train models on large, high-quality labeled datasets that are labor-intensive to produce. To reduce labeling costs, we propose a weakly supervised learning method for clinical named entity recognition (CNER). We use a small amount of labeled data as a seed corpus and propose a bootstrapping method that integrates external knowledge to iteratively generate labels for unlabeled data. The external knowledge consists of domain-specific dictionaries and a set of handcrafted rules. We conduct experiments on the CCKS-2018 CNER task dataset, and our approach achieves results competitive with a supervised approach trained on fully labeled data.
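
The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the tagger, the acceptance criterion, or the rules, so everything here is an assumption made for illustration: `train` and `predict` are a memorizing stand-in for the neural tagger, `passes_rules` is a hypothetical handcrafted rule, and a sentence is accepted once the stand-in model finds at least one entity in it. It is not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[int, int, str]          # (start, end, type), end exclusive
Example = Tuple[str, Set[Entity]]      # (text, gold or generated entities)
PUNCTUATION = set("，。、；：,.;:")


def dictionary_match(text: str, dictionary: Dict[str, str]) -> Set[Entity]:
    """Find dictionary terms in text, preferring longer terms on overlap."""
    found: Set[Entity] = set()
    covered = [False] * len(text)
    for term in sorted(dictionary, key=len, reverse=True):
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            if not any(covered[start:end]):
                found.add((start, end, dictionary[term]))
                covered[start:end] = [True] * len(term)
            start = text.find(term, start + 1)
    return found


def passes_rules(text: str, entity: Entity) -> bool:
    """Hypothetical rule: at least two characters and no punctuation inside."""
    start, end, _ = entity
    span = text[start:end]
    return len(span) >= 2 and not any(ch in PUNCTUATION for ch in span)


def train(labeled: List[Example]) -> Dict[str, str]:
    """Stand-in for model training: memorize entity strings and their types."""
    lexicon: Dict[str, str] = {}
    for text, entities in labeled:
        for start, end, label in entities:
            lexicon[text[start:end]] = label
    return lexicon


def predict(model: Dict[str, str], text: str) -> Set[Entity]:
    """Stand-in for model inference: look up memorized entity strings."""
    return dictionary_match(text, model)


def merge(primary: Set[Entity], secondary: Set[Entity]) -> Set[Entity]:
    """Keep all primary entities; add secondary ones that do not overlap."""
    merged = set(primary)
    for start, end, label in sorted(secondary):
        if all(end <= p_start or start >= p_end for p_start, p_end, _ in merged):
            merged.add((start, end, label))
    return merged


def bootstrap(seed: List[Example], unlabeled: List[str],
              dictionary: Dict[str, str], rounds: int = 3):
    """Iteratively label the pool using the model plus external knowledge."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        still_unlabeled: List[str] = []
        for text in pool:
            predicted = {e for e in predict(model, text) if passes_rules(text, e)}
            if not predicted:              # model has no evidence yet; retry later
                still_unlabeled.append(text)
                continue
            known = {e for e in dictionary_match(text, dictionary)
                     if passes_rules(text, e)}
            labeled.append((text, merge(known, predicted)))
        if len(still_unlabeled) == len(pool):   # no progress this round
            break
        pool = still_unlabeled
    return labeled, pool


if __name__ == "__main__":
    seed = [("患者胃部疼痛", {(2, 4, "body")})]
    dictionary = {"阿司匹林": "drug", "胃切除术": "surgery"}
    unlabeled = ["胃部不适，服用阿司匹林", "口服阿司匹林后好转"]
    labeled, pool = bootstrap(seed, unlabeled, dictionary)
    for text, entities in labeled:
        print(text, sorted(entities))
```

In this toy run the second unlabeled sentence is only labeled in the second round, after the drug name from the dictionary has entered the training set through the first sentence. A real system would replace the memorizing stand-in with a trained sequence tagger and a confidence threshold.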



Acknowledgments

We sincerely thank the reviewers for their insightful comments and valuable suggestions. This work is supported by the National Natural Science Foundation of China under Grant No. 61772505 and the National Key R&D Program of China under Grant 2018YFB1005100.

Author information

Authors and Affiliations

Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
Yeheng Duan, Long-Long Ma, Xianpei Han & Le Sun
University of Chinese Academy of Sciences, Beijing, China
Yeheng Duan
Ricoh Software Research Center (Beijing) Co., Ltd., Beijing, China
Bin Dong & Shanshan Jiang


Corresponding author

Correspondence to Long-Long Ma.

Editor information

Editors and Affiliations

Tianjin University, Tianjin, China
Xin Wang
University of Bari, Bari, Italy
Francesca Alessandra Lisi
Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Guohui Xiao
Imperial College London, London, UK
Elena Botoeva


About this paper

Cite this paper

Duan, Y., Ma, LL., Han, X., Sun, L., Dong, B., Jiang, S. (2020). External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition. In: Wang, X., Lisi, F., Xiao, G., Botoeva, E. (eds) Semantic Technology. JIST 2019. Lecture Notes in Computer Science, vol 12032. Springer, Cham. https://doi.org/10.1007/978-3-030-41407-8_22


DOI: https://doi.org/10.1007/978-3-030-41407-8_22
Published: 14 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41406-1
Online ISBN: 978-3-030-41407-8
eBook Packages: Computer Science, Computer Science (R0)
