A weakly supervised method for named entity recognition of Chinese electronic medical records

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Chinese medical natural language processing faces a significant challenge in training accurate entity recognition models because high-quality labeled data are scarce. In response, we propose a joint training model, MCBERT-GCN-CRF, which achieves high performance in identifying medical entities in Chinese electronic medical records. We also introduce CM-NER, a 5-step framework that mitigates the effects of noise in weakly labeled data and establishes a principled connection between supervised and weakly supervised named entity recognition. By suppressing noise in weakly labeled data, our approach significantly improves recall and accuracy and outperforms traditional fully supervised pre-training models and other state-of-the-art methods. The framework achieves an F1 score of 86.29% on the CCKS-2019 dataset, significantly higher than pre-trained model baselines, which range from 74.17% to 83.06%, and higher than the top-performing supervised named entity recognition models in the CCKS-2019 competition. These results demonstrate the effectiveness of the framework and highlight the potential of leveraging unlabeled data to train accurate models for named entity recognition in Chinese medical natural language processing. This research has significant implications for advancing natural language processing techniques in the medical domain and improving patient care.
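
The abstract reports results as F1 and recall. As a reading aid, the following is a minimal sketch of how entity-level precision, recall, and F1 are commonly computed for named entity recognition. The scoring protocol used in the paper is not stated in this preview, so strict exact-match scoring over BIO tags is an assumption here, and the tag names (`DISEASE`, `DRUG`) and function names are hypothetical.

```python
from typing import List, Set, Tuple

# (start, end_exclusive, entity_type); a hypothetical representation
Entity = Tuple[int, int, str]


def bio_to_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (strict decoding)."""
    entities: Set[Entity] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                entities.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            # An "I-" tag with no matching open entity is ignored.
    return entities


def precision_recall_f1(gold: Set[Entity], pred: Set[Entity]):
    """Exact-match scores: a prediction counts only if span and type both match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the second predicted entity has the wrong right boundary.
gold_tags = ["B-DISEASE", "I-DISEASE", "O", "B-DRUG", "I-DRUG", "O"]
pred_tags = ["B-DISEASE", "I-DISEASE", "O", "B-DRUG", "O", "O"]

p, r, f1 = precision_recall_f1(
    bio_to_entities(gold_tags), bio_to_entities(pred_tags)
)
print(p, r, f1)
```

Under strict matching, a boundary error costs both a false positive and a false negative, which is why noisy span boundaries in weakly labeled data can depress F1 sharply.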

Author information

Corresponding author

Correspondence to Jing Ying.

Ethics declarations

All electronic medical record data used in this study have undergone de-identification procedures to protect sensitive information in compliance with the ethical guidelines and principles of the relevant institutions involved in the collection and use of the electronic medical record data.

This research focuses solely on the text of electronic medical records, not on the diseases themselves; therefore, ethical approval is not required.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, M., Gao, C., Zhang, K. et al. A weakly supervised method for named entity recognition of Chinese electronic medical records. Med Biol Eng Comput 61, 2733–2743 (2023). https://doi.org/10.1007/s11517-023-02871-6

  • DOI: https://doi.org/10.1007/s11517-023-02871-6
