A weakly supervised method for named entity recognition of Chinese electronic medical records

Li, Meng; Gao, Chunrong; Zhang, Kuang; Zhou, Huajian; Ying, Jing

doi:10.1007/s11517-023-02871-6

Original Article
Published: 15 July 2023

Volume 61, pages 2733–2743 (2023)

Medical & Biological Engineering & Computing

Meng Li¹, Chunrong Gao², Kuang Zhang², Huajian Zhou² & Jing Ying¹

Abstract

The field of Chinese medical natural language processing faces a significant challenge in training accurate entity recognition models due to the limited availability of high-quality labeled data. In response, we propose a joint training model, MCBERT-GCN-CRF, which achieves high performance in identifying medical-related entities in Chinese electronic medical records. Additionally, we introduce CM-NER, a 5-step framework that effectively mitigates the effects of noise in weakly labeled data and establishes a principled connection between supervised and weakly supervised named entity recognition. We demonstrate significant improvements in recall rate and accuracy. Our approach outperforms traditional fully supervised pre-training models and other state-of-the-art methods by suppressing noise in weakly labeled data. Our proposed framework achieves an F1 score of 86.29% on the CCKS-2019 dataset, significantly higher than pre-trained model baselines ranging from 74.17% to 83.06%, and higher than the top-performing named entity recognition supervised learning models in the CCKS-2019 competition. Our results demonstrate the effectiveness of our proposed framework and highlight the potential of leveraging unlabeled data to train accurate models for named entity recognition in Chinese medical natural language processing. This research has significant implications for advancing natural language processing techniques in the medical domain and improving patient care.
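
For readers unfamiliar with how an F1 score such as the one reported above is typically computed for named entity recognition, the following is a minimal sketch, assuming BIO-tagged sequences and exact matching of span boundaries and entity type. The abstract does not state the evaluation protocol, so the matching rule is an assumption, and the labels `DIS` and `DRUG` and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect (start, end, type) spans from one BIO tag sequence."""
    entities: Set[Entity] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]):
    """Entity-level scores under exact span-and-type matching (assumed rule)."""
    gold_set, pred_set = set(), set()
    for idx, (g_tags, p_tags) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold_set |= {(idx, *e) for e in extract_entities(g_tags)}
        pred_set |= {(idx, *e) for e in extract_entities(p_tags)}
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: one sentence, two gold entities, one predicted.
gold = [["B-DIS", "I-DIS", "O", "B-DRUG"]]
pred = [["B-DIS", "I-DIS", "O", "O"]]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

In this toy case the single predicted entity is correct, so precision is 1.0, while only one of the two gold entities is found, so recall is 0.5. F1 is the harmonic mean of the two, which is why a method that improves recall without sacrificing precision raises the score.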

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Zhejiang Province, Hangzhou, 310027, China
Meng Li & Jing Ying
Hangzhou B-Soft Co Ltd, Zhejiang Province, Hangzhou, 310052, China
Chunrong Gao, Kuang Zhang & Huajian Zhou

Corresponding author

Correspondence to Jing Ying.

Ethics declarations

All electronic medical record data used in this study have undergone de-identification procedures to protect sensitive information in compliance with the ethical guidelines and principles of the relevant institutions involved in the collection and use of the electronic medical record data.

This research solely focuses on the study of electronic medical record text, and not on the diseases themselves. Therefore, ethical approval is not required.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, M., Gao, C., Zhang, K. et al. A weakly supervised method for named entity recognition of Chinese electronic medical records. Med Biol Eng Comput 61, 2733–2743 (2023). https://doi.org/10.1007/s11517-023-02871-6

Received: 24 March 2023
Accepted: 16 June 2023
Published: 15 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11517-023-02871-6
