Abstract
Chinese medical natural language processing faces a significant challenge in training accurate entity recognition models because high-quality labeled data are scarce. In response, we propose a joint training model, MCBERT-GCN-CRF, which achieves high performance in identifying medical entities in Chinese electronic medical records. We also introduce CM-NER, a 5-step framework that mitigates the effects of noise in weakly labeled data and establishes a principled connection between supervised and weakly supervised named entity recognition. By suppressing noise in weakly labeled data, our approach improves recall and accuracy and outperforms traditional fully supervised pre-training models and other state-of-the-art methods. The framework achieves an F1 score of 86.29% on the CCKS-2019 dataset, significantly higher than pre-trained model baselines, which range from 74.17% to 83.06%, and higher than the top-performing supervised named entity recognition models in the CCKS-2019 competition. These results demonstrate the effectiveness of the framework and highlight the potential of leveraging unlabeled data to train accurate models for named entity recognition in Chinese medical natural language processing. This research has significant implications for advancing natural language processing techniques in the medical domain and improving patient care.
Ethics declarations
All electronic medical record data used in this study have undergone de-identification procedures to protect sensitive information in compliance with the ethical guidelines and principles of the relevant institutions involved in the collection and use of the electronic medical record data.
This research solely focuses on the study of electronic medical record text, and not on the diseases themselves. Therefore, ethical approval is not required.
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Li, M., Gao, C., Zhang, K. et al. A weakly supervised method for named entity recognition of Chinese electronic medical records. Med Biol Eng Comput 61, 2733–2743 (2023). https://doi.org/10.1007/s11517-023-02871-6
DOI: https://doi.org/10.1007/s11517-023-02871-6