Abstract
Entity and relation corpus construction is a key part of information extraction and knowledge graph construction. Building on existing domestic and international specifications for medical entities and relations, and under the guidance of medical experts, we established a disease-centered entity and relation classification schema tailored to the examination and treatment characteristics of cancer-related diseases. Combining dictionaries, rules, a T-Roberta-BiLSTM-CRF entity recognition model, and a RoBERTa-GSI-PM relation extraction model, we annotated medical texts from multiple sources over multiple rounds of iteration. The resulting Critical Illness Entities and Relationships Corpus (CIC) was built under the guidance of professional doctors throughout the process, with regular spot checks and consistency checks to ensure corpus quality. The final corpus contains 64,735 entities and 47,222 triplets, with annotation consistency of 0.84 for entities and 0.93 for relations. It provides a data foundation for information extraction from medical texts on critical diseases and for further research on medical knowledge graph applications.
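The abstract does not say exactly how the 0.84 and 0.93 consistency scores were computed. A common choice for span and relation annotation is pairwise F1 between two annotators (the F-measure approach of Hripcsak and Rothschild, 2005), where an item counts as agreed only if both annotators produced it identically. The sketch below assumes that definition with exact matching; the tuple layouts and the entity and relation labels (`Disease`, `Drug`, `treated_by`, and so on) are hypothetical and are not taken from the CIC schema.

```python
from typing import Hashable, Iterable


def pairwise_f1(ann_a: Iterable[Hashable], ann_b: Iterable[Hashable]) -> float:
    """Inter-annotator agreement as F1, treating annotator A as the reference.

    Each annotation must be hashable, e.g. a hypothetical entity tuple
    (doc_id, start, end, type) or relation tuple (doc_id, head, relation, tail).
    F1 is symmetric, so swapping the reference annotator gives the same score.
    """
    a, b = set(ann_a), set(ann_b)
    if not a and not b:
        return 1.0  # neither annotator marked anything: treat as full agreement
    matched = len(a & b)  # exact matches only (same span/triple and same label)
    # F1 = 2PR / (P + R) simplifies to 2|A ∩ B| / (|A| + |B|)
    return 2 * matched / (len(a) + len(b))


# Hypothetical entity annotations from two annotators on one document.
entities_a = {("d1", 0, 4, "Disease"), ("d1", 10, 14, "Drug"), ("d1", 20, 25, "Examination")}
entities_b = {("d1", 0, 4, "Disease"), ("d1", 10, 14, "Drug"), ("d1", 30, 33, "Symptom")}

# Hypothetical relation triples (head span, relation label, tail span).
relations_a = {("d1", (0, 4), "treated_by", (10, 14))}
relations_b = {("d1", (0, 4), "treated_by", (10, 14))}

print(round(pairwise_f1(entities_a, entities_b), 4))  # 2 of 3 items shared on each side
print(pairwise_f1(relations_a, relations_b))
```

With two of three entities matching on each side, entity agreement is 2·2 / (3 + 3) ≈ 0.6667, while the identical relation sets give 1.0. Relaxed variants (for example, counting overlapping spans as matches) would yield higher scores, so a reported figure is only comparable when its matching criterion is known.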
Acknowledgments
We thank the anonymous reviewers for their constructive comments and gratefully acknowledge the support of the Major Science and Technology Project of Yunnan Province (202102AA100021) and the Henan Province Science and Technology Department Science and Technology Tackling Project (232102211033).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, K., Zhang, C., Zhang, W., Zan, H. (2024). Corpus Construction of Critical Illness Entities and Relationships. In: Dong, M., Hong, JF., Lin, J., Jin, P. (eds) Chinese Lexical Semantics. CLSW 2023. Lecture Notes in Computer Science, vol 14515. Springer, Singapore. https://doi.org/10.1007/978-981-97-0586-3_6
DOI: https://doi.org/10.1007/978-981-97-0586-3_6
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0585-6
Online ISBN: 978-981-97-0586-3
eBook Packages: Computer Science, Computer Science (R0)