Corpus Construction of Critical Illness Entities and Relationships

  • Conference paper
Chinese Lexical Semantics (CLSW 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14515)

Abstract

Constructing corpora of entities and relations is a key part of information extraction and knowledge graph construction. Building on existing Chinese and international annotation standards for medical entities and relations, and working under the guidance of medical experts, we established a disease-centered classification schema for entities and relations that reflects how cancer-related diseases are examined and treated. We annotated medical texts from multiple sources over several iterative rounds, combining dictionaries and rules with the T-Roberta-BiLSTM-CRF entity recognition model and the RoBERTa-GSI-PM relation extraction model. The resulting Critical Illness entities and relationships Corpus (CIC) was built under the guidance of professional doctors throughout, with regular spot checks and consistency checks to ensure quality. The final corpus contains 64,735 entities and 47,222 triplets, and annotation consistency reached 0.84 for entities and 0.93 for relations. The corpus provides a data basis for information extraction from medical texts on critical illnesses and for further research on medical knowledge graph applications.
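
The abstract reports annotation consistency scores but does not say how they were computed. As an illustration only, the sketch below assumes a pairwise F1 measure over exact-match entity spans, which is one common choice for span annotation. The `Entity` type, the `pairwise_f1` helper, the labels, and the sample spans are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str   # entity type (hypothetical labels, not the CIC schema)


def pairwise_f1(a: set, b: set) -> float:
    """F1 agreement between two annotators, treating `a` as the reference.

    Two annotations match only if span boundaries and label are identical.
    """
    if not a and not b:
        return 1.0  # nothing to annotate: treat as full agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations of the same sentence by two annotators.
annotator_a = {
    Entity(0, 3, "Disease"),
    Entity(10, 14, "Symptom"),
    Entity(20, 25, "Drug"),
}
annotator_b = {
    Entity(0, 3, "Disease"),
    Entity(10, 14, "Symptom"),
    Entity(20, 26, "Drug"),  # boundary differs by one character
}

agreement = pairwise_f1(annotator_a, annotator_b)
print(round(agreement, 2))  # prints 0.67
```

Under this assumed measure, a single boundary disagreement among three entities lowers agreement to about 0.67. The same function could be applied to sets of (head, relation, tail) triplets to score relation annotation.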

Acknowledgments

We thank the anonymous reviewers for their constructive comments and gratefully acknowledge the support of the Major Science and Technology Project of Yunnan Province (202102AA100021) and the Henan Province Science and Technology Department Science and Technology Tackling Project (232102211033).

Author information

Corresponding author

Correspondence to Kunli Zhang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, K., Zhang, C., Zhang, W., Zan, H. (2024). Corpus Construction of Critical Illness Entities and Relationships. In: Dong, M., Hong, JF., Lin, J., Jin, P. (eds) Chinese Lexical Semantics. CLSW 2023. Lecture Notes in Computer Science, vol 14515. Springer, Singapore. https://doi.org/10.1007/978-981-97-0586-3_6

  • DOI: https://doi.org/10.1007/978-981-97-0586-3_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0585-6

  • Online ISBN: 978-981-97-0586-3

  • eBook Packages: Computer Science (R0)
