Abstract
We describe a system for the automatic extraction of semantic relations between entities in a medical corpus of clinical cases. It builds upon a previously developed entity extraction module and a morphosyntactic parser. It uses experimentally designed rules based on syntactic dependencies and trigger words, as well as on the sequencing and nesting of entities of particular types. The results obtained on a small corpus are promising. Our longer-term goal is to transform information extracted from medical texts into knowledge graphs.
Notes
1. Extract from PubMed: https://pubmed.ncbi.nlm.nih.gov/35365471/.
3. Conditional Random Fields [15] are probabilistic models often used in NLP for sequence labelling tasks, because they take into account the context of each sample to be labelled.
4. Entity types are listed in French, with an English translation when it differs from the French. In the rest of the paper, the English names are used in the running text, while the French ones appear in figures. The entity types substance, anatomy and treatment are sometimes abbreviated as sub, anat and treat, respectively.
8. Word \(w_1\) is a syntactic head of word \(w_2\) if there is a syntactic dependency link outgoing from \(w_1\) and incoming to \(w_2\). Most dependency parsing models ensure that each word (except the root of the sentence) has exactly one head, i.e. the dependency graph is a tree.
9. An asterisk following an entity type signals a negated entity occurrence.
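The head relation and the tree property described in the note on syntactic heads can be illustrated with a minimal sketch. It assumes a parse is given as a list of (head, dependent) token-index pairs. The function names, the example sentence and its arcs are hypothetical illustrations, not taken from the described system or from any particular parser.

```python
from collections import defaultdict


def heads_of(arcs):
    """Map each dependent token index to the list of its syntactic heads."""
    heads = defaultdict(list)
    for head, dependent in arcs:
        heads[dependent].append(head)
    return heads


def is_dependency_tree(n_tokens, arcs):
    """Return True if the arcs over tokens 0..n_tokens-1 form a tree."""
    heads = heads_of(arcs)
    roots = [t for t in range(n_tokens) if not heads[t]]
    # Exactly one root, and every other token has exactly one head.
    if len(roots) != 1:
        return False
    if any(len(heads[t]) != 1 for t in range(n_tokens) if t != roots[0]):
        return False
    # Every token must be reachable from the root (this rules out cycles).
    children = defaultdict(list)
    for head, dependent in arcs:
        children[head].append(dependent)
    seen, stack = set(), [roots[0]]
    while stack:
        token = stack.pop()
        if token not in seen:
            seen.add(token)
            stack.extend(children[token])
    return len(seen) == n_tokens


# Hypothetical example sentence and arcs (head index, dependent index).
TOKENS = ["The", "patient", "presents", "fever"]
ARCS = [(1, 0), (2, 1), (2, 3)]  # "presents" is the root

if __name__ == "__main__":
    print(is_dependency_tree(len(TOKENS), ARCS))
    print(TOKENS[heads_of(ARCS)[3][0]], "is the head of", TOKENS[3])
```

Storing arcs as explicit pairs, rather than one head index per token, keeps the "exactly one head" condition checkable instead of guaranteed by construction.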
References
Abacha, A.B., Zweigenbaum, P.: Automatic extraction of semantic relations between medical entities: a rule based approach. J. Biomed. Semant. 2(Suppl 5), S4+ (2011)
Amavi, J., Halfeld Ferrari, M., Hiot, N.: Natural language querying system through entity enrichment. In: Bellatreche, L., et al. (eds.) TPDL/ADBIS 2020. CCIS, vol. 1260, pp. 36–48. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55814-7_3
Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32(Database-Issue), 267–270 (2004)
Campillos, L., Deléger, L., Grouin, C., Hamon, T., Ligozat, A.L., Névéol, A.: A French clinical corpus with comprehensive semantic annotations: development of the medical entity and relation LIMSI annotated text corpus (MERLOT). Lang. Resour. Eval. 52(2), 571–601 (2017)
Cardon, R., Grabar, N., Grouin, C., Hamon, T.: Présentation de la campagne d’évaluation DEFT 2020 : similarité textuelle en domaine ouvert et extraction d’information précise dans des cas cliniques. In: Cardon, R., Grabar, N., Grouin, C., Hamon, T. (eds.) 6e conférence conjointe Journées d’Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes, pp. 1–13. ATALA, Nancy, France (2020). https://hal.archives-ouvertes.fr/hal-02784737
Culotta, A., Sorensen, J.: Dependency tree kernels for relation extraction. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004), pp. 423–429 (2004)
Embarek, M., Ferret, O.: Learning patterns for building resources about semantic relations in the medical domain. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, 26 May–1 June 2008, Marrakech, Morocco (2008)
Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1535–1545. Association for Computational Linguistics, Edinburgh, Scotland, UK, July 2011. https://aclanthology.org/D11-1142
Francis, N., et al.: Cypher: an evolving query language for property graphs. In: Das, G., Jermaine, C.M., Bernstein, P.A. (eds.) Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, 10–15 June 2018, pp. 1433–1445. ACM (2018)
Franciscus, N., Ren, X., Stantic, B.: Dependency graph for short text extraction and summarization. J. Inf. Telecommun. 3(4), 413–429 (2019)
Fundel, K., Küffner, R., Zimmer, R.: RelEx-relation extraction using dependency parse trees. Bioinformatics 23, 365–371 (2007)
Grabar, N., Grouin, C., Hamon, T., Claveau, V.: Corpus annoté de cas cliniques en français. In: TALN 2019–26e Conference on Traitement Automatique des Langues Naturelles, pp. 1–14. Toulouse, France, July 2019. https://hal.archives-ouvertes.fr/hal-02391878
Grouin, C., Grabar, N., Illouz, G.: Classification de cas cliniques et évaluation automatique de réponses d’étudiants : présentation de la campagne DEFT 2021. In: Denis, P., et al. (eds.) Traitement Automatique des Langues Naturelles, pp. 1–13. ATALA, Lille, France (2021). https://hal.archives-ouvertes.fr/hal-03265926
Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora. In: COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics (1992). https://aclanthology.org/C92-2082
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289. ICML 2001, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2001)
Li, Z., Yang, Z., Shen, C., Xu, J., Zhang, Y., Xu, H.: Integrating shortest dependency path and sentence sequence into a deep learning framework for relation extraction in clinical text. BMC Med. Inform. Decis. Mak. 19, 22 (2019)
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004, A meeting of SIGDAT, a Special Interest Group of the ACL, held in conjunction with ACL 2004, 25–26 July 2004, Barcelona, Spain, pp. 404–411. ACL (2004)
Minard, A.L., Ligozat, A.L., Grau, B.: Multi-class SVM for relation extraction from clinical reports. In: Recent Advances in Natural Language Processing, RANLP 2011, 12–14 September, 2011, Hissar, Bulgaria, pp. 604–609 (2011)
Minard, A.L., Roques, A., Hiot, N., Halfeld Ferrari, M., Savary, A.: DOING@DEFT: cascade de CRF pour l’annotation d’entités cliniques imbriquées. In: Cardon, R., Grabar, N., Grouin, C., Hamon, T. (eds.) 6e conférence conjointe Journées d’Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes, pp. 66–78. ATALA, Nancy, France (2020). https://hal.archives-ouvertes.fr/hal-02784743
Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp. 1003–1011. Association for Computational Linguistics, Suntec, Singapore, August 2009. https://aclanthology.org/P09-1113
Ramadier, L., Lafourcade, M.: Patrons sémantiques pour l’extraction de relations entre termes - Application aux comptes rendus radiologiques. In: TALN: Traitement Automatique des Langues Naturelles. jep-taln2016, Paris, France, July 2016. https://hal.archives-ouvertes.fr/hal-01382323
Rindflesch, T.C., Bean, C.A., Sneiderman, C.A.: Argument identification for arterial branching predications asserted in cardiac catheterization reports. In: AMIA Annual Symposium Proceedings, pp. 704–708 (2000)
Uzuner, O., Mailoa, J., Ryan, R., Sibanda, T.: Semantic relations for problem-oriented medical records. Artif. Intell. Med. 50, 63–73 (2010)
Acknowledgements
This work was partly supported by the ICVL federation and RTR-DIAMS. It was carried out in the context of the DOING action of the GDR-MADICS.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Savary, A., Silvanovich, A., Minard, AL., Hiot, N., Halfeld Ferrari, M. (2022). Relation Extraction from Clinical Cases for a Knowledge Graph. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_33
DOI: https://doi.org/10.1007/978-3-031-15743-1_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science, Computer Science (R0)