CODER: Knowledge-infused cross-lingual medical term embedding for term normalization

https://doi.org/10.1016/j.jbi.2021.103983

Highlights

  • Medical term embedding infused with relational medical knowledge.

  • State-of-the-art zero-shot medical term normalization performance.

  • Cross-lingual medical term embedding capability.

Abstract

Objective

This paper proposes a knowledge-infused medical term embedding, a critical tool for medical term normalization.

Methods

We develop CODER (Cross-lingual knowledge-infused medical term embedding) via contrastive learning over the Unified Medical Language System (UMLS), a medical knowledge graph (KG). Term similarities are computed using both synonym terms and relation triplets from the KG. Training with relations injects medical knowledge into the embeddings and can improve their usefulness as machine learning features.
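To make the training objective concrete, the sketch below shows an InfoNCE-style contrastive loss over a batch of term/synonym pairs (terms sharing a UMLS concept ID), where other terms in the batch serve as negatives. This is a minimal illustration of the contrastive idea, not the paper's exact training code; the encoder, batch construction, temperature, and the handling of relation triplets are all assumptions here.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(term_emb: torch.Tensor, synonym_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss over a batch of term/synonym pairs.

    term_emb, synonym_emb: (batch, dim) embeddings of terms and their
    synonyms. Row i of each tensor forms a positive pair; all other
    rows in the batch act as in-batch negatives.
    """
    term_emb = F.normalize(term_emb, dim=-1)
    synonym_emb = F.normalize(synonym_emb, dim=-1)
    # Cosine-similarity logits between every term and every synonym.
    logits = term_emb @ synonym_emb.t() / temperature
    # The matching synonym for term i sits on the diagonal.
    targets = torch.arange(term_emb.size(0), device=term_emb.device)
    return F.cross_entropy(logits, targets)
```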

Results

We evaluate CODER on zero-shot term normalization, semantic similarity, and relation classification benchmarks; the results show that CODER outperforms various state-of-the-art biomedical word embeddings, concept embeddings, and contextual embeddings.

Conclusion

CODER embeddings effectively capture the semantic similarity and relatedness of medical concepts. One can use CODER for embedding-based medical term normalization or as a source of features for machine learning. Like other pretrained language models, CODER can also be fine-tuned for specific tasks. Code and models are available at https://github.com/GanjinZero/CODER.
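As an illustration of embedding-based normalization with a pretrained checkpoint, the sketch below maps a raw mention to its nearest dictionary term by cosine similarity. The model identifier, the [CLS]-token pooling, and the toy dictionary are assumptions made for this example; consult the CODER repository for the released checkpoints and the pooling actually used.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model identifier; see https://github.com/GanjinZero/CODER
# for the released checkpoints.
MODEL_NAME = "GanjinZero/coder_eng"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(terms):
    """Return L2-normalized [CLS] embeddings for a list of term strings."""
    batch = tokenizer(terms, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = model(**batch).last_hidden_state[:, 0]  # [CLS] token
    return torch.nn.functional.normalize(out, dim=-1)

# Zero-shot normalization: rank dictionary terms by cosine similarity
# to the mention and take the top match.
dictionary = ["myocardial infarction", "hypertension", "type 2 diabetes"]
dict_emb = embed(dictionary)
mention_emb = embed(["heart attack"])
scores = mention_emb @ dict_emb.t()
print(dictionary[scores.argmax().item()])
```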

Keywords

Medical term normalization
Cross-lingual
Medical term representation
Knowledge graph embedding
Contrastive learning
