
Combining Contrastive Learning and Knowledge Graph Embeddings to Develop Medical Word Embeddings for the Italian Language

  • Conference paper
  • In: AIxIA 2023 – Advances in Artificial Intelligence (AIxIA 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14318)


Abstract

Word embeddings play a central role in today’s Natural Language Processing tasks and applications. However, there is a significant gap in the availability of high-quality word embeddings specific to the Italian medical domain. This study aims to address this gap by proposing a tailored solution that combines Contrastive Learning (CL) methods and Knowledge Graph Embedding (KGE), introducing a new variant of the loss function. Given the limited availability of medical texts and controlled vocabularies in the Italian language, traditional approaches to word-embedding generation may not yield adequate results. To overcome this challenge, our approach leverages the synergistic benefits of CL and KGE techniques. We achieve a significant performance boost over the initial model while using a considerably smaller amount of data. This work establishes a solid foundation for further investigations aimed at improving the accuracy and coverage of word embeddings in low-resource languages and specialized domains.
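
The preview does not spell out the proposed loss variant. Purely as an illustrative sketch (not the authors' formulation), the snippet below shows one standard way a contrastive objective over synonym pairs can be combined with a translational knowledge-graph term over (head, relation, tail) triples; the InfoNCE and TransE-style components, the weighting factor alpha, and the in-batch negative sampling are all assumptions made for the example.

```python
# Hypothetical sketch of a combined CL + KGE objective (not the paper's loss).
import torch
import torch.nn.functional as F

def contrastive_term(anchor, positive, temperature=0.07):
    # InfoNCE over a batch: the i-th positive is the only match for the i-th anchor.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(a.size(0))         # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

def kge_term(head, relation, tail, margin=1.0):
    # TransE-style margin loss: ||h + r - t|| should be small for true triples.
    dist_pos = (head + relation - tail).norm(p=2, dim=-1)
    # In-batch negatives obtained by shifting the tails (corrupted triples).
    dist_neg = (head + relation - tail.roll(shifts=1, dims=0)).norm(p=2, dim=-1)
    return F.relu(margin + dist_pos - dist_neg).mean()

def combined_loss(anchor, positive, head, relation, tail, alpha=0.5):
    # Weighted sum of the two objectives; alpha balances the CL and KGE signals.
    return alpha * contrastive_term(anchor, positive) + (1.0 - alpha) * kge_term(head, relation, tail)

# Toy usage with random 768-dimensional embeddings.
B, d = 8, 768
loss = combined_loss(torch.randn(B, d), torch.randn(B, d),
                     torch.randn(B, d), torch.randn(B, d), torch.randn(B, d))
print(loss.item())
```

The intuition, as the abstract suggests, is that the contrastive term pulls together alternative surface forms of the same concept, while the KGE term injects relational structure from the controlled vocabulary; the exact weighting and sampling scheme used in the paper may differ.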

D. A. Bondarenko and R. Ferrod contributed equally to this work.



Notes

  1.

    UMLS is a collection of controlled vocabularies which comprises a comprehensive thesaurus and ontology of the biomedical sciences; it is available at https://www.nlm.nih.gov/research/umls.

  2.

    Body Part, Organ, or Organ Component (BP), Body Substance (BS), Chemical (C), Medical Device (MD), Finding (F), Sign or Symptom (SS), Health Care Activity (HCA), Diagnostic Procedure (DP), Laboratory Procedure (LP), Therapeutic or Preventive Procedure (TPP), Pathologic Function (PF), Physiologic Function (PhF), and Injury or Poisoning (IP).

  3.

    cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR-large.

  4.

    GanjinZero/coder_all.
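
Both baselines referenced in notes 3 and 4 are distributed as Hugging Face checkpoints. As a minimal, illustrative sketch (not a detail taken from the paper), the snippet below loads one of them and embeds two Italian medical terms; the mean-pooling strategy and the example terms are assumptions.

```python
# Illustrative loading of a multilingual medical-entity encoder via Hugging Face Transformers.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR-large"  # or "GanjinZero/coder_all"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

def embed(terms):
    # Return one vector per term via mean pooling over non-padding tokens
    # (pooling choice is an assumption; CLS pooling is another common option).
    batch = tokenizer(terms, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (B, H)

# Example: cosine similarity between two Italian medical terms.
vecs = embed(["infarto miocardico", "attacco cardiaco"])
print(torch.nn.functional.cosine_similarity(vecs[0:1], vecs[1:2]).item())
```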


Author information

Corresponding author: Roger Ferrod.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bondarenko, D.A., Ferrod, R., Caro, L.D. (2023). Combining Contrastive Learning and Knowledge Graph Embeddings to Develop Medical Word Embeddings for the Italian Language. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science (LNAI), vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_28


  • DOI: https://doi.org/10.1007/978-3-031-47546-7_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47545-0

  • Online ISBN: 978-3-031-47546-7

  • eBook Packages: Computer Science (R0)
