Combining Contrastive Learning and Knowledge Graph Embeddings to Develop Medical Word Embeddings for the Italian Language

Bondarenko, Denys Amore; Ferrod, Roger; Caro, Luigi Di

doi:10.1007/978-3-031-47546-7_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14318)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Word embeddings play a significant role in today's Natural Language Processing tasks and applications. However, high-quality word embeddings specific to the Italian medical domain are largely unavailable. This study addresses that gap with a tailored solution that combines Contrastive Learning (CL) methods and Knowledge Graph Embedding (KGE), introducing a new variant of the loss function. Because medical texts and controlled vocabularies are scarce in Italian, traditional approaches to word embedding generation may not yield adequate results. To overcome this challenge, our approach leverages the complementary benefits of CL and KGE techniques. We achieve a significant performance boost over the initial model while using a considerably smaller amount of data. This work establishes a solid foundation for further investigations aimed at improving the accuracy and coverage of word embeddings in low-resource languages and specialized domains.
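To make the idea of a joint CL and KGE objective concrete, the following is a minimal sketch, not the loss variant proposed in the paper (the abstract does not specify it). It assumes an InfoNCE-style contrastive term over synonym pairs and a TransE-style margin term over knowledge graph triples, mixed by a hypothetical weight `alpha`. All function names, the temperature, the margin, and the toy vectors are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed stand-in for the CL term).

    Low when the anchor is closer to its positive (e.g. a synonym)
    than to any of the negatives.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def transe_margin_loss(head, rel, tail, corrupt_tail, margin=1.0):
    """TransE-style margin loss (assumed stand-in for the KGE term).

    A true triple should satisfy head + rel ~ tail, so its distance
    should be smaller than that of a corrupted triple by `margin`.
    """
    def dist(h, r, t):
        return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return max(0.0, margin + dist(head, rel, tail) - dist(head, rel, corrupt_tail))

def joint_loss(cl_loss, kge_loss, alpha=0.5):
    """Weighted sum of the two terms; `alpha` is a hypothetical mixing weight."""
    return alpha * cl_loss + (1.0 - alpha) * kge_loss

# Toy 2-d vectors (made up for illustration, not learned embeddings).
anchor = [1.0, 0.0]      # e.g. one surface form of a medical concept
synonym = [0.9, 0.1]     # another surface form of the same concept
unrelated = [0.0, 1.0]   # a term for a different concept

good_cl = info_nce_loss(anchor, synonym, [unrelated])
bad_cl = info_nce_loss(anchor, unrelated, [synonym])

head, rel, tail = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
corrupt_tail = [3.0, 3.0]
kge = transe_margin_loss(head, rel, tail, corrupt_tail)

total = joint_loss(good_cl, kge, alpha=0.5)
print(f"contrastive (aligned): {good_cl:.6f}")
print(f"contrastive (misaligned): {bad_cl:.6f}")
print(f"KGE margin term: {kge:.6f}")
print(f"joint loss: {total:.6f}")
```

In this sketch the contrastive term pulls surface forms of the same concept together, while the margin term keeps related concepts consistent with the graph structure. A real setup would compute both terms on encoder outputs over mini-batches rather than on fixed vectors.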

D. A. Bondarenko and R. Ferrod contributed equally to this work.

Notes

1. UMLS is a collection of controlled vocabularies that comprises a comprehensive thesaurus and ontology of the biomedical sciences; it is available at https://www.nlm.nih.gov/research/umls.
2. Body Part, Organ, or Organ Component (BP), Body Substance (BS), Chemical (C), Medical Device (MD), Finding (F), Sign or Symptom (SS), Health Care Activity (HCA), Diagnostic Procedure (DP), Laboratory Procedure (LP), Therapeutic or Preventive Procedure (TPP), Pathologic Function (PF), Physiologic Function (PhF), and Injury or Poisoning (IP).
3. cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR-large.
4. GanjinZero/coder_all.

Author information

Authors and Affiliations

University of Turin, Turin, Italy
Denys Amore Bondarenko, Roger Ferrod & Luigi Di Caro

Corresponding author

Correspondence to Roger Ferrod.

Editor information

Editors and Affiliations

University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Sapienza University of Rome, Rome, Italy
Domenico Lembo
Roma Tre University, Rome, Italy
Carla Limongelli
National Research Council, Rome, Italy
Andrea Orlandini

About this paper

Cite this paper

Bondarenko, D.A., Ferrod, R., Caro, L.D. (2023). Combining Contrastive Learning and Knowledge Graph Embeddings to Develop Medical Word Embeddings for the Italian Language. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_28

DOI: https://doi.org/10.1007/978-3-031-47546-7_28
Published: 02 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science, Computer Science (R0)
