Abstract
Integrating heterogeneous, complementary data sources into clinical decision support systems (e.g., electronic health records, drug databases, scientific articles) could improve the accuracy of these systems. Based on this observation, the PreDiBioOntoL (Predicting Clinical Diagnosis by combining BioMedical Ontologies and Language Models) project aims to develop a computer-aided clinical and predictive diagnosis tool that helps clinicians better manage their patients. This tool will combine deep neural networks trained on heterogeneous data sources with biomedical ontologies. This paper presents the first results of PreDiBioOntoL. We propose new siamese neural models (BioSTransformers and BioS-MiniLM) that embed the texts to be compared in a vector space and then measure their similarity. The models optimize a self-supervised contrastive learning objective on articles from the scientific literature (the MEDLINE bibliographic database) paired with their MeSH (Medical Subject Headings) keywords. Results on several benchmarks show that the proposed models can solve different biomedical tasks without labeled examples (zero-shot), and that they perform comparably to biomedical transformers fine-tuned on supervised data specific to the problems addressed. Moreover, we show how these new siamese models can be exploited to semantically map entities across several biomedical ontologies.
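The training signal described above — pulling an article's embedding toward the embedding of its associated MeSH keywords while pushing it away from the other articles in the batch — can be sketched as an in-batch contrastive (InfoNCE-style) objective. The following is a minimal NumPy sketch under stated assumptions: the random vectors stand in for encoder outputs, and the function name, temperature value, and toy data are illustrative, not the authors' actual model or training setup.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project row vectors onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: the i-th anchor (article embedding) should
    be most similar to the i-th positive (its MeSH-keyword embedding);
    every other row in the batch serves as a negative."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                        # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -log_probs[idx, idx].mean()                    # cross-entropy on the diagonal

rng = np.random.default_rng(0)
article_vecs = rng.normal(size=(8, 16))                   # toy "article" embeddings
mesh_vecs = article_vecs + 0.1 * rng.normal(size=(8, 16)) # noisy "MeSH" counterparts
loss_paired = info_nce_loss(article_vecs, mesh_vecs)      # aligned pairs: low loss
loss_random = info_nce_loss(article_vecs, rng.normal(size=(8, 16)))  # unrelated: high loss
```

The aligned (article, MeSH) pairs yield a much lower loss than random pairings, which is exactly the gradient signal a siamese encoder needs: both inputs pass through the same network, and only their relative similarities in the shared vector space are supervised.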
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Menad, S., Laddada, W., Abdeddaïm, S., Soualmia, L.F. (2023). New Siamese Neural Networks for Text Classification and Ontologies Alignment. In: Collet, P., Gardashova, L., El Zant, S., Abdulkarimova, U. (eds) Complex Computational Ecosystems. CCE 2023. Lecture Notes in Computer Science, vol 13927. Springer, Cham. https://doi.org/10.1007/978-3-031-44355-8_2
Print ISBN: 978-3-031-44354-1
Online ISBN: 978-3-031-44355-8