New Siamese Neural Networks for Text Classification and Ontologies Alignment

  • Conference paper
Complex Computational Ecosystems (CCE 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13927)

Abstract

Integrating heterogeneous and complementary data (e.g., electronic health records, drug databases, scientific articles) into clinical decision support systems could improve the accuracy of these systems. Based on this observation, the PreDiBioOntoL (Predicting Clinical Diagnosis by combining BioMedical Ontologies and Language Models) project aims to develop a computer-aided, predictive clinical diagnosis tool that helps clinicians better manage their patients. This tool will combine deep neural networks trained on heterogeneous data sources with biomedical ontologies. This paper presents the first results obtained in PreDiBioOntoL. We propose new siamese neural models (BioSTransformers and BioS-MiniLM) that embed the texts to be compared in a vector space and then measure their similarity. The models optimize a self-supervised contrastive learning objective on articles from the scientific literature (the MEDLINE bibliographic database) associated with their MeSH (Medical Subject Headings) keywords. Results on several benchmarks show that the proposed models can solve different biomedical tasks without labeled examples (zero-shot), and are comparable to other biomedical transformers fine-tuned on supervised data specific to the tasks at hand. Moreover, we show how these new siamese models are used to semantically map entities across several biomedical ontologies.
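As a toy illustration of the zero-shot setup the abstract describes, a text can be assigned the label whose embedding is most similar under cosine similarity. The vectors below are hypothetical stand-ins for the sentence embeddings a siamese encoder such as BioSTransformers would produce; the helper names are ours, not the paper's.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_label(text_emb, label_embs):
    # Pick the label whose embedding is closest to the text embedding:
    # no task-specific training, only embedding comparison.
    scores = {label: cosine(text_emb, emb) for label, emb in label_embs.items()}
    return max(scores, key=scores.get)

# Toy embeddings; in practice both texts and MeSH labels would be
# encoded by the same siamese model.
text_emb = [0.9, 0.1, 0.2]
label_embs = {
    "Neoplasms": [0.8, 0.2, 0.1],
    "Cardiovascular Diseases": [0.1, 0.9, 0.3],
}
print(zero_shot_label(text_emb, label_embs))  # → Neoplasms
```

The same comparison underlies the ontology-alignment use case: entities from two ontologies are embedded and candidate mappings are ranked by this similarity score.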


Notes

  1. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  2. https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss
  3. https://bioportal.bioontology.org/ontologies/DOID
  4. https://bioportal.bioontology.org/ontologies/DRON
  5. http://purl.obolibrary.org/obo/
  6. http://purl.obolibrary.org/obo/IAO_0000115
  7. https://www.nlm.nih.gov/research/umls/index.html
  8. https://uts-ws.nlm.nih.gov/rest/content/current/CUI/code/relations?includeAdditionalRelationLabels=may_be_treated_by&apiKey
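The MultipleNegativesRankingLoss linked in note 2 treats, for each anchor in a batch, its paired text as the positive and every other in-batch positive as a negative. A minimal pure-Python sketch of that computation (toy embeddings in place of model outputs; the function name is ours):

```python
import math

def mnr_loss(anchors, positives):
    """Multiple-negatives ranking loss for a batch of (anchor, positive)
    embedding pairs: cross-entropy over in-batch dot-product scores,
    where the matching positive (the diagonal) is the target class."""
    total = 0.0
    for i, a in enumerate(anchors):
        # Similarity of anchor i to every positive in the batch.
        scores = [sum(x * y for x, y in zip(a, p)) for p in positives]
        log_denom = math.log(sum(math.exp(s) for s in scores))
        total += -(scores[i] - log_denom)  # negative log-softmax at the diagonal
    return total / len(anchors)

# Two toy pairs: each anchor scores highest against its own positive,
# so the loss falls below the chance level of log(2) for a batch of two.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnr_loss(anchors, positives))
```

In the paper's setting, the pairs would be article texts and their associated MeSH keyword labels, so the model learns to pull matching text/label embeddings together and push mismatched in-batch combinations apart.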


Author information

Correspondence to Safaa Menad.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Menad, S., Laddada, W., Abdeddaïm, S., Soualmia, L.F. (2023). New Siamese Neural Networks for Text Classification and Ontologies Alignment. In: Collet, P., Gardashova, L., El Zant, S., Abdulkarimova, U. (eds) Complex Computational Ecosystems. CCE 2023. Lecture Notes in Computer Science, vol 13927. Springer, Cham. https://doi.org/10.1007/978-3-031-44355-8_2

  • DOI: https://doi.org/10.1007/978-3-031-44355-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44354-1

  • Online ISBN: 978-3-031-44355-8

  • eBook Packages: Computer Science, Computer Science (R0)
