Abstracts Embeddings Evaluation: A Case Study of Artificial Intelligence and Medical Imaging for the COVID-19 Infection

  • Conference paper
Image Analysis and Processing - ICIAP 2023 Workshops (ICIAP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14365)

Abstract

During the COVID-19 pandemic, a large body of literature was produced covering different aspects of the infection. The use of artificial intelligence (AI) in medical imaging has been shown to improve screening, diagnosis, treatment, and medication for COVID-19. Applying natural language processing (NLP) solutions to the COVID-19 literature has helped to infer significant COVID-19-related topics and correlated diseases. In this paper, we evaluate biomedical transformer-based NLP techniques in COVID-19 research to understand whether they can classify problems related to COVID-19. In particular, after collecting COVID-19 publications that contain the terms AI and medical imaging, we compared fifteen BERT-based models with respect to modality prediction and task prediction.
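
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state which classifier or metric was used, so a nearest-centroid classifier and macro-averaged F1 are stand-ins chosen here for simplicity. The two-dimensional vectors replace real BERT abstract embeddings, and the names `model_a`, `model_b`, `toy_embeddings`, and the "CT" and "X-ray" modality labels are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(train_x, train_y, test_x):
    """Give each test vector the label of the most similar class centroid."""
    groups = defaultdict(list)
    for vec, label in zip(train_x, train_y):
        groups[label].append(vec)
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }
    return [
        max(centroids, key=lambda lab: cosine(vec, centroids[lab]))
        for vec in test_x
    ]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical modality labels for four training and two test abstracts.
train_y = ["CT", "CT", "X-ray", "X-ray"]
test_y = ["CT", "X-ray"]

# Toy stand-ins for the embeddings two models assign to the same abstracts.
toy_embeddings = {
    "model_a": {
        "train": [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]],
        "test": [[0.8, 0.3], [0.3, 0.8]],
    },
    "model_b": {
        "train": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
        "test": [[0.2, 0.9], [0.1, 0.8]],
    },
}

# Score every model on the same labels, then rank from best to worst.
scores = {
    name: macro_f1(test_y, predict(emb["train"], train_y, emb["test"]))
    for name, emb in toy_embeddings.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Holding the labels and the classifier fixed while swapping only the embeddings means any difference in score can be attributed to the embedding model. The same loop would apply to task prediction by replacing the label lists.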

Author information

Corresponding author

Correspondence to Elisabetta Ronchieri.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zurlo, G., Ronchieri, E. (2024). Abstracts Embeddings Evaluation: A Case Study of Artificial Intelligence and Medical Imaging for the COVID-19 Infection. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14365. Springer, Cham. https://doi.org/10.1007/978-3-031-51023-6_18

  • DOI: https://doi.org/10.1007/978-3-031-51023-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51022-9

  • Online ISBN: 978-3-031-51023-6

  • eBook Packages: Computer Science, Computer Science (R0)