Abstracts Embeddings Evaluation: A Case Study of Artificial Intelligence and Medical Imaging for the COVID-19 Infection

Zurlo, Giovanni; Ronchieri, Elisabetta

doi:10.1007/978-3-031-51023-6_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14365)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

During the COVID-19 pandemic, a large body of literature was produced covering different aspects of the infection. The use of artificial intelligence (AI) in medical imaging has been shown to improve screening, diagnosis, treatment, and medication for COVID-19. Applying natural language processing (NLP) solutions to COVID-19 literature has helped to infer significant COVID-19-related topics and correlated diseases. In this paper, we aim to evaluate biomedical transformer-based NLP techniques in COVID-19 research to understand whether they are able to classify problems related to COVID-19. In particular, after collecting COVID-19 publications that include the terms AI and medical imaging, we compare fifteen BERT-based models with respect to modality prediction and task prediction.
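
As a rough illustration of what comparing embedding models on a prediction task can look like, the sketch below scores two sets of abstract embeddings on a modality-prediction task. Everything in it is an assumption made for illustration: the abstract does not state which classifier or metric the authors used, so a nearest-centroid classifier with cosine similarity and leave-one-out accuracy are stand-ins. The names `model_a` and `model_b`, the labels, and the 2-dimensional vectors are toy placeholders, not data or models from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroids(vectors, labels):
    """Mean embedding per class label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(vectors, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + x for s, x in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def leave_one_out_accuracy(vectors, labels):
    """Hold out each abstract in turn and predict its label from the
    nearest class centroid (by cosine similarity) of the remaining ones."""
    correct = 0
    for i in range(len(vectors)):
        train_vecs = vectors[:i] + vectors[i + 1:]
        train_labs = labels[:i] + labels[i + 1:]
        cents = centroids(train_vecs, train_labs)
        predicted = max(cents, key=lambda lab: cosine(vectors[i], cents[lab]))
        correct += predicted == labels[i]
    return correct / len(vectors)


# Toy placeholders (assumption): one imaging-modality label per abstract and
# one embedding per abstract for each hypothetical model.
labels = ["CT", "CT", "CT", "X-ray", "X-ray", "X-ray"]
embeddings_by_model = {
    "model_a": [[1.0, 0.1], [0.9, 0.2], [0.8, 0.1],
                [0.1, 1.0], [0.2, 0.9], [0.1, 0.8]],
    "model_b": [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2],
                [0.2, 0.9], [0.1, 0.8], [0.8, 0.1]],
}

# One score per model; a higher score means the embeddings separate
# the modality labels better under this simple classifier.
scores = {
    name: leave_one_out_accuracy(vecs, labels)
    for name, vecs in embeddings_by_model.items()
}
```

In a real setting the placeholder vectors would be replaced by embeddings of the collected abstracts produced by each BERT-based model, and the same scoring loop would be repeated for task prediction with task labels in place of modality labels.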

Author information

Authors and Affiliations

Department of Statistical Sciences, University of Bologna, Bologna, Italy
Giovanni Zurlo & Elisabetta Ronchieri
INFN CNAF, Bologna, Italy
Elisabetta Ronchieri

Authors

Giovanni Zurlo
Elisabetta Ronchieri

Corresponding author

Correspondence to Elisabetta Ronchieri.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock

About this paper

Cite this paper

Zurlo, G., Ronchieri, E. (2024). Abstracts Embeddings Evaluation: A Case Study of Artificial Intelligence and Medical Imaging for the COVID-19 Infection. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14365. Springer, Cham. https://doi.org/10.1007/978-3-031-51023-6_18

DOI: https://doi.org/10.1007/978-3-031-51023-6_18
Published: 24 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51022-9
Online ISBN: 978-3-031-51023-6
eBook Packages: Computer Science, Computer Science (R0)
