Abstract
During the COVID-19 pandemic, a large body of literature was produced covering different aspects of the infection. The use of artificial intelligence (AI) in medical imaging has been shown to improve screening, diagnosis, treatment, and medication for COVID-19. Applying natural language processing (NLP) solutions to the COVID-19 literature has helped to infer significant COVID-19-related topics and correlated diseases. In this paper, we evaluate biomedical transformer-based NLP techniques in COVID-19 research to understand whether they are able to classify problems related to COVID-19. In particular, after collecting COVID-19 publications that mention AI and medical imaging, we compare fifteen BERT-based models with respect to modality prediction and task prediction.
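The comparison described above can be pictured with a minimal sketch. The abstract does not state which classifier or metric was used, so everything here is an assumption: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a BERT-based encoder, the scoring rule is leave-one-out 1-nearest-neighbour accuracy, and the four abstracts and their modality labels are invented for illustration only.

```python
import math

def toy_embed(text, dim=32):
    """Hypothetical stand-in for a BERT-based encoder (assumption, not the
    paper's models): hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket: sum of character codes modulo dim.
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def leave_one_out_accuracy(embeddings, labels):
    """Label each item with the label of its most similar *other* item
    and return the fraction of items labelled correctly."""
    n = len(embeddings)
    correct = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = max(others, key=lambda j: dot(embeddings[i], embeddings[j]))
        correct += labels[nearest] == labels[i]
    return correct / n

def compare_encoders(encoders, abstracts, labels):
    """Score every encoder on the same labelled abstracts."""
    return {
        name: leave_one_out_accuracy([encode(t) for t in abstracts], labels)
        for name, encode in encoders.items()
    }

# Invented toy data (not from the paper): abstract snippets and modality labels.
abstracts = [
    "deep learning on chest ct scans for covid-19 detection",
    "ct scans segmentation of covid-19 lung lesions",
    "chest x-ray classification with convolutional networks",
    "x-ray screening of covid-19 with transfer learning",
]
modalities = ["CT", "CT", "X-ray", "X-ray"]

# Two hypothetical "models" that differ only in embedding size.
encoders = {
    "toy-32": lambda t: toy_embed(t, dim=32),
    "toy-8": lambda t: toy_embed(t, dim=8),
}
scores = compare_encoders(encoders, abstracts, modalities)
```

In a real evaluation, each entry of `encoders` would wrap one pretrained model, and the same loop would be repeated with task labels in place of modality labels.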
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zurlo, G., Ronchieri, E. (2024). Abstracts Embeddings Evaluation: A Case Study of Artificial Intelligence and Medical Imaging for the COVID-19 Infection. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14365. Springer, Cham. https://doi.org/10.1007/978-3-031-51023-6_18
DOI: https://doi.org/10.1007/978-3-031-51023-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51022-9
Online ISBN: 978-3-031-51023-6
eBook Packages: Computer Science (R0)