DOI: 10.1145/3477495.3531667
short-paper

A2A-API: A Prototype for Biomedical Information Retrieval Research and Benchmarking

Published: 07 July 2022

Abstract

Finding relevant literature is crucial for biomedical research and for the practice of evidence-based medicine, making biomedical search an important application area within information retrieval (IR). This has been recognised by the broader IR community, and in particular by the organisers of the Text REtrieval Conference (TREC), since as early as 2003. While TREC provides crucial evaluation resources, getting started in biomedical IR requires clearing a substantial software engineering hurdle: parsing, indexing, and deploying several large document collections. Moreover, newcomers to the field often face a steep learning curve, where theoretical concepts are tangled up with technical aspects. Finally, many of the existing baselines and systems are difficult to reproduce.
We aim to alleviate all three of these bottlenecks with the launch of A2A-API, a RESTful API that serves as an easy-to-use, programming-language-independent interface to existing biomedical TREC collections. It builds upon A2A, our system for biomedical information retrieval benchmarking, and extends it with additional functionality. Apart from providing programmatic access to the features of the original A2A system, which focused principally on benchmarking, A2A-API supports biomedical IR researchers in developing systems that feature reranking and query reformulation components. In this demonstration, we illustrate the capabilities of A2A-API with comprehensive use cases.

Cited By

  • (2024) Learning to match patients to clinical trials using large language models. Journal of Biomedical Informatics 159:C. DOI: 10.1016/j.jbi.2024.104734. Online publication date: 1-Nov-2024
  • (2023) SCHash: Speedy Simplicial Complex Neural Networks via Randomized Hashing. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1609-1618. DOI: 10.1145/3539618.3591762. Online publication date: 19-Jul-2023

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN:9781450387323
DOI:10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clinical trials search
  2. evidence-based medicine
  3. learning-to-rank
  4. medical information retrieval
  5. precision medicine

Qualifiers

  • Short-paper

Conference

SIGIR '22
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
