short-paper

A2A-API: A Prototype for Biomedical Information Retrieval Research and Benchmarking

Authors:

Maciej Rybinski,

Sarvnaz Karimi

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 3318 - 3322

https://doi.org/10.1145/3477495.3531667

Published: 07 July 2022

Abstract

Finding relevant literature is crucial for biomedical research and for the practice of evidence-based medicine, making biomedical search an important application area within the field of information retrieval (IR). This has been recognised by the broader IR community, and in particular by the organisers of the Text REtrieval Conference (TREC), since as early as 2003. While TREC provides crucial evaluation resources, getting started in biomedical IR means first clearing a substantial software engineering hurdle: parsing, indexing, and deploying several large document collections. Moreover, newcomers to the field often face a steep learning curve, where theoretical concepts are tangled up with technical aspects. Finally, many of the existing baselines and systems are difficult to reproduce.

We aim to alleviate all three of these bottlenecks with the launch of A2A-API, a RESTful API that serves as an easy-to-use, programming-language-independent interface to existing biomedical TREC collections. It builds upon A2A, our system for biomedical information retrieval benchmarking, and extends it with additional functionality. Apart from providing programmatic access to the features of the original A2A system, which focused principally on benchmarking, A2A-API supports biomedical IR researchers in the development of systems featuring reranking and query reformulation components. In this demonstration, we illustrate the capabilities of A2A-API with comprehensive use cases.

Index Terms

A2A-API: A Prototype for Biomedical Information Retrieval Research and Benchmarking
1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Specialized information retrieval
      1. Structure and multilingual text search
        1. Chemical and biochemical retrieval

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2022

3569 pages

ISBN:9781450387323

DOI:10.1145/3477495

General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)

Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SIGIR '22

Sponsor:

SIGIR

SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2022

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
