short-paper

Clinical Trial Retrieval via Multi-grained Similarity Learning

Authors:

Fenglong Ma

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2950 - 2954

https://doi.org/10.1145/3626772.3661366

Published: 11 July 2024

Abstract

Clinical trial analysis is one of the main business directions and services at IQVIA, and reviewing similar past studies is one of the most critical steps before starting a commercial clinical trial. The current review process is manual and time-consuming: a clinical trial analyst must search an extensive clinical trial database and then review every candidate study. It is therefore of great interest to develop an automatic retrieval algorithm that selects similar studies given the information of a new study. To this end, we propose a novel group-based trial similarity learning network, GTSLNet, which consists of two kinds of similarity learning modules. The pair-wise section-level similarity learning module compares the query trial and the candidate trial at the abstract semantic level via the proposed section transformer. A word-level similarity learning module uses a word similarity matrix to capture low-level similarity information. An aggregation module then combines these similarities. To address potential false negatives and noisy data, we introduce a variance-regularized group distance loss function. Experimental results show that the proposed GTSLNet significantly and consistently outperforms state-of-the-art baselines.
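
The abstract names three components without giving formulas: a word similarity matrix, an aggregation of section-level and word-level similarities, and a variance-regularized group distance loss. The sketch below is only a rough illustration of what such pieces could look like, not the GTSLNet implementation. Every function name, the max-then-mean pooling, the weighted-sum aggregation with `alpha`, and the hinge-plus-variance form of the loss (with `margin` and `lam`) are assumptions introduced here for illustration.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity_matrix(query_words, candidate_words):
    """Rows index query word vectors, columns index candidate word vectors."""
    return [[cosine(q, c) for c in candidate_words] for q in query_words]


def word_level_score(sim_matrix):
    """Assumed pooling: best match per query word, averaged over query words."""
    return mean(max(row) for row in sim_matrix)


def aggregate(section_score, word_score, alpha=0.5):
    """Assumed aggregation: a weighted sum of the two similarity scores."""
    return alpha * section_score + (1.0 - alpha) * word_score


def group_distance_loss(pos_dists, neg_dists, margin=0.5, lam=1.0):
    """Hypothetical group loss: a hinge on group mean distances plus a variance penalty.

    pos_dists / neg_dists are query-to-trial distances for the groups of
    similar and dissimilar candidates. The variance term (population
    variance of the positive distances) is one guess at what
    "variance-regularized" could mean; the paper's definition may differ.
    """
    hinge = max(0.0, margin + mean(pos_dists) - mean(neg_dists))
    return hinge + lam * pvariance(pos_dists)


# Toy usage with 2-dimensional word vectors (made-up values).
query = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]
sim = word_similarity_matrix(query, candidate)
score = aggregate(section_score=0.6, word_score=word_level_score(sim))
loss = group_distance_loss([0.2, 0.4], [0.9, 1.1])
```

Computing the loss over groups of candidates rather than single pairs means one mislabeled trial shifts a group mean less than it would shift an individual pair distance, which is one plausible reading of how a group-based loss could tolerate noisy labels.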

Index Terms

Clinical Trial Retrieval via Multi-grained Similarity Learning
1. Applied computing
  1. Life and medical sciences
    1. Health informatics

Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2024

3164 pages

ISBN:9798400704314

DOI:10.1145/3626772

General Chairs: Grace Hui Yang (Georgetown University, USA), Hongning Wang (Tsinghua University, China), Sam Han (The Washington Post, USA)

Program Chairs: Claudia Hauff (Spotify, Netherlands), Guido Zuccon (The University of Queensland, Australia), Yi Zhang (University of California Santa Cruz, USA)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SIGIR 2024

Sponsor:

SIGIR

SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 14 - 18, 2024

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
