DOI: 10.1145/3626772.3661366
short-paper

Clinical Trial Retrieval via Multi-grained Similarity Learning

Published: 11 July 2024

Abstract

Clinical trial analysis is one of the main business directions and services at IQVIA, and reviewing similar past studies is one of the most critical steps before starting a commercial clinical trial. The current review process is manual and time-consuming: a clinical trial analyst must search an extensive clinical trial database and then review every candidate study. It is therefore of great interest to develop an automatic retrieval algorithm that selects similar studies given the information of a new study. To this end, we propose a novel group-based trial similarity learning network, GTSLNet, which consists of two kinds of similarity learning modules. The pair-wise section-level similarity learning module compares the query trial and the candidate trial at the abstract semantic level via the proposed section transformer. The word-level similarity learning module uses a word similarity matrix to capture low-level similarity information. An aggregation module then combines these similarities. To address potential false negatives and noisy data, we introduce a variance-regularized group distance loss function. Experimental results show that the proposed GTSLNet significantly and consistently outperforms state-of-the-art baselines.
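
The abstract names the components but does not give their formulas, so the following is only an illustrative sketch of how a word similarity matrix, an aggregation step, and a variance-regularized group distance loss could fit together. Everything in it is an assumption rather than the GTSLNet definition: the cosine-based matrix, the max-then-mean pooling, the convex combination with weight `alpha`, the hinge on group mean distances, and the parameters `margin` and `lam` are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity_matrix(query_vecs, cand_vecs):
    """Rows index query words, columns index candidate words."""
    return [[cosine(q, c) for c in cand_vecs] for q in query_vecs]


def word_level_score(matrix):
    """Assumed pooling: best match per query word, averaged over query words."""
    return sum(max(row) for row in matrix) / len(matrix)


def aggregate(section_score, word_score, alpha=0.5):
    """Assumed aggregation: convex combination of the two similarity scores."""
    return alpha * section_score + (1.0 - alpha) * word_score


def variance_regularized_group_loss(pos_dists, neg_dists, margin=1.0, lam=0.1):
    """Assumed loss: hinge on group mean distances plus a variance penalty
    on the positive group, which discourages a few outlying positives."""
    mean_pos = sum(pos_dists) / len(pos_dists)
    mean_neg = sum(neg_dists) / len(neg_dists)
    var_pos = sum((d - mean_pos) ** 2 for d in pos_dists) / len(pos_dists)
    return max(0.0, margin + mean_pos - mean_neg) + lam * var_pos


# Toy 2-dimensional word embeddings (made up for the example).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
cand_vecs = [[1.0, 0.0], [1.0, 1.0]]

matrix = word_similarity_matrix(query_vecs, cand_vecs)
word_score = word_level_score(matrix)
combined = aggregate(section_score=0.6, word_score=word_score)
loss = variance_regularized_group_loss([0.2, 0.4], [1.0, 2.0])
```

Working with group means rather than individual pairs is one way a loss could tolerate a few mislabeled candidates, which is the kind of robustness to false negatives and noisy data the abstract describes; whether the paper's loss takes this form is not stated in the abstract.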

    Published In

    SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2024
    3164 pages
    ISBN: 9798400704314
    DOI: 10.1145/3626772

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clinical trial retrieval
    2. deep neural network
    3. similarity learning

    Qualifiers

    • Short-paper

    Conference

    SIGIR 2024

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
