DOI: 10.1145/3632410.3632442
short-paper

BLINK_LSTM: BioLinkBERT and LSTM based approach for extraction of PICO frame from Clinical Trial Text

Published: 04 January 2024

ABSTRACT

The rapid growth in the publication of clinical trial reports has made it extremely challenging to conduct systematic reviews. Automatic extraction of Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial reports can ease the time-consuming manual process traditionally used for systematic reviews. In this paper, we propose a novel approach for automatically detecting PICO-related terms in clinical trial reports. Our technique uses BioLinkBERT as the base model, followed by two Bi-LSTM layers. The proposed model achieves state-of-the-art (SOTA) results regardless of dataset granularity, whether it is trained on coarse-grained datasets such as EBM-NLP, EBM-COMET, and RedHOT, or on fine-grained datasets such as EBM-NLP_rev and EBM-NLP_h. We also show that ensembling can be highly effective for the token span prediction task: empirically, our ensemble approach achieves a new state of the art on the EBM-NLP dataset.
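The abstract does not say how the ensemble combines its members. As a minimal sketch only, assuming a simple per-token majority vote (one common choice, not necessarily the one used in the paper), several taggers' label sequences could be merged as below. The function name `majority_vote`, the single-letter labels, and the three example model outputs are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several models by majority vote.

    predictions[m][t] is the label that model m assigned to token t.
    Ties go to the earliest model in the list (an assumed tie-break rule;
    the abstract does not specify one).
    """
    if not predictions:
        return []
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    ensembled = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        best = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        ensembled.append(next(lab for lab in token_labels if counts[lab] == best))
    return ensembled


# Hypothetical outputs of three taggers on one sentence
# (P = Population, I = Intervention, O = outside any span).
tokens = ["Patients", "with", "diabetes", "received", "metformin"]
model_a = ["P", "P", "P", "O", "I"]
model_b = ["O", "P", "P", "O", "I"]
model_c = ["P", "O", "P", "O", "O"]

ensemble_labels = majority_vote([model_a, model_b, model_c])
for token, label in zip(tokens, ensemble_labels):
    print(f"{token}\t{label}")
```

Voting per token keeps the sketch short, but it can produce spans that no single model predicted; a real system might instead vote over whole spans or average model probabilities.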


Published in

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024
627 pages

Copyright © 2024 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• short-paper
• Research
• Refereed limited