ABSTRACT
Rapid growth in the publication of clinical trial reports has made it extremely challenging to conduct systematic reviews. Automatic extraction of Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial reports can alleviate the traditionally time-consuming process of systematic review. In this paper, we propose a novel approach for automatically detecting PICO-related terms in clinical trial reports. Our technique uses BioLinkBERT as the base model, augmented with two Bi-LSTM layers. The proposed model achieves state-of-the-art (SOTA) results whether trained on coarse-grained datasets such as EBM-NLP, EBM-COMET, and RedHOT, or on fine-grained datasets such as EBM-NLPrev and EBM-NLPh, irrespective of the nature of the dataset. We also show with our framework that ensembling can be highly effective for the token span prediction task, and our empirical findings demonstrate that the ensemble approach achieves a new state-of-the-art on the EBM-NLP dataset.
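The architecture described above (a contextual encoder followed by two stacked Bi-LSTM layers and a per-token classification head) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding layer stands in for BioLinkBERT so the sketch runs without downloading pretrained weights, and the vocabulary size, hidden size, and tag count are arbitrary placeholder values.

```python
import torch
import torch.nn as nn

class PicoTagger(nn.Module):
    """Sketch: contextual encoder (BioLinkBERT in the paper; a toy
    embedding layer here) -> two Bi-LSTM layers -> token tag head."""

    def __init__(self, vocab_size=100, hidden=32, num_tags=5):
        super().__init__()
        # Stand-in for BioLinkBERT's contextual token representations.
        self.encoder = nn.Embedding(vocab_size, hidden)
        # Two stacked bidirectional LSTM layers on the encoder output.
        self.bilstm = nn.LSTM(hidden, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Per-token classifier over PICO tags (e.g. a BIO scheme).
        self.head = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):
        x = self.encoder(token_ids)   # (batch, seq, hidden)
        x, _ = self.bilstm(x)         # (batch, seq, 2 * hidden)
        return self.head(x)           # (batch, seq, num_tags)

tagger = PicoTagger()
logits = tagger(torch.randint(0, 100, (2, 7)))
print(tuple(logits.shape))  # (2, 7, 5)
```

In a full implementation the encoder would be replaced by BioLinkBERT's final hidden states, and the ensemble results reported above would combine the token-level predictions of several such models.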
Index Terms
- BLINK_LSTM: BioLinkBERT and LSTM based approach for extraction of PICO frame from Clinical Trial Text