ABSTRACT
Rapid growth in the publication of clinical trial reports has made it extremely challenging to conduct systematic reviews. Automatic extraction of Population, Intervention, Comparator, and Outcome (PICO) elements from clinical trial reports can alleviate the traditionally time-consuming process of systematic review. In this paper, we propose a novel approach for automatically detecting PICO-related terms in clinical trial reports. Our technique uses BioLinkBERT as the base model, augmented with two Bi-LSTM layers. The proposed model achieves state-of-the-art (SOTA) results whether trained on coarse-grained datasets such as EBM-NLP, EBM-COMET, and RedHOT, or on fine-grained datasets such as EBM-NLPrev and EBM-NLPh, irrespective of the nature of the dataset. We also show with our framework that ensembling can be highly effective for the token span prediction task, and our empirical findings demonstrate that the ensemble approach achieves a new state-of-the-art on the EBM-NLP dataset.
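The architecture described above (a contextual encoder followed by two stacked Bi-LSTM layers and a per-token classification head) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding layer stands in for BioLinkBERT so the sketch runs without downloading pretrained weights, and the vocabulary size, hidden size, and tag count are arbitrary placeholder values.

```python
import torch
import torch.nn as nn

class PicoTagger(nn.Module):
    """Sketch: contextual encoder (BioLinkBERT in the paper; a toy
    embedding layer here) -> two Bi-LSTM layers -> token tag head."""

    def __init__(self, vocab_size=100, hidden=32, num_tags=5):
        super().__init__()
        # Stand-in for BioLinkBERT's contextual token representations.
        self.encoder = nn.Embedding(vocab_size, hidden)
        # Two stacked bidirectional LSTM layers on the encoder output.
        self.bilstm = nn.LSTM(hidden, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Per-token classifier over PICO tags (e.g. a BIO scheme).
        self.head = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):
        x = self.encoder(token_ids)   # (batch, seq, hidden)
        x, _ = self.bilstm(x)         # (batch, seq, 2 * hidden)
        return self.head(x)           # (batch, seq, num_tags)

tagger = PicoTagger()
logits = tagger(torch.randint(0, 100, (2, 7)))
print(tuple(logits.shape))  # (2, 7, 5)
```

In a full implementation the encoder would be replaced by BioLinkBERT's final hidden states, and the ensemble results reported above would combine the token-level predictions of several such models.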
Index Terms
- BLINK_LSTM: BioLinkBERT and LSTM based approach for extraction of PICO frame from Clinical Trial Text