ABSTRACT
Capturing the semantic similarity between sentences plays a vital role in several primary applications in biomedical and clinical domains: biomedical sentence search, evidence attribution, question-answering and text summarization. In this pilot study, we evaluated the effectiveness of five representative sentence similarity models, ranging from traditional machine learning methods to the latest bidirectional transformers in the clinical domain. The evaluation was performed on a dataset consisting of over 1K sentence pairs from EMRs - the largest public dataset in this domain by far. The results show that embeddings on large biomedical corpora are the most effective methods. It also demonstrates that CNN and BERT are effective to capture sentence similarity under relatively small datasets.
- Chen Q, Du J, Kim S, Wilbur WJ, Lu Z. Combining rich features and deep learning for finding similar sentences in electronic medical records. Proceedings of Biocreative/OHNLP challenge 2018 2018.Google Scholar
- Shao Y. HCTI at SemEval-2017 Task 1: Use convolutional neural network to evaluate semantic textual similarity. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017:130--3.Google ScholarCross Ref
- Mueller J, Thyagarajan A. Siamese Recurrent Architectures for Learning Sentence Similarity. AAAI 2016;16:2786--92. Google ScholarDigital Library
- Chen Q, Peng Y, Lu Z. BioSentVec: creating sentence embeddings for biomedical texts. arXiv preprint arXiv:1810.09302 2018.Google Scholar
- Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: pre-trained biomedical language representation model for biomedical text mining. arXiv preprint arXiv:1901.08746 2019.Google Scholar
- Wang Y, Afzal N, Fu S, Wang L, Shen F, Rastegar-Mojarad M, et al. MedSTS: A Resource for Clinical Semantic Textual Similarity. arXiv preprint arXiv:1808.09397 2018.Google Scholar
Index Terms
- Evaluation of Five Sentence Similarity Models on Electronic Medical Records
Recommendations
Sentence Similarity Measures Revisited: Ranking Sentences in PubMed Documents
BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health InformaticsWhile various measures are available for computing sentence similarity, few studies have examined their performance in the biomedical domain. Motivated by BIOSSES, an earlier study for biomedical sentence similarity, we here explore the effectiveness of ...
A New Sentence Similarity Method Based on a Three-Layer Sentence Representation
WI-IAT '14: Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 01Sentence similarity methods are used to assess the degree of likelihood between phrases. Many natural language applications such as text summarization, information retrieval, text categorization, and machine translation employ measures of sentence ...
Towards Human-Machine Collaboration in Creating an Evaluation Corpus for Adverse Drug Events in Discharge Summaries of Electronic Medical Records
Adverse drug events (ADEs) contribute significantly to morbidity and mortality in the healthcare system. The availability of digitalised hospitals' narrative clinical data offers a potentially rich resource to enhance pharmacovigilance efforts to manage ...
Comments