DOI: 10.1145/3307339.3343239
poster

Evaluation of Five Sentence Similarity Models on Electronic Medical Records

Published: 04 September 2019

ABSTRACT

Capturing the semantic similarity between sentences plays a vital role in several primary applications in the biomedical and clinical domains, including biomedical sentence search, evidence attribution, question answering, and text summarization. In this pilot study, we evaluated the effectiveness of five representative sentence similarity models in the clinical domain, ranging from traditional machine learning methods to the latest bidirectional transformers. The evaluation was performed on a dataset of over 1K sentence pairs from electronic medical records (EMRs), by far the largest public dataset in this domain. The results show that embeddings trained on large biomedical corpora are the most effective. They also demonstrate that CNN and BERT models capture sentence similarity effectively even with relatively small datasets.
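
As a rough illustration of the kind of evaluation the abstract describes, the sketch below scores sentence pairs by the cosine similarity of their vectors and compares the predictions with gold similarity scores. It rests on several assumptions that are not from the study: the bag-of-words `embed` function is only a toy stand-in for a learned sentence embedding, the sentence pairs and gold scores are invented for illustration, and Pearson correlation is assumed as the agreement measure because it is common in semantic textual similarity tasks.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Invented sentence pairs with made-up gold scores
# (0 = unrelated, 5 = equivalent); not taken from any real dataset.
pairs = [
    ("patient denies chest pain", "patient denies chest pain", 5.0),
    ("take one tablet by mouth daily", "take one tablet by mouth twice daily", 4.0),
    ("patient reports mild headache", "follow up in two weeks", 0.0),
]

predicted = [cosine(embed(a), embed(b)) for a, b, _ in pairs]
gold = [score for _, _, score in pairs]
correlation = pearson(predicted, gold)
print(f"Pearson correlation with gold scores: {correlation:.3f}")
```

Swapping `embed` for a model-based encoder would leave the scoring and correlation steps unchanged, which is why a setup like this lends itself to comparing several similarity models on the same sentence pairs.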

Published in

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
September 2019
716 pages
ISBN: 9781450366663
DOI: 10.1145/3307339

Copyright © 2019 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

BCB '19 paper acceptance rate: 42 of 157 submissions (27%). Overall acceptance rate: 254 of 885 submissions (29%).
