Summarization of biomedical articles using domain-specific word embeddings and graph ranking

doi:10.1016/j.jbi.2020.103452

Journal of Biomedical Informatics

Volume 107, July 2020, 103452

https://doi.org/10.1016/j.jbi.2020.103452

Under an Elsevier user license

open archive

Highlights

• Domain-specific embeddings are used to model the text as a graph.
• Graph ranking is used to extract the most important sentences.
• Context-sensitive embeddings can effectively capture relations between sentences.
• Our graph-based biomedical summarizer outperforms the comparison methods.
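The two steps named in the highlights, building a sentence graph from embeddings and ranking its nodes, can be illustrated with a short sketch. This is not the authors' implementation. It assumes each sentence has already been mapped to a vector (for example, by averaging its word embeddings from a biomedical language model), uses cosine similarity as the edge weight, and uses weighted PageRank as one possible ranking technique. The function names and the `threshold`, `damping`, and `k` parameters are hypothetical choices for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def build_graph(sentence_vectors, threshold=0.0):
    """Weighted adjacency matrix; an edge i-j is kept only if similarity > threshold.

    Assumption: edge weight = cosine similarity of the (precomputed) sentence vectors.
    """
    n = len(sentence_vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine(sentence_vectors[i], sentence_vectors[j])
                if s > threshold:
                    w[i][j] = s
    return w


def pagerank(w, damping=0.85, iters=200, tol=1e-10):
    """Weighted PageRank by power iteration; nodes without edges spread rank uniformly."""
    n = len(w)
    out = [sum(row) for row in w]  # total outgoing weight per node
    r = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(r[j] for j in range(n) if out[j] == 0.0)
        new = []
        for i in range(n):
            incoming = sum(r[j] * w[j][i] / out[j] for j in range(n) if out[j] > 0.0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r


def summarize(sentences, vectors, k=2, threshold=0.0):
    """Pick the k highest-ranked sentences and return them in document order."""
    scores = pagerank(build_graph(vectors, threshold))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

With toy vectors `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]`, the middle sentence is similar to both neighbours and receives the highest score, so `summarize(["A", "B", "C"], vectors, k=1)` selects `"B"`. Returning the selected sentences in their original order is a common design choice that keeps the extract readable.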

Abstract

Text summarization tools can help biomedical researchers and clinicians reduce the time and effort needed to acquire important information from numerous documents. Previous work has shown that the input text can be modeled as a graph, and that important sentences can be selected by identifying central nodes within the graph. However, representing documents effectively, quantifying the relatedness of sentences, and selecting the most informative sentences remain the main challenges in graph-based summarization. In this paper, we address these challenges in the context of biomedical text summarization. We evaluate the efficacy of a graph-based summarizer using different types of context-free and contextualized embeddings. The word representations are produced by pre-training neural language models on large corpora of biomedical texts. The summarizer models the input text as a graph in which the strength of the relation between two sentences is measured using these domain-specific vector representations. We also assess the usefulness of different graph ranking techniques in the sentence selection step of our summarization method. Using the widely adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, we evaluate the performance of our summarizer against various comparison methods. The results show that when the summarizer utilizes proper combinations of context-free and contextualized embeddings, along with an effective ranking method, it can outperform the comparison methods. We demonstrate that the best settings of our graph-based summarizer can effectively improve the informative content of summaries and decrease redundancy.
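The abstract evaluates summaries with ROUGE. ROUGE-N recall counts how many reference n-grams also appear in the candidate summary (each n-gram clipped to the smaller of its two counts) and divides by the total number of reference n-grams. The sketch below is a simplification, not the official ROUGE toolkit: it assumes lowercase whitespace tokenization and omits options such as stemming and stopword removal. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N recall over one or more reference summaries.

    Assumption: lowercase + whitespace tokenization; no stemming or stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each match to the number of times the n-gram occurs in the candidate.
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, with the candidate "the cat sat on the mat" and the single reference "the cat is on the mat", five of the six reference unigrams are matched (ROUGE-1 recall = 5/6), and three of the five reference bigrams are matched (ROUGE-2 recall = 0.6).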


Keywords

Natural language processing

Medical text mining

Text summarization

Word embedding

Graph ranking

Deep learning