Abstract:
Topic extraction from Electronic Health Records is a sensitive step of the knowledge extraction process. Because negations can completely distort the meaning of a discourse, correctly distinguishing terms from negated terms is mandatory. Our work is an attempt at automated negation identification in unstructured health records. We analyzed a corpus of medical documents containing 5103 sentences and found that, while adverbs account for 3% of the words, negation covers almost 2% of the words used in the corpus, which justifies an in-depth analysis of negation. The main contribution of the paper addresses a drawback of existing negation identification approaches in the literature, which do not consider negation expressed through negation prefixes. In this paper we address the tasks of syntactic and morphologic negation identification. To identify morphologic negation we propose the PreNex algorithm, which breaks terms down into a prefix and a root word and checks the root's validity using additional available resources (WordNet). Syntactic negation identification relies on a pattern matching approach in which negated concepts are identified based on a predefined list of negation identifiers. The results we obtained are promising and indicate a reliable negation identification approach for medical documents. We report a precision of 92.62% and a recall of 93.60% for morphologic negation identification, and an overall performance for morphologic and syntactic negation identification of 95.96% precision and 94.23% recall.
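The two identification steps described in the abstract can be illustrated with a minimal sketch. It is not the PreNex implementation: the prefix list, the trigger list, and the small `LEXICON` set (a stand-in for the WordNet lookup) are all assumptions made for illustration, and the syntactic step simply marks the token that follows a trigger, which is a much cruder rule than a full pattern set.

```python
import re

# Assumed for illustration only; the abstract does not list the actual
# prefixes or negation identifiers used by the authors.
NEGATION_PREFIXES = ("non", "dis", "ab", "un", "in", "im", "ir", "il", "a")
NEGATION_TRIGGERS = ("no", "not", "without", "denies")

# Toy stand-in for the WordNet validity check on root words.
LEXICON = {"normal", "regular", "responsive", "stable",
           "specific", "symptomatic", "fever", "pain"}


def morphologic_negation(term, lexicon=LEXICON):
    """Split a term into (prefix, root) if the root is a valid word, else None."""
    word = term.lower()
    # Try longer prefixes first so "non" is tested before shorter candidates.
    for prefix in sorted(NEGATION_PREFIXES, key=len, reverse=True):
        if word.startswith(prefix):
            root = word[len(prefix):].lstrip("-")
            if root in lexicon:
                return prefix, root
    return None


def syntactic_negation(sentence, triggers=NEGATION_TRIGGERS):
    """Return the tokens that directly follow a negation trigger."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok in triggers]


if __name__ == "__main__":
    for term in ("abnormal", "irregular", "non-specific", "insulin"):
        print(term, morphologic_negation(term))
    print(syntactic_negation("Patient denies fever and has no pain."))
```

The root check is what keeps a word such as "insulin" from being treated as a negation: stripping "in" leaves "sulin", which is not a valid word. In a fuller version the `lexicon` membership test would be replaced by a real dictionary lookup.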
Date of Conference: 22-24 May 2014
Date Added to IEEE Xplore: 17 July 2014