Intra-axiom redundancies in SNOMED CT

https://doi.org/10.1016/j.artmed.2014.10.003

Highlights

  • We present a method to automatically detect intra-axiom redundancies.

  • In SNOMED CT, intra-axiom redundancies are continuously introduced and removed.

  • In SNOMED CT, a consistent proportion of about 12% of concepts is redundantly defined, and overlooked redundancies may result in suboptimal maintenance.

  • Redundancy detection and elimination should be part of terminology maintenance.

Abstract

Objective

Intra-axiom redundancies are elements of concept definitions that are redundant as they are entailed by other elements of the concept definition. While such redundancies are harmless from a logical point of view, they make concept definitions hard to maintain, and they might lead to content-related problems when concepts evolve. The objective of this study is to develop a fully automated method to detect intra-axiom redundancies in OWL 2 EL and apply it to SNOMED Clinical Terms (SNOMED CT).

Materials and methods

We developed a software program in which we implemented, adapted and extended existing rules for redundancy elimination. With this program, we analysed the occurrence of redundancy in 11 releases of SNOMED CT (January 2009 to January 2014). We used the ELK reasoner to classify SNOMED CT, and Pellet to explain equivalences. We analysed the completeness and soundness of the results by an in-depth examination of the identified redundant elements in the July 2012 release of SNOMED CT. To determine whether concepts with redundant elements lead to maintenance issues, we analysed a small sample of solved redundancies.

Results

Analyses showed that the number of redundantly defined concepts in SNOMED CT is consistently around 35,000. In the July 2012 version of SNOMED CT, 35,010 (12%) of the 296,433 concepts contained redundant elements in their definitions. The results of applying our method are sound and complete with respect to our evaluation. Analysis of solved redundancies suggests that redundancies in concept definitions lead to inadequate maintenance of SNOMED CT.

Conclusions

Our analysis revealed that redundant elements are continuously introduced and removed, and that redundant elements may be overlooked when concept definitions are corrected. Applying our redundancy detection method to remove intra-axiom redundancies from the stated form of SNOMED CT and to point knowledge modellers to newly introduced redundancies can support creating and maintaining a redundancy-free version of SNOMED CT.

Introduction

SNOMED Clinical Terms (SNOMED CT) allows for meaning-based recording and retrieval of clinical information, which thereby becomes (re)usable. One of the advantages of SNOMED CT is its large size and coverage, which on the other hand makes defining new and maintaining existing concepts a challenging task.

Spackman [1] indicated back in 2001 that concept modellers have been uncertain about which elements are inherited from supertypes and therefore do not have to be added explicitly to a concept definition. Such intra-axiom redundancies, i.e. elements that are already entailed by other elements of the concept definition, are harmless from a logical point of view. However, they impede the maintainability of a terminology [2], [3], as they misleadingly suggest that new, meaningful information has been added to a concept.
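To make the notion of an element "already entailed by other elements" concrete, the following is a minimal sketch for the simplest case only: a stated superconcept that is also an ancestor of another stated superconcept. The hierarchy, the concept names and the helper functions are hypothetical illustrations, not SNOMED CT content and not the detection rules used in this study.

```python
# Toy "is-a" hierarchy: concept -> set of stated parents.
# All names are hypothetical and chosen only for illustration.
PARENTS = {
    "Top": set(),
    "Middle": {"Top"},
    "Leaf": {"Middle", "Top"},  # "Top" is already implied by "Middle"
}


def ancestors(concept, parents):
    """Return every concept reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def redundant_parents(concept, parents):
    """Return stated parents that are ancestors of another stated parent."""
    stated = parents.get(concept, set())
    return {
        p
        for p in stated
        if any(p in ancestors(q, parents) for q in stated if q != p)
    }


print(redundant_parents("Leaf", PARENTS))
```

In this toy hierarchy, stating "Top" as a parent of "Leaf" adds no information, because "Leaf" already inherits it through "Middle".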

Moreover, redundant elements might lead to content-related problems when concepts evolve. For example, the rolegroup in the subconcept Thyroid uptake with thyroid stimulation was redundant in the July 2012 version of SNOMED CT, as it repeated a rolegroup already contained in the definition of the superconcept Non-imaging thyroid uptake test, see Example 1.1. In the subsequent version of SNOMED CT, the method Radionuclide imaging was removed from the rolegroup in the superconcept, which makes sense for a concept with the name Non-imaging thyroid uptake test. However, the method was not removed from the rolegroup in the subconcept, as shown in Example 1.2, which is apparently incorrect. In this paper, we inventory redundant elements in SNOMED CT concept definitions.

Example 1.1

Two concept definitions in the July 2012 version of SNOMED CT. The definition of Thyroid uptake with thyroid stimulation contains a redundant element, the rolegroup (RG).

Example 1.2

Definitions of the concepts from Example 1.1 in the January 2013 version of SNOMED CT. The definition of Non-imaging thyroid uptake test has been corrected, but the previously redundant rolegroup is left unchanged.
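The maintenance problem illustrated by Examples 1.1 and 1.2 can be sketched with a deliberately simplified model. Here a rolegroup is a set of attribute-value pairs, and a rolegroup counts as entailed when an inherited rolegroup contains all of its pairs. This ignores subsumption between attributes and between values, so it is an assumption made for illustration rather than the method of this study. Only the pair for the method Radionuclide imaging comes from the text; the second pair is a hypothetical placeholder.

```python
# A rolegroup is modelled as a frozenset of (attribute, value) pairs.
METHOD = ("Method", "Radionuclide imaging")   # taken from the text
OTHER = ("Other attribute", "Other value")    # hypothetical placeholder


def entailed_by(group, inherited_groups):
    """True if some inherited rolegroup contains every pair of `group`."""
    return any(group <= inherited for inherited in inherited_groups)


# Rolegroup stated in the subconcept; it stays the same in both releases.
sub_group = frozenset({METHOD, OTHER})

# Earlier release: the superconcept holds the same rolegroup,
# so the copy in the subconcept is redundant.
super_before = [frozenset({METHOD, OTHER})]
redundant_before = entailed_by(sub_group, super_before)

# Later release: the method is removed from the superconcept only.
super_after = [frozenset({OTHER})]
redundant_after = entailed_by(sub_group, super_after)

print(redundant_before, redundant_after)
```

Under these assumptions the copied rolegroup is redundant in the earlier release and no longer redundant in the later one. It has silently become a stated assertion that the subconcept uses the imaging method, even though the correction was meant to remove that method.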

Section snippets

SNOMED CT concept definitions and rolegroups

SNOMED CT is based on the lightweight Description Logic EL+ [4]. Its concepts are defined by conjunctions of other concepts as well as role-value pairs that are represented as exists restrictions (∃). These exists restrictions can be either ungrouped or grouped in so-called rolegroups “to add clarity to concept definitions. A rolegroup combines an attribute-value pair with one or more other attribute-value pairs. Rolegroups originated to add clarity to Clinical finding concepts which require

Materials and methods

We employed all 11 versions of SNOMED CT that were convertible to OWL, i.e. the January 2009 version to the January 2014 version. We converted these versions with the Perl script that is provided with each release of SNOMED CT. This script makes use of two tables: concepts and stated relationships. The latter faithfully represents the information as it was specified by modellers, and has been released since 2009.

We relied on the high-performance reasoner ELK [7] to classify SNOMED CT, and to

Redundant elements in concept definitions in the July 2012 version

Applying the four rules of redundancy detection to the July 2012 version of SNOMED CT, 35,010 (12%) of the 296,433 concepts were identified to contain redundant elements in their definitions. Table 1 gives an overview of the results, only regarding the first explanation for these redundancies (the rules were applied in the same order as they are presented in this paper). Of these concepts, 11,858 are fully defined and 23,152 are non-trivially primitive.
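As a quick consistency check on the figures above, the short sketch below recomputes the reported proportion and confirms that the two subgroups add up to the total of redundantly defined concepts. The variable names are ours; the numbers are those reported in the text.

```python
# Figures reported for the July 2012 version of SNOMED CT.
total_concepts = 296_433
redundant = 35_010
fully_defined = 11_858
primitive = 23_152

# Proportion of concepts with redundant elements (reported as 12%).
share = redundant / total_concepts

# The two subgroups should partition the redundant concepts.
parts_sum = fully_defined + primitive

print(f"{share:.1%}")          # prints 11.8%
print(parts_sum == redundant)  # prints True
```

The unrounded proportion is about 11.8%, which matches the reported 12% after rounding to whole percentages.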

Example 4.1

Parenteral form thymoxamine.

Example 4.2

Closed skull

Related and future work

Campbell et al. [13] proposed a semantics-based conflict identification method for the distributed development of logic-based terminologies. Conflicts that can be detected are multiply-defined term conflicts and non-unique definition conflicts. Multiply-defined terms refer to the same term, but do not have the same definitions. They can be sub-classified into semantically-conflicting definitions and semantically equivalent definitions. When the definitions are semantically equivalent, it is

Discussion and conclusions

Our results show that 35,010 (12%) of all 296,433 SNOMED CT concepts of the July 2012 version were defined redundantly. These redundancies unnecessarily impede the work of concept modellers, and ultimately the quality of a terminology. Redundant elements in concept definitions are introduced and solved in comparable amounts in all versions of SNOMED CT between January 2009 and January 2014. On average, about three quarters of the introduced redundancies are caused by changes in definitions of

1

Ronald Cornet is a member of the Technical Committee of the International Health Terminology Standards Development Organisation (IHTSDO), which publishes SNOMED CT. His position at the IHTSDO, however, had no bearing on the research study or results.
