The MedRed Ontology for Representing Clinical Data Acquisition Metadata

Calbimonte, Jean-Paul; Dubosson, Fabien; Hilfiker, Roger; Cotting, Alexandre; Schumacher, Michael

doi:10.1007/978-3-319-68204-4_4

Conference paper
First Online: 04 October 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10588)

Abstract

Electronic Data Capture (EDC) software solutions are increasingly being adopted by biomedical, pharmaceutical, and health-care research teams for conducting clinical trials and studies. In this paper, we present the MedRed Ontology, whose goal is to represent the metadata of these studies using well-established standards, reusing well-known vocabularies and W3C recommendations to describe essential aspects such as validation rules, composability, and provenance. The paper describes the design principles behind the ontology and how it relates to existing models and formats used in the industry. Furthermore, we have validated the ontology with existing clinical studies in the context of the MedRed project, as well as with a collection of metadata from well-known studies. Finally, we have made the ontology publicly available, following best practices and vocabulary sharing guidelines.

1 Introduction

Clinical research activities require the involvement of heterogeneous individuals from a given population in order to assess and validate biomedical hypotheses concerning behavior, treatments, interventions, and other aspects. Clinical trials and similar studies can be complex and span long periods of time, and the data acquisition process requires careful management and accuracy. Although manually filled forms were the norm for acquiring data in this context in the past, Electronic Data Capture (EDC) solutions have been shown to improve the efficiency of the process while maintaining quality and accuracy standards [3, 19]. In particular, EDC helps reduce or eliminate data transcription and transmission times, provides data validation and input enforcement, and helps schedule site visits [5, 7]. Furthermore, EDC provides faster access to data in running studies, which can support live analytics over the acquired datasets. Due to these benefits, clinical research organizations, pharmaceutical companies, and university hospitals, among others, make use of EDC and related systems such as OpenClinica, REDCap, TrialDB, InForm, Medidata Rave, or Datatrak [16].

As an example, consider an osteoarthritis study performed on the local population by the Physiotherapy Lab at HES-SO Valais-Wallis. The implementation of this study may involve several instruments, such as questionnaires administered to a selected group of patients, each of which contains several sections, questions, and variables to be annotated and recorded. The study can be divided into different arms, where diverse methods are applied for comparison purposes; furthermore, it can be split into repeated events over time, using similar instruments to track the patients' evolution. Such a study could reuse well-known and validated instruments, such as the HOOS Hip survey [14], or extend them with additional instruments, sections, and variables.

Given the large number and complexity of clinical studies performed worldwide, it has become necessary to share not only their results, but also their structure and metadata. This would enable validating existing protocols, reusing and refining clinical research instruments, extending previous studies, and performing surveys and systematic analyses of clinical trials. However, achieving this first requires tackling the heterogeneity in how these studies are described and represented. The most widely used format for representing studies in EDC software, ODM (Operational Data Model) [9], lacks a semantically rich model able to address these challenges, and is therefore insufficient as a foundational model for achieving semantic interoperability of clinical studies and trials.

In this paper, we present the MedRed Ontology, a semantically rich model designed to represent the metadata of clinical studies, including the definition of their constituent instruments, the different steps of each instrument, their organization into arms and events, and the data variables captured with them. Thanks to its integration with existing vocabularies (PROV-O [11] and P-Plan [6]), the MedRed Ontology can also capture complex relationships among instruments and studies, including composition, derivation, authorship, and versioning. These features make it possible to track changes to a study over time, or to indicate that a study was designed based on an existing one. MedRed also represents validation conditions on clinical instruments, using the SHACL language [8] to express constraints. The MedRed Ontology has been validated using pilot studies led by the Institute of Health Sciences of HES-SO Valais-Wallis, in the context of the MedRed data lifecycle project^{Footnote 1}. It has also been applied to a heterogeneous collection of study metadata descriptions extracted from the REDCap [7] library of health studies and instruments. Finally, MedRed has been made publicly available in standard formats, at a permanent URL, following ontology publication guidelines.

2 Related Work

Ontologies for clinical studies have been developed in recent years, typically focusing on the description of different types of studies, including taxonomies and classifications [17]. The OBO Foundry [18] contains several biomedical ontologies, some of which relate to the description of studies. Examples include the Ontology for Biomedical Investigations, the Clinical Measurement Ontology, and the Informed Consent Ontology. However, these are specific to biomedical document descriptions, measurements, and consent information, respectively. The BioPortal repository also contains relevant ontologies, e.g. the Clinical Trials Ontology, which includes a large vocabulary of clinical trial types. Other ontologies in BioPortal (e.g. MeSH, SNOMED, HL7) include general references to clinical study concepts, but do not provide detailed descriptions of them.

Clinical data capture software is widely used today as a backbone technology for data acquisition in research studies. Professional tools include OpenClinica, REDCap, CancerGrid, InForm, Datatrak, and Medidata Rave [4, 7]. Significant efforts have been made to agree on standards for clinical studies, and the ODM (Operational Data Model) [9] proposed by CDISC^{Footnote 2} has been adopted by several regulatory bodies as well as by EDC software tools. Based on XML, ODM serves as a communication interface for clinical study data, but it lacks a semantically rich model able to capture the relationships among the components of a clinical study, or to link them with other standard vocabularies. Recent work [12] developed approaches for the semantic annotation of ODM XML export files, using extensions to the RDF Data Cube vocabulary. Other efforts [13] have sought the semantic integration of clinical data management systems by combining ODM with the HL7 FHIR standard. To date, the ODM specifications are regarded as the reference for data interchange among these systems, although they lack several features, as explained in Sect. 3. Although there have been some attempts to provide semantic annotations for ODM [3, 10], there is as yet no comprehensive ontology that incorporates the aspects covered in this work.

3 Design Principles

The MedRed Ontology design is founded on the representation of a generic clinical study, understood as a collection of data acquisition instruments. In the following we present the design principles behind the ontology, namely the structure of the core model, and the fundamental features of composition, derivation, provenance, and validation.

Core Model. According to the ODM model of CDISC [9], a Study has a metadata version element that contains the definitions of its sub-elements, i.e. Form, Item, and Item Group definitions. In a questionnaire-based instrument, these commonly materialize as instrument, question, and section definitions, respectively. Taking this model as a starting point, the MedRed Ontology first separates metadata versioning out of the core model, as this is a cross-cutting concern. A MedRed Study is composed of one or more Instruments, each of which has an ordered sequence of steps, modeled as Item elements. Different kinds of Items exist, such as Question, Information, or Operation items. Likewise, different subclasses of Instrument may exist, such as questionnaires or case forms. Items may be grouped into Sections, which provide a logical and nestable organization of the items of an instrument. Each Item identifies its previous item in the sequence, and items may be subject to conditional activation, allowing branching logic in a sequence of steps. For each Item, a corresponding Variable can be specified, which represents the data that will be captured (e.g. via a question or form entry). Variables are associated with data types, and constraints, e.g. allowed values or rules, can be defined on them. Moreover, a Study can be organized into different Arms, i.e. branches that focus on a particular characteristic for comparison or testing purposes (e.g. different arms for testing different drugs in parallel). MedRed also allows defining events, which help represent longitudinal studies where different instruments are used over longer periods of time (e.g. demographics at the beginning of the study, a first set of instruments after 3 months, another set 2 months later, etc.).
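
The core model can be pictured with a minimal Python sketch. It assumes plain dataclasses whose names mirror the concepts above (Study, Instrument, Item, Variable, arms, events); the field names and the `ordered_items` helper are hypothetical and are not part of the ontology. The point it illustrates is that an instrument's step order can be recovered purely from each Item's link to its previous item, regardless of how the items happen to be stored.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical Python mirror of the MedRed core concepts; names are illustrative.

@dataclass
class Variable:
    name: str
    datatype: str  # e.g. an XSD datatype name such as "xsd:integer"

@dataclass
class Item:
    id: str
    previous: Optional[str] = None        # id of the preceding item; None for the first
    variable: Optional[Variable] = None   # data captured by this step, if any
    condition: Optional[str] = None       # branching-logic expression, kept opaque here

@dataclass
class Instrument:
    id: str
    items: list[Item] = field(default_factory=list)

    def ordered_items(self) -> list[Item]:
        """Rebuild the step sequence by following the 'previous item' links."""
        by_previous = {item.previous: item for item in self.items}
        ordered, current = [], by_previous.get(None)
        while current is not None:
            ordered.append(current)
            current = by_previous.get(current.id)
        if len(ordered) != len(self.items):
            raise ValueError("items do not form a single linear sequence")
        return ordered

@dataclass
class Study:
    id: str
    instruments: list[Instrument] = field(default_factory=list)
    arms: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)

if __name__ == "__main__":
    first = Item("q1", variable=Variable("age", "xsd:integer"))
    second = Item("q2", previous="q1", variable=Variable("hip_pain", "xsd:integer"))
    baseline = Instrument("baseline", [second, first])  # stored out of order
    study = Study("hip-study", [baseline], arms=["A", "B"], events=["month_0"])
    print([item.id for item in study.instruments[0].ordered_items()])  # ['q1', 'q2']
```

Keeping the order implicit in the links (rather than in list positions) is what lets the same item be referenced from elsewhere without carrying a positional index with it.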

Composition. The ability to compose studies and instruments from other items and elements is crucial for the MedRed metadata model. For instance, it is possible to combine existing instruments from other studies into a new one. Similarly, it is possible to combine questions and items from several instruments to build a new sequence of input items for an instrument. This allows the reuse of existing metadata and of studies that have already been successfully implemented, avoiding the need to reinvent the wheel. The P-Plan ontology [6] is a generic model created to represent a sequence of scientific activities in a plan. By introducing the basic concepts of Plan and Step, it allows nesting and building different structures of planned items. For this reason, it was chosen as the basis for structuring items and instruments in MedRed, allowing very flexible composition designs.
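
To make the composition idea concrete, the following Python sketch models a P-Plan-like tree: a `Plan` holds `Step`s, and a nested `Plan` stands in for a `p-plan:MultiStep` decomposed into a sub-plan. The `compose` and `leaf_steps` helpers and all `ex:` identifiers are hypothetical. The key property shown is that a composed instrument references the existing parts rather than copying them.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union

@dataclass(frozen=True)
class Step:
    iri: str  # a single planned item, e.g. one question

@dataclass
class Plan:
    iri: str
    # Children are Steps or nested Plans (a nested Plan plays the MultiStep role).
    parts: list[Union[Step, "Plan"]] = field(default_factory=list)

def compose(iri: str, *parts: Union[Step, Plan]) -> Plan:
    """Build a new plan that reuses existing steps and plans by reference."""
    return Plan(iri, list(parts))

def leaf_steps(plan: Plan) -> Iterator[Step]:
    """Yield the concrete steps of a (possibly nested) plan, depth-first."""
    for part in plan.parts:
        if isinstance(part, Plan):
            yield from leaf_steps(part)
        else:
            yield part

if __name__ == "__main__":
    demographics = Plan("ex:demographics", [Step("ex:age"), Step("ex:sex")])
    hoos = Plan("ex:hoos", [Step("ex:hoosPain1"), Step("ex:hoosPain2")])
    combined = compose("ex:hipStudyForm", demographics, hoos, Step("ex:extraQuestion"))
    print([step.iri for step in leaf_steps(combined)])
```

Because the sub-plans are shared objects, a later change to the reused instrument is visible from every composition that includes it; copying instead would be the choice if compositions had to be frozen snapshots.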

Derivation. Reusing instruments and items from existing studies also implies that one instrument can be derived from others. An instrument can be amended or extended according to the needs of a different context (e.g. a new study on a different population), by adding new questions or modifying their validation rules, possible values, etc. Representing this information helps keep track of these relationships, as exemplified in Fig. 1.
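
A hedged Python sketch of derivation follows: a new instrument is produced from an existing one by adding questions and tightening a validation bound, while the link back to the source is kept in a `derived_from` field that plays the role of `prov:wasDerivedFrom`. The classes, the `derive` helper, and the single `max_value` bound are illustrative assumptions, not MedRed's actual modelling of validation rules.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Question:
    iri: str
    text: str
    max_value: Optional[float] = None  # one illustrative validation bound

@dataclass(frozen=True)
class Instrument:
    iri: str
    questions: tuple[Question, ...]
    derived_from: Optional[str] = None  # plays the role of prov:wasDerivedFrom

def derive(source: Instrument, new_iri: str,
           added: tuple[Question, ...] = (),
           new_bounds: Optional[dict[str, float]] = None) -> Instrument:
    """Create a derived instrument; the frozen source instrument is left untouched."""
    new_bounds = new_bounds or {}
    questions = tuple(
        replace(q, max_value=new_bounds[q.iri]) if q.iri in new_bounds else q
        for q in source.questions
    )
    return Instrument(new_iri, questions + tuple(added), derived_from=source.iri)

if __name__ == "__main__":
    original = Instrument("ex:hoos", (Question("ex:q1", "Pain level?", 10.0),))
    local = derive(original, "ex:hoosLocal",
                   added=(Question("ex:q2", "Do you use a walking aid?"),),
                   new_bounds={"ex:q1": 5.0})
    print(local.derived_from, [q.iri for q in local.questions])
```

Using frozen dataclasses means derivation can only create new objects, which matches the intent of keeping the original instrument intact and traceable.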

Provenance. As all studies, instruments, and items can be seen as traceable resources (or entities, in terms of the PROV model [11]), MedRed allows recording provenance information, including attribution, versioning, authorship, etc. The PROV-O ontology [11] was defined precisely for this purpose; we have therefore aligned the MedRed core concepts with this model, so that this type of information can be recorded accordingly. For instance, as shown in Fig. 2, this makes it possible to indicate specialization, revision, source, attribution, and other related information.
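
The relations named above map naturally onto PROV-O properties. The Python sketch below assumes the mapping specialization to `prov:specializationOf`, revision to `prov:wasRevisionOf`, source to `prov:hadPrimarySource`, attribution to `prov:wasAttributedTo`, and derivation to `prov:wasDerivedFrom` (choosing `hadPrimarySource` for "source" is our assumption), and serializes the statements as N-Triples. The `prov_ntriples` helper and the example IRIs are hypothetical.

```python
PROV_NS = "http://www.w3.org/ns/prov#"

# Assumed mapping from the relation kinds named in the text to PROV-O properties.
PROV_PROPERTIES = {
    "specialization": "specializationOf",
    "revision": "wasRevisionOf",
    "source": "hadPrimarySource",
    "attribution": "wasAttributedTo",
    "derivation": "wasDerivedFrom",
}

def prov_ntriples(subject: str, **relations: str) -> list[str]:
    """Serialize provenance links of `subject` as N-Triples lines.

    Relations are emitted in keyword order; unknown kinds raise KeyError.
    """
    lines = []
    for kind, target in relations.items():
        prop = PROV_NS + PROV_PROPERTIES[kind]
        lines.append(f"<{subject}> <{prop}> <{target}> .")
    return lines

if __name__ == "__main__":
    for line in prov_ntriples(
        "http://example.org/hoos-v2",
        revision="http://example.org/hoos-v1",
        attribution="http://example.org/people/alice",
    ):
        print(line)
```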

Validation. In the context of clinical data capture, it is essential to guarantee certain data quality standards, and validation is crucial for defining effective instruments. MedRed reuses existing constraint representation languages to incorporate notions of validation into the model. These validation rules should allow flexible definitions, from simple value ranges to complex pattern matching and combinations of complex rules (e.g. the answer to a cholesterol question should be a double value lower than 300 mg/dL). For this reason, we integrated shape properties from SHACL [8], the W3C language for expressing constraints.
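
The cholesterol rule above corresponds to SHACL's `sh:datatype` and value-range parameters (`sh:minInclusive`, `sh:maxExclusive`). As a rough illustration only (an actual deployment would rely on a SHACL processor), the Python sketch below mimics the semantics of these parameters for a single value. The class name, the use of Python's `float` in place of `xsd:double`, and the lower bound of 0 are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RangeConstraint:
    """Mimics a few SHACL property-shape parameters for one value."""
    datatype: type = float                  # stands in for sh:datatype xsd:double
    min_inclusive: Optional[float] = None   # like sh:minInclusive
    max_exclusive: Optional[float] = None   # like sh:maxExclusive

    def violations(self, value) -> list[str]:
        """Return human-readable violations; an empty list means the value conforms."""
        if not isinstance(value, self.datatype):
            return [f"expected {self.datatype.__name__}, got {type(value).__name__}"]
        errors = []
        if self.min_inclusive is not None and value < self.min_inclusive:
            errors.append(f"{value} < minInclusive {self.min_inclusive}")
        if self.max_exclusive is not None and value >= self.max_exclusive:
            errors.append(f"{value} >= maxExclusive {self.max_exclusive}")
        return errors

# "a double value lower than 300 mg/dL"; the lower bound of 0 is an assumption.
cholesterol = RangeConstraint(float, min_inclusive=0.0, max_exclusive=300.0)

if __name__ == "__main__":
    for answer in (180.5, 300.0, 250):
        print(answer, cholesterol.violations(answer))
```

Note that the integer `250` is rejected, just as `sh:datatype xsd:double` would reject a literal typed `xsd:integer`: SHACL's datatype check does not apply numeric promotion.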

4 Implementation

Following the design principles stated above, the MedRed Ontology was implemented in OWL using the Protégé development environment (19 classes, 12 object properties, and 5 datatype properties). As specified in Sect. 3, the core model includes the fundamental concepts behind a clinical study: the Study itself, the definition of the Instruments that compose it and of their inner sub-elements (Section, Item, Operation), as well as other elements such as a study Arm and StudyEvent. It was necessary to cover at least the concepts described in the ODM meta-model to guarantee minimal compliance with that standard. Furthermore, MedRed goes beyond ODM, as it extends the P-Plan ontology [6] to support nesting and composition of items in a given instrument (Step and MultiStep in P-Plan).

Given that P-Plan extends the PROV-O model, each instrument and item definition is itself a traceable entity, which can be annotated according to the PROV model, including versions, derived instruments, etc., which are common for studies that evolve over time and reuse previous instruments. MedRed also aligns with the DDI-RDF vocabulary [2] for describing scientific metadata, as it includes concepts such as Instrument and Questionnaire. Finally, for the validation of data acquisition items, MedRed reuses property paths from the SHACL vocabulary [8], which are specifically designed to represent such constraints. These dependencies are depicted in Fig. 3.
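
The argument that every instrument is a traceable entity relies on the transitivity of `rdfs:subClassOf`. The Python sketch below computes that closure over a small hand-written hierarchy. The edges `p-plan:Plan` subClassOf `prov:Plan` and `prov:Plan` subClassOf `prov:Entity` reflect P-Plan and PROV-O, whereas the two `medred:` edges are hypothetical alignments chosen for illustration; the authoritative axioms are those in the published ontology.

```python
from collections import deque

# rdfs:subClassOf edges (child -> direct parents). The medred: edges are hypothetical.
SUBCLASS_OF = {
    "medred:Questionnaire": ["medred:Instrument"],
    "medred:Instrument": ["p-plan:Plan"],
    "p-plan:Plan": ["prov:Plan"],
    "prov:Plan": ["prov:Entity"],
}

def superclasses(cls: str) -> set[str]:
    """All direct and indirect superclasses (breadth-first transitive closure)."""
    seen: set[str] = set()
    queue = deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

if __name__ == "__main__":
    print(sorted(superclasses("medred:Questionnaire")))
    print("prov:Entity" in superclasses("medred:Instrument"))
```

An OWL reasoner would infer the same membership; the sketch only makes explicit why reusing P-Plan brings PROV annotations "for free".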

As explained above, the central concepts in MedRed (see Fig. 4) revolve around those of Instrument and Item. Their subclasses allow further specialization of the type of study (e.g. based on questionnaires, entry forms, etc.), or other extensions for more specific uses. The unique identification of each of these items is a fundamental principle that allows referencing existing instruments and composing new ones from them, thereby meeting the design principles of Sect. 3. Moreover, the Section concept allows an unrestricted number of nesting levels for instrument items, enabling a modular organization of the clinical study.
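
Unrestricted nesting of Sections amounts to a recursive tree. The Python sketch below, which uses hypothetical `Section`/`Item` classes and `ex:` identifiers rather than MedRed's actual properties, walks such a tree and reports the depth of every section and item, whatever the number of levels.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union

@dataclass(frozen=True)
class Item:
    iri: str

@dataclass
class Section:
    iri: str
    # Children may be items or further sections, to any depth.
    children: list[Union[Item, "Section"]] = field(default_factory=list)

def walk(section: Section, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, iri) pairs for a section and everything nested in it."""
    yield depth, section.iri
    for child in section.children:
        if isinstance(child, Section):
            yield from walk(child, depth + 1)
        else:
            yield depth + 1, child.iri

if __name__ == "__main__":
    form = Section("ex:hipForm", [
        Item("ex:q1"),
        Section("ex:painSection", [
            Item("ex:q2"),
            Section("ex:nightPain", [Item("ex:q3")]),
        ]),
    ])
    for depth, iri in walk(form):
        print("  " * depth + iri)
```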

The salient points of the implementation can be explained through the following examples^{Footnote 3}. The example in Listing 1 shows a 3-month follow-up study definition, including six instruments: one for collecting demographics, another for baseline data, three monthly questionnaires, and a final completion instrument.
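
Since Listing 1 itself is not reproduced here, the following Python sketch only illustrates the shape of such a definition: it assigns the six instruments to a hypothetical set of study events (enrollment, three monthly visits, end of study). The event names, instrument names, and the assignment of instruments to events are all assumptions.

```python
def follow_up_schedule(months: int = 3) -> dict[str, list[str]]:
    """Map each hypothetical study event to the instruments administered at it."""
    schedule = {"enrollment": ["demographics", "baseline"]}
    for month in range(1, months + 1):
        schedule[f"month_{month}"] = [f"monthly_questionnaire_{month}"]
    schedule["end_of_study"] = ["completion"]
    return schedule

def instruments(schedule: dict[str, list[str]]) -> list[str]:
    """Flatten the schedule into the study's instrument list, in event order."""
    return [name for event in schedule.values() for name in event]

if __name__ == "__main__":
    schedule = follow_up_schedule()
    for event, names in schedule.items():
        print(event, names)
    print(len(instruments(schedule)), "instruments")
```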

Each of these instruments can also be fully described, e.g. in terms of its constituent Item elements, as in Listing 2. The instrument is organized in different sections and may include provenance information such as authorship, related publications, revisions, etc.

In fact, all components of the study (and instrument) can be annotated with provenance information to capture how and when they were defined. In the following examples, we omit provenance due to space constraints. In Listing 3, a specific item is described, in this case a question from the previous instrument. The question and its text, the associated variable, and the possible display choices are defined at this point.
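
As a stand-in for Listing 3, which is not reproduced here, the Python sketch below bundles a question's text, its associated variable, and its display choices as (stored code, label) pairs. The class, the `ex:` identifier, the question wording, and the five-level severity labels are illustrative assumptions rather than MedRed's actual terms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChoiceQuestion:
    iri: str
    text: str
    variable: str                          # name of the captured variable
    choices: tuple[tuple[int, str], ...]   # (stored code, displayed label) pairs

    def label_for(self, code: int) -> str:
        """Return the display label of a stored answer code."""
        for value, label in self.choices:
            if value == code:
                return label
        raise ValueError(f"{code} is not an allowed choice for {self.variable}")

    def allowed_codes(self) -> set[int]:
        return {value for value, _ in self.choices}

# Illustrative five-level severity scale.
SEVERITY = ((0, "None"), (1, "Mild"), (2, "Moderate"), (3, "Severe"), (4, "Extreme"))

walking_pain = ChoiceQuestion(
    iri="ex:painWalking",
    text="What amount of hip pain have you experienced walking on a flat surface?",
    variable="pain_walking_flat",
    choices=SEVERITY,
)

if __name__ == "__main__":
    print(walking_pain.text)
    print(walking_pain.label_for(2))
```

Storing codes separately from labels keeps captured data stable even if the displayed wording is later translated or revised.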

Furthermore, the variable associated with a question (or any Item) can be specified, along with validation rules expressed in SHACL, as in Listing 4. A Cholesterol value is specified, and its minimum and maximum values are indicated using a SHACL shape.
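
Listing 4 is not reproduced here; as a rough stand-in, the Python sketch below renders a SHACL node shape in Turtle that bounds a numeric property with `sh:minInclusive` and `sh:maxExclusive`. The SHACL terms are standard, but the `ex:` class and property names, the `range_shape` helper, and the example bounds (0 to 300, matching the cholesterol rule of Sect. 3) are hypothetical.

```python
def range_shape(shape: str, target_class: str, path: str,
                min_inclusive: float, max_exclusive: float) -> str:
    """Render a SHACL node shape (Turtle) bounding a double-valued property."""
    if min_inclusive >= max_exclusive:
        raise ValueError("min_inclusive must be lower than max_exclusive")
    return "\n".join([
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",
        "",
        f"{shape} a sh:NodeShape ;",
        f"    sh:targetClass {target_class} ;",
        "    sh:property [",
        f"        sh:path {path} ;",
        "        sh:datatype xsd:double ;",
        f'        sh:minInclusive "{min_inclusive}"^^xsd:double ;',
        f'        sh:maxExclusive "{max_exclusive}"^^xsd:double',
        "    ] .",
    ])

if __name__ == "__main__":
    print(range_shape("ex:CholesterolShape", "ex:CholesterolMeasurement",
                      "ex:cholesterolValue", 0.0, 300.0))
```

The output is plain Turtle, so it could be loaded into any SHACL processor alongside the data to be validated.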

5 Exploitation and Discussion

The MedRed Ontology is currently used to represent the metadata of real instruments used in several pilot projects carried out at HES-SO Valais-Wallis, led by the Institute of Health Sciences within the scope of the MedRed project. The MedRed project aims to provide an institutional data acquisition platform, mainly targeting clinical data capture. All studies' metadata and their corresponding instruments will be represented in RDF using the ontology, including the full description of their elements, branching logic, validation, variables, data types, etc. Furthermore, to show the applicability of the ontology to a wider range of clinical data instruments, we have taken a sample of more than thirty instruments from the shared library of REDCap^{Footnote 4}, collected by the REDCap project for research purposes from studies all over the world. The full list of instruments used for these experiments can be found on the project source page^{Footnote 5}. A summary, including three of the completed MedRed pilot projects, is given in the table of Fig. 5. It showcases the heterogeneity of the studies and the features covered by the MedRed Ontology.

Concerning the availability of the ontology, it has been published at a permanent URI, http://w3id.org/medred/medred, under a CC-BY 4.0 license. The ontology is also referenced through Zenodo, with an assigned DOI^{Footnote 6}. The documentation for the ontology was prepared using the OnToology [1] framework, and the ontology was also checked using the OOPS! pitfall scanner service [15]. The latter reported only minor issues, mainly in the imported ontologies (the OOPS! report is available in the GitHub repository). The ontology has been made available and discoverable through the Linked Open Vocabularies (LOV) repository, widely used as a reference site for finding vocabularies. Regarding sustainability, the ontology is maintained in an initial phase by the MedRed project. Afterwards, the MedRed platform is expected to operate under a business plan similar to that of a Clinical Trial unit, which would in turn guarantee support for the ontology and other related information resources.

6 Conclusion

We presented the MedRed Ontology for capturing the metadata of clinical studies, following a set of design principles and extending well-known recommendations. We made it publicly available following best practices, and we have shown that it fits a heterogeneous set of existing instruments well. The ontology will be maintained by the MedRed data acquisition project and, in the long term, by its growing community.

Notes

1. MedRed Project: http://w3id.org/medred/project.
2. CDISC (Clinical Data Interchange Standards Consortium): http://cdisc.org.
3. Prefixes are used as defined in http://prefix.cc; medred is used for the MedRed Ontology.
4. https://projectredcap.org/resources/library/.
5. Instruments used to validate the MedRed ontology: https://github.com/jpcik/medred.
6. MedRed Zenodo DOI: https://doi.org/10.5281/zenodo.819875.

References

1. Alobaid, A., Garijo, D., Poveda-Villalón, M., Pérez, I.S., Corcho, O.: OnToology, a tool for collaborative development of ontologies. In: ICBO (2015)
2. Bosch, T., Cyganiak, R., Gregory, A., Wackerow, J.: DDI-RDF discovery vocabulary: a metadata vocabulary for documenting research and survey data. In: LDOW (2013)
3. Bruland, P., Breil, B., et al.: Interoperability in clinical research: from metadata registries to semantically annotated CDISC ODM. Stud. Health Technol. Inf. 180, 564–568 (2012)
4. Davies, J., Gibbons, J., Harris, S., Crichton, C.: The CancerGrid experience: metadata-based model-driven engineering for clinical trials. Sci. Comput. Program. 89, 126–143 (2014)
5. El Emam, K., Jonker, E., Sampson, M., Krleža-Jerić, K., Neisa, A.: The use of electronic data capture tools in clinical trials. J. Med. Internet Res. 11(1), e8 (2009)
6. Garijo, D., Gil, Y.: Augmenting PROV with plans in P-Plan: scientific processes as linked data. In: Linked Science (LISC) (2012)
7. Harris, P.A., Taylor, R., Thielke, R., Payne, J., et al.: Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J. Biomed. Inform. 42(2), 377–381 (2009)
8. Knublauch, H., Kontokostas, D.: Shapes Constraint Language (SHACL). W3C Candidate Recommendation (2017)
9. Kuchinke, W., Aerts, J., Semler, S., Ohmann, C., et al.: CDISC standard-based electronic archiving of clinical trials. Methods Inf. Med. 48(5), 408–413 (2009)
10. Laleci, G.B., Yuksel, M., Dogac, A.: Providing semantic interoperability between clinical care and clinical research domains. J. Biomed. Health Inform. 17(2), 356–369 (2013)
11. Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., Zhao, J.: PROV-O: the PROV ontology. W3C Recommendation (2013). https://www.w3.org/TR/prov-o/
12. Leroux, H., Lefort, L.: Semantic enrichment of longitudinal clinical study data using the CDISC standards and the semantic statistics vocabularies. J. Biomed. Semant. 6(1), 16 (2015)
13. Leroux, H., Metke, A., Lawley, M.J.: ODM on FHIR: towards achieving semantic interoperability of clinical study data. In: SWAT4LS, pp. 59–68 (2015)
14. Nilsdotter, A., et al.: Hip disability and osteoarthritis outcome score (HOOS): validity and responsiveness in total hip replacement. BMC Musculoskelet. Disord. 4(1), 10 (2003)
15. Poveda-Villalón, M., Gómez-Pérez, A., Suárez-Figueroa, M.C.: OOPS! (OntOlogy Pitfall Scanner!): an on-line tool for ontology evaluation. IJSWIS 10(2), 7–34 (2014)
16. Shah, J., et al.: Electronic data capture for registries and clinical trials in orthopaedic surgery: open source versus commercial systems. Clin. Orthop. Relat. Res.® 468(10), 2664–2671 (2010)
17. Sim, I., et al.: The ontology of clinical research (OCRe): an informatics foundation for the science of clinical research. J. Biomed. Inform. 52, 78–91 (2014)
18. Smith, B., et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25(11), 1251 (2007)
19. Souza, T., Kush, R., Evans, J.P.: Global clinical data interchange standards are here! Drug Discovery Today 12(3), 174–181 (2007)

Acknowledgements

MedRed is supported by the Swissuniversities CUS-P2 program.

Author information

Authors and Affiliations

Institute of Information Systems, University of Applied Sciences and Arts Western Switzerland, HES-SO Valais-Wallis, Sierre, Switzerland
Jean-Paul Calbimonte, Fabien Dubosson, Alexandre Cotting & Michael Schumacher
Institute of Health Sciences, University of Applied Sciences and Arts Western Switzerland, HES-SO Valais-Wallis, Leukerbad, Switzerland
Roger Hilfiker

Corresponding author

Correspondence to Jean-Paul Calbimonte.

Editor information

Editors and Affiliations

University of Bari, Bari, Italy
Claudia d'Amato
KMi, The Open University, Milton Keynes, United Kingdom
Miriam Fernandez
University of Liverpool, Liverpool, United Kingdom
Valentina Tamma
Accenture Technology Labs, Dublin, Ireland
Freddy Lecue
University of Fribourg, Fribourg, Switzerland
Philippe Cudré-Mauroux
Capsenta, Inc., Austin, Texas, USA
Juan Sequeda
Universität Bonn, Bonn, Germany
Christoph Lange
Lehigh University, Bethlehem, Pennsylvania, USA
Jeff Heflin

About this paper

Cite this paper

Calbimonte, JP., Dubosson, F., Hilfiker, R., Cotting, A., Schumacher, M. (2017). The MedRed Ontology for Representing Clinical Data Acquisition Metadata. In: d'Amato, C., et al. The Semantic Web – ISWC 2017. ISWC 2017. Lecture Notes in Computer Science(), vol 10588. Springer, Cham. https://doi.org/10.1007/978-3-319-68204-4_4

DOI: https://doi.org/10.1007/978-3-319-68204-4_4
Published: 04 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68203-7
Online ISBN: 978-3-319-68204-4
eBook Packages: Computer Science (R0)
