The Georges Pompidou University Hospital Clinical Data Warehouse: A 8-years follow-up experience

https://doi.org/10.1016/j.ijmedinf.2017.02.006

Highlights

  • Georges Pompidou University Hospital CDW is a key facilitator for research.

  • We describe 74 CDW projects evaluated by the ethics committee between 2011 and 2015.

  • Important methodological support from the medical informatics department was needed.

Abstract

Background

When developed jointly with clinical information systems, clinical data warehouses (CDWs) facilitate the reuse of healthcare data and leverage clinical research.

Objective

To describe both data access and use for clinical research, epidemiology and health service research of the “Hôpital Européen Georges Pompidou” (HEGP) CDW.

Methods

The CDW has been developed since 2008 using an i2b2 platform. It was made available to health professionals and researchers in October 2010. Procedures to access data have been implemented and different access levels have been distinguished according to the nature of queries.

Results

As of July 2016, the CDW contained the consolidated data of over 860,000 patients followed since the opening of the HEGP hospital in July 2000. These data correspond to more than 122 million clinical item values, 124 million biological item values, and 3.7 million free text reports. The ethics committee of the hospital evaluates all CDW projects that generate secondary data marts. Characteristics of the 74 research projects validated between January 2011 and December 2015 are described.

Conclusion

The HEGP CDW is a key facilitator for clinical research studies. However, it required substantial methodological and organizational support from a biomedical informatics department.

Introduction

Reuse of health data is a major issue for better patient care management and improved clinical and epidemiological research [1], [2]. Within hospital environments, data reuse can be facilitated by the deployment of clinical data warehouses (CDWs), which need to be strongly coupled with running clinical information systems (CISs) [3]. The potential benefits of such a combined approach can be analyzed both from the global point of view of an institution and from that of the decision-making process at the single-patient level.

From a hospital management perspective, CDWs provide information on activity trends and case-mix evolution. Adjusting the care offering to constantly evolving care demand is a major preoccupation of health managers. This includes testing, via computer simulation, various evolution strategies and their possible impact on the quality and continuity of care as well as on financial outcomes (e.g., primary vs. secondary or tertiary care, inpatient vs. outpatient vs. home care, traditional vs. one-day surgery, invasive vs. noninvasive diagnostic and therapeutic procedures). In hospitals that rely partially or completely on financing based on diagnosis-related groups (DRGs), analysis of the statistical links between coded diagnoses and procedures can help in searching for missing codes and/or maximizing DRG-related income [4]. Chaining of inpatient and outpatient data helps in determining patient profile categories, analyzing clinical pathways, and fostering the continuity of care [5].

From the patient point of view, data contained in CDWs can facilitate decision-making in the context of more personalized or precision medicine [6]. One of the earliest described methods that can be applied to CDWs consists of searching for similar patients within CISs [7], [8]. This means looking for patients who share the same clinical or para-clinical features and analyzing their characteristics, the medical decisions made, and the results of these decisions to infer the most relevant clinical strategies for the patient concerned [9]. Practicing physicians rely on the collective memory of CISs and CDWs in the same way that they can rely on the experience of expert clinicians [10]. Results are all the more convincing when clinical strategies have remained stable during the query period. A complementary approach consists of evaluating decision-making tools via computer simulation (in silico evaluation), such as the adaptation of drug dosage according to the state of renal function [11] or the screening of patients with potential delays in cancer diagnosis [12]. Rules of good practice derived from the literature or expert knowledge are programmed and tested on relevant patient data within the CDW.
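The in silico evaluation described above can be illustrated with a minimal sketch. It assumes a hypothetical dosing rule (alert when estimated creatinine clearance is below 30 mL/min and the daily dose exceeds 500 mg) and a handful of synthetic prescription records; neither the thresholds nor the record layout come from the HEGP CDW. Only the Cockcroft-Gault estimate of creatinine clearance is a standard formula.

```python
from dataclasses import dataclass


@dataclass
class Prescription:
    """One synthetic prescription record (all fields are illustrative)."""
    patient_id: int
    age_years: int
    weight_kg: float
    is_female: bool
    serum_creatinine_mg_dl: float
    daily_dose_mg: float


def cockcroft_gault(p: Prescription) -> float:
    """Estimated creatinine clearance in mL/min (Cockcroft-Gault formula)."""
    clearance = ((140 - p.age_years) * p.weight_kg) / (72 * p.serum_creatinine_mg_dl)
    return clearance * 0.85 if p.is_female else clearance


# Hypothetical rule thresholds, invented for this sketch only.
CLEARANCE_THRESHOLD_ML_MIN = 30.0
MAX_DAILY_DOSE_IF_IMPAIRED_MG = 500.0


def rule_fires(p: Prescription) -> bool:
    """True when the hypothetical dose-adaptation alert would be raised."""
    return (
        cockcroft_gault(p) < CLEARANCE_THRESHOLD_ML_MIN
        and p.daily_dose_mg > MAX_DAILY_DOSE_IF_IMPAIRED_MG
    )


def evaluate_rule(prescriptions):
    """Replay the rule over historical records; return the flagged patient ids."""
    return [p.patient_id for p in prescriptions if rule_fires(p)]


# Synthetic data standing in for prescriptions extracted from a CDW.
SYNTHETIC = [
    Prescription(1, 80, 60.0, True, 2.0, 1000.0),   # impaired, high dose
    Prescription(2, 40, 80.0, False, 1.0, 1000.0),  # normal renal function
    Prescription(3, 85, 55.0, False, 2.5, 250.0),   # impaired, dose already low
]

flagged = evaluate_rule(SYNTHETIC)
```

Replaying a rule over retrospective data in this way gives an estimate of how often it would have fired before it is deployed in the production CIS.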

In a research context, CDWs can be used to generate and test hypotheses. For epidemiological studies, CDWs allow the construction of patient cohorts that can serve for retrospective studies (e.g., population follow-up, case-control studies) to compute disease risk factors [13] or as the starting point for prospective studies obtained by increasing follow-up time and adding new variables and/or new patients [14]. In all these situations, researchers benefit from selection tools to define patient inclusion and exclusion criteria, items to be followed up, end-points to be considered, and various graphical and data analytics views [15]. In a vigilance study context, end-points can be any biological or clinical changes, occurrence of side-effects, or complications of diagnostic or therapeutic procedures [16], [17]. In a clinical research context, a CDW can be used at various stages of a clinical trial: in the feasibility stage, to evaluate the hospital’s capacity for recruitment according to its case mix; in the inclusion stage, to select patients; during the trial, to evaluate how representative the selected patients are of the larger population followed at the hospital; to extract, for analysis, patient data produced during a given period [18]; and finally, as a population follow-up and vigilance tool [19].
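Cohort selection by inclusion and exclusion criteria, as used in the feasibility stage, can be sketched as follows. The observation tuples, concept codes, and function names are assumptions made for illustration; they are not real terminology codes and do not reflect the HEGP query tools.

```python
from datetime import date

# Synthetic observations: (patient_id, concept_code, observation_date).
# Concept codes are made-up labels, not real terminology codes.
OBSERVATIONS = [
    (1, "DX:HYPERTENSION", date(2012, 3, 1)),
    (1, "RX:DRUG_A", date(2012, 4, 1)),
    (2, "DX:HYPERTENSION", date(2013, 6, 15)),
    (2, "DX:RENAL_FAILURE", date(2013, 7, 1)),
    (3, "DX:HYPERTENSION", date(2009, 1, 10)),
    (4, "RX:DRUG_A", date(2014, 2, 2)),
]


def patients_with(concept, start, end):
    """Ids of patients with at least one matching observation in [start, end]."""
    return {
        pid for pid, code, observed in OBSERVATIONS
        if code == concept and start <= observed <= end
    }


def select_cohort(include, exclude, start, end):
    """Patients having every inclusion concept and no exclusion concept."""
    cohort = None
    for concept in include:
        matches = patients_with(concept, start, end)
        cohort = matches if cohort is None else cohort & matches
    cohort = cohort or set()
    for concept in exclude:
        cohort = cohort - patients_with(concept, start, end)
    return cohort


# Feasibility question: how many hypertensive patients without renal
# failure were seen between 2011 and 2015?
cohort = select_cohort(
    include=["DX:HYPERTENSION"],
    exclude=["DX:RENAL_FAILURE"],
    start=date(2011, 1, 1),
    end=date(2015, 12, 31),
)
feasibility_count = len(cohort)
```

Only the resulting count is needed to judge recruitment capacity, which is why feasibility queries can usually run at a lower access level than queries returning patient-level data.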

Several open-source platforms are now available and are being adopted by a growing number of institutions. An early example is the Informatics for Integrating Biology at the Bedside (i2b2) environment, developed by the Harvard group in Boston within an NIH-funded national center for biomedical computing. i2b2 is now used in more than 130 hospital-care institutions around the world [20]. It relies on a star-schema model built around a central relational table of patient observations. A web-based interface facilitates the query process for health professionals. The platform allows the integration of both clinical and high-throughput data such as genomic data and offers a wide variety of tools for text and genomic data processing. On top of i2b2, SHRINE (Shared Health Research Informatics NEtwork) aims at linking i2b2 instances for the sharing of obfuscated, aggregated counts of patients meeting selected inclusion and exclusion criteria, i.e. deidentified data [21]. The Observational Health Data Sciences and Informatics (OHDSI) program, a multi-stakeholder initiative, developed the Observational Medical Outcomes Partnership (OMOP) platform, which is more oriented towards the reuse of heterogeneous sources of health data (administrative claims, electronic medical records…) [22]. It relies on the adoption of a Common Data Model (CDM) known as the OMOP Common Data Model. This program provides resources to convert a wide variety of datasets into the Common Data Model, as well as tools to analyse data in CDM format. A recent publication using data from the OHDSI network includes 11 data sources on 250 million patients observed for multiple years [23]. Links between the i2b2 and OMOP platforms are being developed [24].
The SHARPn platform [25] was deployed to receive source EHR data in several formats, generate structured data from EHR narrative text, and normalize the EHR data using common detailed clinical models and Consolidated Health Informatics standard terminologies from thousands of patient electronic records sourced from two large healthcare organizations: Mayo Clinic and Intermountain Healthcare.
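The star-schema organization mentioned above for i2b2 can be sketched with an in-memory SQLite database. The table and column names are modelled on i2b2 naming (a central observation fact table joined to patient and concept dimensions), but the schema is drastically reduced and the rows are synthetic, so this should be read as an assumption-laden illustration rather than the actual i2b2 data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe patients and concepts.
CREATE TABLE patient_dimension (patient_num INTEGER PRIMARY KEY, sex_cd TEXT);
CREATE TABLE concept_dimension (concept_cd TEXT PRIMARY KEY, name_char TEXT);

-- Central fact table: one row per observation about a patient.
CREATE TABLE observation_fact (
    patient_num INTEGER REFERENCES patient_dimension(patient_num),
    concept_cd  TEXT REFERENCES concept_dimension(concept_cd),
    start_date  TEXT,
    nval_num    REAL
);

-- Synthetic rows; concept codes are invented for the sketch.
INSERT INTO patient_dimension VALUES (1, 'F'), (2, 'M'), (3, 'F');
INSERT INTO concept_dimension VALUES
    ('LAB:CREAT', 'Serum creatinine'),
    ('DX:HTN', 'Hypertension');
INSERT INTO observation_fact VALUES
    (1, 'DX:HTN', '2012-03-01', NULL),
    (1, 'LAB:CREAT', '2012-03-02', 1.1),
    (2, 'DX:HTN', '2013-06-15', NULL),
    (3, 'LAB:CREAT', '2014-01-20', 0.9);
""")


def count_patients(concept_name):
    """Aggregated count of distinct patients with a given concept."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT f.patient_num)
        FROM observation_fact AS f
        JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
        WHERE c.name_char = ?
        """,
        (concept_name,),
    ).fetchone()
    return row[0]


def count_patients_by_sex(concept_name):
    """Same count, broken down by a patient-dimension attribute."""
    rows = conn.execute(
        """
        SELECT p.sex_cd, COUNT(DISTINCT f.patient_num)
        FROM observation_fact AS f
        JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
        JOIN patient_dimension AS p ON p.patient_num = f.patient_num
        WHERE c.name_char = ?
        GROUP BY p.sex_cd
        ORDER BY p.sex_cd
        """,
        (concept_name,),
    ).fetchall()
    return dict(rows)


hypertension_count = count_patients("Hypertension")
hypertension_by_sex = count_patients_by_sex("Hypertension")
```

Because every observation lives in the same fact table, adding a new data type mostly means adding concepts rather than new tables, which is one reason the design suits heterogeneous clinical data.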

Various experiences with CDWs have been previously published, in particular in university teaching hospitals [26], [27], [28], [29], [30], [31], [32]. At Vanderbilt University, the CDW that was developed in-house is composed of two environments, one related to patient identification information, and the other to clinical information, including omics data [26]. Query access tools are made available to professional end users. Institutional/ethics review board (IRB) approval is necessary for all queries that require access to patient identification data. A team made up of experts in biomedical informatics and statistics provides methodological support for clinical researchers. At Washington University, the CDW relies on an i2b2 platform. Queries from more than 100 end users are processed each month [27]. The onco-i2b2 [28] implemented by the University of Pavia and the IRCCS Fondazione Maugeri hospital manages data of more than 6500 patients with a breast cancer diagnosis collected between 2001 and 2011 (over 390 of them have at least one biological sample in the cancer biobank), more than 47,000 visits and 96,000 observations over 960 medical concepts. The CARDIO-i2b2 project is populated with data from patients with arrhythmogenic diseases [29]. Krasowski et al. [30] present examples of several successful searches using their in-house clinical data warehouse, mostly queries from microbiology and clinical chemistry/toxicology, with inclusion criteria covering over 5 years of clinical data and heterogeneous sources. The Göttingen University i2b2 infrastructure includes a set of four research usage scenarios [31]. The CARPEM infrastructure [32] integrated heterogeneous data such as clinical data from the clinical care systems, from clinical research groups and from associated labs, ‘Omics’ data from associated molecular labs, and additional sources from biobanking using a set of open-source resources including i2b2 and tranSMART.

The present article describes the current content of the CDW at the “Hôpital Européen Georges Pompidou” (HEGP), the data access process designed to protect patient privacy, and the practical use of the CDW during the period 2011–2015.

The HEGP CDW platform

The HEGP is an 800-bed acute care university hospital located in southwest Paris. The hospital is organized around three major cooperating healthcare centers: cardiovascular, cancer, and internal medicine, including an emergency department and trauma center.

The HIMSS/EMRAM level 6 certified CIS includes a production Oracle® database for the EHR with its replicated mirror database. The HEGP CDW, in operation since 2009, is fed from the EHR replicated database to avoid overload of the production

CDW content

Clinical data warehouses are expected to contain almost all patient data produced within a CIS, whether structured (e.g., drug prescriptions and associated effects) or unstructured (e.g., inpatient or outpatient summary reports, radiological or pathological reports). The HEGP CDW contains all clinical records since the hospital opened in July 2000 (Table 1). The HEGP CIS patient identification database was initially built up from the identification databases of the three hospitals that were

Discussion and conclusion

Deployment of CDWs strongly coupled with running CISs has now become a major goal for many hospitals that integrate data analytics, translational research support, and IT strategic planning into their organizations. This is, however, a long-term process (e.g., two to five years) that needs to pass through several rounds of conception, deployment and validation [3]. These phases concern the selection of the most appropriate development platform, a clear integration strategy (e.g., at a technical

Conflict of interest

The authors declare that they have no competing interest.

Authors’ contribution

  • PD and EZ initiated the CDW project in 2008.

  • ASJ and PD conceived and designed the study.

  • ASJ performed the data collection and analysis.

  • ASJ performed the CDW projects collection and analysis.

  • ASJ and PD wrote the first full draft.

  • Based on comments from AB, MFM and PA, ASJ and PD critically revised and edited the manuscript.

  • All authors read and approved the final version of the paper.

Acknowledgements

We are indebted to all CDW users, especially the early adopters from the biomedical informatics department, namely Jean-Baptiste Escudie, Yannick Girardeau and Bastien Rance.

References (40)

  • C. Cherry et al. À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge. J. Am. Med. Inform. Assoc. (2013)
  • J.D. Tenenbaum et al. An informatics research agenda to support precision medicine: seven key areas. J. Am. Med. Inform. Assoc. (2016)
  • C. Safran et al. ClinQuery: a system for online searching of data in a teaching hospital. Ann. Intern. Med. (1989)
  • J. Frankovich et al. Evidence-based medicine in the EMR era. N. Engl. J. Med. (2011)
  • B. Shneiderman et al. Improving healthcare with interactive visualization. Computer (2013)
  • A. Boussadi et al. A clinical data warehouse-based process for refining medication orders alerts. J. Am. Med. Inform. Assoc. (2012)
  • D.R. Murphy et al. Electronic health record-based triggers to detect potential delays in cancer diagnosis. BMJ Qual. Saf. (2014)
  • T.S. Cole et al. Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research. Pediatr. Rheumatol. (2013)
  • J.F. Hurdle et al. Identifying clinical/translational research cohorts: ascertainment via querying an integrated multi-source database. J. Am. Med. Inform. Assoc. (2013)
  • A. Rind. Interactive information visualization to explore and query electronic health records. Found. Trends® Hum.–Comput. Interact. (2013)