The Georges Pompidou University Hospital Clinical Data Warehouse: An 8-year follow-up experience
Introduction
Reuse of health data is a major issue for better patient care management and improved clinical and epidemiological research [1], [2]. Within hospital environments, data reuse can be facilitated by the deployment of clinical data warehouses (CDWs), which need to be strongly coupled with running clinical information systems (CISs) [3]. The potential benefits of such a combined approach can be analyzed both from the global point of view of an institution and from the decision-making process at the single-patient level.
From a hospital management perspective, CDWs provide information on activity trends and case-mix evolution. Adjusting the care offering to constantly evolving care demands is a major preoccupation of health managers. It includes testing, via computer simulation, various evolution strategies and their possible impact on the quality and continuity of care as well as on financial outcomes (e.g., primary vs. secondary or tertiary care, inpatient vs. outpatient vs. home care, traditional vs. one-day surgery, invasive vs. noninvasive diagnostic and therapeutic procedures). In hospitals that rely partially or completely on financing based on diagnosis-related groups, analysis of the statistical links between coded diagnoses and procedures can help in searching for missing codes and/or maximizing the income related to diagnosis-related groups [4]. Chaining of inpatient and outpatient data helps in determining patient profile categories, analyzing clinical pathways, and fostering the continuity of care [5].
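The search for missing codes via diagnosis-procedure links can be sketched as follows. The stays, codes, and threshold below are hypothetical, and real coding-optimization tools are considerably more elaborate; the idea is simply to flag stays that contain a procedure usually accompanied by a given diagnosis but that lack the diagnosis code itself.

```python
# Hypothetical coded stays: each stay lists its diagnosis and procedure codes.
stays = [
    {"id": 1, "dx": {"I48"}, "proc": {"DEQP003"}},
    {"id": 2, "dx": {"I48", "I50"}, "proc": {"DEQP003"}},
    {"id": 3, "dx": {"I50"}, "proc": {"DEQP003"}},
    {"id": 4, "dx": set(), "proc": {"DEQP003"}},  # procedure without any diagnosis
]

def dx_given_proc(stays, proc_code, dx_code):
    """Estimate P(diagnosis coded | procedure coded) from past stays."""
    with_proc = [s for s in stays if proc_code in s["proc"]]
    if not with_proc:
        return 0.0
    return sum(dx_code in s["dx"] for s in with_proc) / len(with_proc)

def flag_missing_dx(stays, proc_code, dx_code, threshold=0.5):
    """Flag stays carrying the procedure but not its usually associated diagnosis."""
    if dx_given_proc(stays, proc_code, dx_code) < threshold:
        return []  # association too weak to suggest a missing code
    return [s["id"] for s in stays
            if proc_code in s["proc"] and dx_code not in s["dx"]]

print(flag_missing_dx(stays, "DEQP003", "I50"))  # → [1, 4]
```

Flagged stays would then be reviewed by coders rather than corrected automatically.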
From the patient point of view, data contained in CDWs can facilitate decision-making in the context of more personalized or precision medicine [6]. One of the earliest described methods that can be applied to CDWs consists of searching for similar patients within CISs [7], [8]. This means looking for patients who share the same clinical or para-clinical features and analyzing their characteristics, the medical decisions made, and the results of these decisions to infer the most relevant clinical strategies for the patient concerned [9]. Practicing physicians rely on the collective memory of CISs and CDWs in the same way that they can rely on the experience of expert clinicians [10]. Results are all the more convincing when clinical strategies have remained stable over the query period. A complementary approach consists of the evaluation, via computer simulation, of decision-making tools (in silico evaluation), such as the adaptation of drug dosage according to the state of renal function [11] or the screening of patients with potential delays in cancer diagnosis [12]. Rules of good practice derived from the literature or expert knowledge are programmed and tested on relevant patient data within the CDW.
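The similar-patient idea can be illustrated with a minimal sketch. The patients, feature codes, and Jaccard similarity measure below are illustrative assumptions; the systems cited above combine structured codes, text, and richer similarity models.

```python
def jaccard(a: set, b: set) -> float:
    """Similarity between two patients' feature sets (diagnoses, drugs, findings)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def most_similar(index_patient: set, cohort: dict, k: int = 2):
    """Rank cohort patients by decreasing similarity to the index patient."""
    return sorted(cohort.items(),
                  key=lambda kv: jaccard(index_patient, kv[1]),
                  reverse=True)[:k]

# Hypothetical patients described by coded features.
cohort = {
    "P1": {"E11", "I10", "metformin"},   # type 2 diabetes, hypertension
    "P2": {"E11", "N18", "insulin"},     # diabetes with chronic kidney disease
    "P3": {"J45", "salbutamol"},         # asthma
}
index = {"E11", "I10", "N18"}
print(most_similar(index, cohort))
```

In practice, the decisions and outcomes recorded for the top-ranked patients would then be reviewed to inform the strategy for the index patient.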
In a research context, CDWs can be used to generate and test hypotheses. For epidemiological studies, CDWs allow the constitution of patient cohorts that can serve for retrospective studies (e.g., population follow-up, case-control studies) to compute disease risk factors [13], or as the starting point for prospective studies obtained by increasing follow-up time and adding new variables and/or new patients [14]. In all these situations, researchers benefit from selection tools to define patient inclusion and exclusion criteria, items to be followed up, end-points to be considered, and various graphical and data analytics views [15]. In a vigilance study context, end-points can be any biological or clinical changes, occurrences of side-effects, or complications of diagnostic or therapeutic procedures [16], [17]. In a clinical research context, a CDW can be used at various stages of a clinical trial: in the feasibility stage, to evaluate the hospital’s capacity for recruitment according to its case mix; in the inclusion stage, for the selection of patients; during the trial, to evaluate how representative the selected patients are of the larger population followed by the hospital; to extract patient data produced during a given period for analysis [18]; and finally, as a population follow-up and vigilance tool [19].
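Cohort selection by inclusion and exclusion criteria can be sketched as below. The flat patient extract and the criteria themselves are hypothetical; actual CDW selection tools operate on the full observation store and handle temporal constraints as well.

```python
# Hypothetical flat extract of a CDW.
patients = [
    {"id": "A", "age": 67, "dx": {"I48"}, "on_anticoagulant": False},
    {"id": "B", "age": 54, "dx": {"I48"}, "on_anticoagulant": True},
    {"id": "C", "age": 71, "dx": {"I50"}, "on_anticoagulant": False},
]

# Illustrative criteria: age >= 60 with atrial fibrillation (I48),
# excluding patients already on anticoagulants.
inclusion = [lambda p: p["age"] >= 60, lambda p: "I48" in p["dx"]]
exclusion = [lambda p: p["on_anticoagulant"]]

def eligible(p):
    """A patient enters the cohort if all inclusion and no exclusion criteria hold."""
    return all(c(p) for c in inclusion) and not any(c(p) for c in exclusion)

cohort = [p["id"] for p in patients if eligible(p)]
print(cohort)  # → ['A']
```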
Several open-source platforms are now available and are being adopted by a growing number of institutions. An early example is the Informatics for Integrating Biology and the Bedside (i2b2) environment, an NIH-funded national center for biomedical computing developed by the Harvard group in Boston. i2b2 is now used in more than 130 hospital-care institutions around the world [20]. It relies on a star-based model built around a central patient observation relational table. A web-based interface facilitates the query process for health professionals. The platform allows the integration of both clinical and high-throughput data such as genomic data and offers a wide variety of tools for text and genomic data processing. On top of i2b2, SHRINE (Shared Health Research Informatics NEtwork) aims at linking i2b2 instances for the sharing of obfuscated, aggregated counts of patients meeting selected inclusion and exclusion criteria, i.e., de-identified data [21]. The Observational Health Data Sciences and Informatics (OHDSI) program, a multi-stakeholder initiative, developed the Observational Medical Outcomes Partnership (OMOP) platform, which is more oriented towards the reuse of heterogeneous sources of health data (administrative claims, electronic medical records…) [22]. It relies on the adoption of a common data model known as the OMOP Common Data Model (CDM). The program provides resources to convert a wide variety of datasets into the CDM, as well as tools to analyze data in CDM format. A recent publication using data from the OHDSI network includes 11 data sources covering 250 million patients observed over multiple years [23]. Links between the i2b2 and OMOP platforms are developing [24].
The SHARPn platform [25] was deployed to receive source EHR data in several formats, generate structured data from EHR narrative text, and normalize the EHR data using common detailed clinical models and Consolidated Health Informatics standard terminologies, drawing on thousands of patient electronic records from two large healthcare organizations: Mayo Clinic and Intermountain Healthcare.
Various experiences with CDWs have been previously published, in particular in university teaching hospitals [26], [27], [28], [29], [30], [31], [32]. At Vanderbilt University, the in-house-developed CDW is composed of two environments, one related to patient identification information and the other to clinical information, including omics data [26]. Query access tools are made available to professional end users. Institutional review board (IRB) approval is necessary for all queries that necessitate access to patient identification data. A team made up of experts in biomedical informatics and statistics provides methodological support for clinical researchers. At Washington University, the CDW relies on an i2b2 platform; queries from more than 100 end users are processed each month [27]. The onco-i2b2 warehouse [28], implemented by the University of Pavia and the IRCCS Fondazione Maugeri hospital, manages data from more than 6500 patients with a breast cancer diagnosis collected between 2001 and 2011 (over 390 of whom have at least one biological sample in the cancer biobank), with more than 47,000 visits and 96,000 observations covering 960 medical concepts. The CARDIO-i2b2 project is populated with data from patients with arrhythmogenic diseases [29]. Krasowski et al. [30] present examples of several successful searches using their in-house clinical data warehouse, mostly queries from microbiology and clinical chemistry/toxicology, with inclusion criteria covering over 5 years of clinical data and heterogeneous sources. The Göttingen University i2b2 infrastructure includes a set of four research usage scenarios [31]. The CARPEM infrastructure [32] integrates heterogeneous data, such as clinical data from the clinical care systems, clinical research groups, and associated labs; omics data from associated molecular labs; and additional sources from biobanking, using a set of open-source resources including i2b2 and tranSMART.
The present article describes the current content of the CDW at the “Hôpital Européen Georges Pompidou” (HEGP), the data access process designed to protect patient privacy, and the practical use of the CDW during the period 2011–2015.
Section snippets
The HEGP CDW platform
The HEGP is an 800-bed acute care university hospital located in southwest Paris. The hospital is organized around three major cooperating healthcare centers: cardiovascular, cancer, and internal medicine, including an emergency department and trauma center.
The HIMSS/EMRAM level 6-certified CIS includes a production Oracle® database for the EHR with its replicated mirror database. The HEGP CDW, in operation since 2009, is fed from the EHR replicated database to avoid overloading the production database.
CDW content
Clinical data warehouses are expected to contain almost all patient data produced within a CIS, whether structured (e.g., drug prescriptions and associated effects) or unstructured (e.g., inpatient or outpatient summary reports, radiological or pathological reports). The HEGP CDW contains all clinical records since the hospital opened in July 2000 (Table 1). The HEGP CIS patient identification database was initially built up from the identification databases of the three hospitals that were
Discussion and conclusion
Deployment of CDWs strongly coupled with running CISs has now become a major goal for many hospitals that include data analytics and translational research support and IT strategic planning into their organizations. This is however a long-term process (e.g., two to five years) that needs to pass through several rounds of conception, deployment and validation [3]. These phases concern the selection of the most appropriate development platform, a clear integration strategy (e.g., at a technical
Conflict of interest
The authors declare that they have no competing interest.
Authors’ contribution
- PD and EZ initiated the CDW project in 2008.
- ASJ and PD conceived and designed the study.
- ASJ performed the data collection and analysis.
- ASJ performed the collection and analysis of the CDW projects.
- ASJ and PD wrote the first full draft.
- ASJ and PD critically revised and edited the manuscript based on comments from AB, MFM, and PA.
- All authors read and approved the final version of the paper.
Acknowledgements
We are indebted to all CDW users, and especially the early adopters from the biomedical informatics department, namely Jean-Baptiste Escudie, Yannick Girardeau, and Bastien Rance.
References (40)
- et al., Trustworthy reuse of health data: a transnational perspective, Int. J. Med. Inf. (2013)
- et al., Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques, J. Am. Med. Inform. Assoc. (2006)
- et al., Toward a national framework for the secondary use of health data: an American Medical Informatics Association white paper, J. Am. Med. Inform. Assoc. (2007)
- et al., The shared health research information network (SHRINE): a prototype federated query tool for clinical data repositories, J. Am. Med. Inform. Assoc. (2009)
- et al., Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project, J. Biomed. Inform. (2012)
- et al., Secondary use of clinical data: the Vanderbilt approach, J. Biomed. Inform. (2014)
- et al., Use of a data warehouse at an academic medical center for clinical pathology quality improvement, education, and research, J. Pathol. Inform. (2015)
- et al., Hypertension management: the computer as a participant, Am. J. Med. (1980)
- et al., Perspectives for medical informatics: reusing the electronic medical record for clinical research, Methods Inf. Med. (2009)
- et al., Methodology of integration of a clinical data warehouse with a clinical information system: the HEGP case, Stud. Health Technol. Inform. (2010)
- À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge, J. Am. Med. Inform. Assoc.
- An informatics research agenda to support precision medicine: seven key areas, J. Am. Med. Inform. Assoc.
- ClinQuery: a system for online searching of data in a teaching hospital, Ann. Intern. Med.
- Evidence-based medicine in the EMR era, N. Engl. J. Med.
- Improving healthcare with interactive visualization, Computer
- A clinical data warehouse-based process for refining medication orders alerts, J. Am. Med. Inform. Assoc.
- Electronic health record-based triggers to detect potential delays in cancer diagnosis, BMJ Qual. Saf.
- Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research, Pediatr. Rheumatol.
- Identifying clinical/translational research cohorts: ascertainment via querying an integrated multi-source database, J. Am. Med. Inform. Assoc.
- Interactive information visualization to explore and query electronic health records, Found. Trends Hum.-Comput. Interact.