Reusability of coded data in the primary care electronic medical record: A dynamic cohort study concerning cancer diagnoses
Introduction
Reuse of electronic medical record (EMR) data is a hot topic, not only in hospitals [1], [2] but also in primary care [3]. An example is the international trend to calculate quality indicators automatically based on data collected during routine care. For Dutch primary care alone, over one hundred quality indicators are established and more are being developed [4]. Because manual assessment of these indicators is a time-consuming burden for healthcare professionals, policy makers aim for automatic calculation based on extracted, mainly coded, routine care data [5], [6].
Furthermore, risk assessment for prevention projects, followed by structured panel management procedures as well as chronic disease management to improve proactive care [7], [8], [9], [10], is becoming increasingly popular. These are thought to be promising tools for managing the increasing workload of family physicians, but again they rely strongly on the analysis of routine care diagnostic data to identify patients who could be included in preventive care and chronic disease management programmes, such as the frail elderly [11] or cancer patients.
The reuse of data for primary care research purposes, such as early detection of cancer, is also becoming commonplace, as demonstrated by the rapidly evolving practice-based research networks (PBRNs) in Europe, Canada and the USA. These networks provide a basic facility for primary care research, and often use anonymized data uploaded by participating practices to a central database [12], [13], [14].
It is important that primary care organisations regard their (routine care) data as a significant and valuable organisational asset. It is equally important that they realize that in the wrong hands (personnel without appropriate expertise and training in handling routine care data), data reuse can actually cause harm. To truly value routine primary healthcare data and to reuse these data reliably, the data should represent the true situation as closely as possible. Despite the examples of actual reuse mentioned above, there are serious concerns about the quality and subsequent reusability of EMR data in primary care [1], [2], [15], [16], [17].
In medical informatics, data quality is assessed using various “dimensions” [15], [18]. Although there is no uniformly accepted model or method to assess data quality in primary care, Weiskopf and Weng [15] summarized five common dimensions of data quality based on extensive literature research: Completeness, Correctness, Concordance, Plausibility and Currency (Fig. 1).
Little has been published on the quality of data from primary care records. A few studies (see Discussion) have assessed their completeness and, to a lesser extent, their correctness, but information on other dimensions is lacking.
When assessing data quality, the coded registration of diagnoses has the highest priority, because diagnoses are a central item in analyses. We focus on diagnoses of cancer because cancer is a high-impact diagnosis that we expect to be registered and coded correctly in the EMR for purposes of care. Furthermore, the national Netherlands Cancer Registry (NCR) [19] provides an accessible and supposedly reliable reference standard. To assess the quality and usability of coded cancer diagnoses for reuse, we studied the three dimensions of data quality that our data infrastructure and the available reference data allowed us to assess: completeness, correctness and concordance with the reference standard. To find focus points for improvement, we identified factors that influence data quality.
Design
We performed a dynamic cohort study in a Dutch network database containing 250,000 anonymized electronic medical records (EMRs) from 52 general practices. We used a four-step approach, as described in Fig. 2, to determine Standardized Incidence Rate Ratios (SIRs) between January 1st 2000 and December 31st 2011.
First, we determined our reference standard: the expected incidence rates based on the Netherlands Cancer Registry (NCR) [19] and Statistics Netherlands [20].
Second, observed incidence
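The comparison of observed and expected incidence described in this section can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the strata, person-years, reference rates and observed count are hypothetical, and the confidence interval uses a simple normal approximation, which is an assumption because the source does not state which method was used.

```python
from math import sqrt


def expected_cases(person_years, reference_rates):
    """Expected number of cases: sum over strata of person-years
    multiplied by the reference incidence rate for that stratum."""
    return sum(person_years[s] * reference_rates[s] for s in person_years)


def sir_with_ci(observed, expected, z=1.96):
    """Standardized incidence ratio (as a percentage) with an approximate
    95% confidence interval (normal approximation; an assumption here)."""
    sir = observed / expected
    half_width = z * sqrt(observed) / expected
    return 100 * sir, 100 * (sir - half_width), 100 * (sir + half_width)


# Hypothetical strata (age bands), not taken from the study.
person_years = {"40-59": 10000.0, "60-79": 5000.0}
reference_rates = {"40-59": 0.002, "60-79": 0.006}  # cases per person-year

expected = expected_cases(person_years, reference_rates)
sir, ci_low, ci_high = sir_with_ci(observed=45, expected=expected)
print(f"expected={expected:.1f}, SIR={sir:.1f}% (95%CI {ci_low:.1f}-{ci_high:.1f})")
```

An SIR below 100% indicates that fewer cases were found in the EMR than the reference standard predicts, and an SIR above 100% indicates more.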
Results
The combined SIR for breast, colon, and prostate cancer between 2000 and 2011 was 91.5% (95%CI 88.5–94.5). Because the confidence interval excludes 100%, the observed number of cases in the EMR differs significantly from the expected number according to the NCR (Table 2).
The SIRs varied over time: from 2000 to 2003 the combined SIR was 66.3% (95%CI 61.3–71.3), from 2004 to 2007 it was 95.7% (95%CI 90.3–100.9), and from 2008 to 2011 it was 103.8% (95%CI 98.8–108.6). For colon cancer in males the SIR was 71.5%
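The reading of these intervals can be made explicit with a short check: an SIR differs significantly from the reference when its 95% confidence interval does not contain 100%. The sketch below applies that rule to the combined intervals reported in this section; the function name is hypothetical and chosen only for illustration.

```python
def differs_from_reference(ci_low, ci_high, reference=100.0):
    """True when the confidence interval excludes the reference value."""
    return not (ci_low <= reference <= ci_high)


# Combined 95% confidence intervals as reported in the Results text.
reported = {
    "2000-2011": (88.5, 94.5),
    "2000-2003": (61.3, 71.3),
    "2004-2007": (90.3, 100.9),
    "2008-2011": (98.8, 108.6),
}

flags = {
    period: differs_from_reference(low, high)
    for period, (low, high) in reported.items()
}

for period, significant in flags.items():
    print(period, "differs from expected" if significant else "consistent with expected")
```

Under this rule the overall period and 2000–2003 differ from the reference standard, whereas the two later periods do not.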
Principal findings
The overall SIR was 91.5% (95%CI 88.5–94.5). Comparability of incidence rates improved significantly over the years, from a SIR of 66.3% in 2000–2003 (95%CI 61.3–71.3) to 103.8% in 2008–2011 (95%CI 98.8–108.6). SIRs differ between cancer types: from 71.5% (95%CI 65.0–77.8) for colon cancer in males to 103.9% (95%CI 98.9–108.5) for breast cancer. There are differences in data quality (SIRs 76.2%–99.7%) depending on the EMR system used, with SIRs up to 232.9% for breast cancer in one EMR system
Authors' contributions
AS and MEN conceptualized and designed the study. AS extracted the data from the network database, determined the reference data and calculated the incidence ratios. CWH, AS and MEN analysed, validated and interpreted the data. AS drafted the manuscript. CWH, RHS and MEN contributed important feedback during revisions. All authors had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Declaration of conflicting interests
The authors declare that they have no competing financial interests.
Funding
The Julius General Practitioners’ Network Database is funded by the Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, the Netherlands, and by funding from other dedicated research and educational projects that are based on its data. AS held a research fellowship partly funded by BBMRI (Biobanking and Biomolecular Research Infrastructure) and by the Mondriaan Foundation, an independent organisation which aims to link and enrich routine healthcare databases in
Acknowledgements
We thank the GPs in the Utrecht area for sharing their anonymized EMR data with us for this study, Julia Velikopolskaia for her assistance in extracting data from the Julius General Practitioners’ Network Database and Jackie Senior for editing the manuscript.
References (35)
- et al., Secondary use of clinical data: the Vanderbilt approach, J. Biomed. Inform. (2014)
- et al., Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer, Artif. Intell. Med. (2016)
- et al., The Dutch surgical colorectal audit, Eur. J. Surg. Oncol. (2013)
- et al., Electronic health records: new opportunities for clinical research, J. Intern. Med. (2013)
- Electronic medical records: the way forward for primary care research?, Fam. Pract. (2014)
- Nivel, Feasibility study indicators primary care and Etalage+ data (Haalbaarheidsstudie indicatoren huisartsenzorg en...
- et al., Formalization and computation of quality measures based on electronic medical records, J. Am. Med. Inform. Assoc. (2014)
- D. Blumenthal, Meaningful use: an assessment. An interview with David Blumenthal, M.D., National Coordinator for Health...
- et al., Improving population health through team-based panel management, Arch. Intern. Med. Am. Med. Assoc. (2011)
- The future of primary care: transforming practice, N. Engl. J. Med. (2008)
- Understanding panel management: a comparative study of an emerging approach to population care, Perm. J.
- Electronic medical record reminders and panel management to improve primary care of elderly patients, Arch. Intern. Med.
- Effectiveness of a proactive primary care program on preserving daily functioning of older people: a cluster randomized controlled trial, J. Am. Geriatr. Soc.
- PBRN conference summary and update, Ann. Fam. Med.
- Implications of the problem orientated medical record (POMR) for research using electronic GP databases: a comparison of the Doctors Independent Network Database (DIN) and the General Practice Research Database (GPRD), BMC Fam. Pract.
- Supporting better science in primary care: a description of practice-based research networks (PBRNs) in 2011, J. Am. Board Fam. Med.
- N.G. Weiskopf, C. Weng, Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research, J. Am. Med. Inform. Assoc.