Reusability of coded data in the primary care electronic medical record: A dynamic cohort study concerning cancer diagnoses

https://doi.org/10.1016/j.ijmedinf.2016.08.004

Highlights

  • Reuse of EMR data is expanding despite concerns about data quality.

  • We assessed quality using the Cancer Registry as a reference.

  • Re-users of coded data will find 30% of cancer cases missed or false-positive.

  • The type of EMR system and the type of cancer influence quality of data.

  • Data reuse should only be performed by appropriately trained experts.

Abstract

Objectives

To assess quality and reusability of coded cancer diagnoses in routine primary care data. To identify factors that influence data quality and areas for improvement.

Methods

A dynamic cohort study in a Dutch network database containing 250,000 anonymized electronic medical records (EMRs) from 52 general practices was performed. Coded data from 2000 to 2011 for the three most common cancer types (breast, colon and prostate cancer) was compared to the Netherlands Cancer Registry.

Measurements

Data quality is expressed in Standardized Incidence Ratios (SIRs): the ratio between the number of coded cases observed in the primary care network database and the expected number of cases based on the Netherlands Cancer Registry. Ratios were multiplied by 100% for readability.
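As a minimal sketch of the measure defined above, the snippet below computes an SIR from an observed count O and an expected count E. The function and field names are hypothetical, and the confidence interval uses a normal approximation to the Poisson distribution (SE = √O / E). This section does not state which interval method the authors used, so treat the code as an illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class SIRResult:
    observed: int    # coded cases found in the primary care network database
    expected: float  # cases expected from Netherlands Cancer Registry rates
    sir: float       # observed / expected, expressed as a percentage
    ci_low: float    # lower 95% confidence limit (percentage)
    ci_high: float   # upper 95% confidence limit (percentage)


def standardized_incidence_ratio(observed: int, expected: float, z: float = 1.96) -> SIRResult:
    """Compute an SIR (in %) with a normal-approximation confidence interval.

    Assumption: the observed count is Poisson-distributed, so its variance
    equals the count itself and the standard error of O/E is sqrt(O) / E.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    ratio = observed / expected
    se = math.sqrt(observed) / expected
    return SIRResult(
        observed=observed,
        expected=expected,
        sir=100 * ratio,
        ci_low=100 * (ratio - z * se),
        ci_high=100 * (ratio + z * se),
    )
```

For example, with hypothetical counts of 183 observed and 200 expected cases, `standardized_incidence_ratio(183, 200.0).sir` gives an SIR of about 91.5%.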

Results

The overall SIR was 91.5% (95%CI 88.5–94.5) and improved over the years. SIRs differed between cancer types, from 71.5% for colon cancer in males to 103.9% for breast cancer. Data quality also differed depending on the EMR system used (SIRs 76.2%–99.7%), with SIRs up to 232.9% for breast cancer in one EMR system. Frequently observed errors in routine healthcare data can be classified as: lack of integrity checks, inaccurate use and/or lack of codes, and lack of EMR system functionality.

Conclusions

Re-users of coded routine primary care electronic medical record (EMR) data should be aware that 30% of cancer cases can be missed. Up to 130% of cancer cases found in the EMR data can be false-positive. The type of EMR system and the type of cancer influence the quality of coded diagnosis registration. While data quality can be improved (e.g. by improving system design and by training EMR system users), re-use should only be performed by appropriately trained experts.
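The 30% and 130% figures appear to follow from the most extreme SIRs in the results, read as a shortfall or an excess of coded cases relative to the expected NCR count. This is our reading; the derivation is not spelled out in this section:

```latex
\begin{aligned}
\mathrm{SIR} &= \frac{O}{E} \times 100\% \\
100\% - 71.5\% &= 28.5\% \approx 30\% && \text{(colon cancer in males: shortfall)} \\
232.9\% - 100\% &= 132.9\% \approx 130\% && \text{(breast cancer, one EMR system: excess)}
\end{aligned}
```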

Introduction

Reuse of electronic medical record (EMR) data is attracting increasing attention, not only in hospitals [1], [2] but also in primary care [3]. An example is the international trend to calculate quality indicators automatically from data collected during routine care. For Dutch primary care alone, over one hundred quality indicators have been established and more are being developed [4]. Because manual assessment of these indicators is a time-consuming burden for healthcare professionals, policy makers aim for automatic calculation based on extracted, mainly coded, routine care data [5], [6].

Furthermore, risk assessment for prevention projects, followed by structured panel management procedures, as well as chronic disease management to improve proactive care [7], [8], [9], [10], are becoming increasingly popular. These are thought to be promising tools for managing the increasing workload of family physicians, but again they rely strongly on the analysis of routine care diagnostic data to identify patients who could be included in preventive care and chronic disease management programmes, such as the frail elderly [11] or cancer patients.

The reuse of data for primary care research purposes, such as early detection of cancer, is also becoming almost commonplace, as demonstrated by the rapidly evolving practice-based research networks (PBRNs) in Europe, Canada and the USA. These networks provide a basic facility for primary care research, and often use anonymized data uploaded by participating practices to a central database [12], [13], [14].

It is important that primary care organisations regard their (routine care) data as a significant and valuable organisational asset. It is equally important that they realize that, in the wrong hands (personnel without appropriate expertise and training in handling routine care data), data re-use can actually cause harm. To truly value routine primary healthcare data and to re-use these data reliably, the data should represent the true situation as closely as possible. Despite the examples of actual reuse mentioned above, there are serious concerns about the quality and subsequent reusability of EMR data in primary care [1], [2], [15], [16], [17].

In medical informatics, data quality is assessed along various “dimensions” [15], [18]. Although there is no uniformly accepted model or method to assess data quality in primary care, Weiskopf and Weng [15] summarized five common dimensions of data quality based on an extensive literature review: Completeness, Correctness, Concordance, Plausibility and Currency (Fig. 1).

Little has been published on the quality of data from primary care records. A few studies (see Discussion) have assessed their completeness and, to a lesser extent, their correctness, but information on other dimensions is lacking.

When assessing data quality, the coded registration of diagnoses deserves the highest priority, because diagnoses are central items in most analyses. We focus on cancer diagnoses because cancer is a high-impact diagnosis that we expect to be registered and coded correctly in the EMR for purposes of care. Furthermore, the national Netherlands Cancer Registry (NCR) [19] provides an accessible and presumably reliable reference standard. Given the available reference data and our data infrastructure, we assessed three dimensions of data quality to determine the quality and usability of coded cancer diagnoses for re-use: completeness, correctness and concordance with the reference standard. To find focus points for improvement, we identified factors that influence data quality.


Design

We performed a dynamic cohort study in a Dutch network database containing 250,000 anonymized electronic medical records (EMRs) from 52 general practices. We used a four-step approach, as described in Fig. 2, to determine Standardized Incidence Ratios (SIRs) for the period from January 1st 2000 to December 31st 2011.

First, we determined our reference standard: the expected incidence rates based on the Netherlands Cancer Registry (NCR) [19] and Statistics Netherlands [20].

Second, observed incidence
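The reference-standard step can be illustrated with indirect standardization: the expected number of cases is the sum, over strata, of a registry-based incidence rate multiplied by the person-years the dynamic cohort contributes to that stratum. The sketch below assumes sex and 5-year age-band strata, rates per 100,000 person-years and a simplified age assignment; all names are hypothetical, and the exact stratification used in the study is not given in this snippet.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple

Stratum = Tuple[str, int]  # (sex, lower bound of a hypothetical 5-year age band)


@dataclass
class Patient:
    sex: str            # "M" or "F"
    birth_year: int
    registered: date    # date the patient joined the practice
    deregistered: date  # date the patient left (or end of data extraction)


def years_between(start: date, end: date) -> float:
    """Length of an interval in years, using 365.25-day years."""
    return max((end - start).days, 0) / 365.25


def person_years_by_stratum(
    patients: Iterable[Patient], period_start: date, period_end: date
) -> Dict[Stratum, float]:
    """Sum follow-up time inside [period_start, period_end) per stratum.

    Simplification: each patient is assigned to one age band, based on the age
    reached in the calendar year of period_start. A full analysis would split
    follow-up as patients move between age bands.
    """
    totals: Dict[Stratum, float] = {}
    for p in patients:
        start = max(p.registered, period_start)
        end = min(p.deregistered, period_end)
        if end <= start:
            continue  # patient not part of the dynamic cohort during this period
        age = period_start.year - p.birth_year
        key = (p.sex, (age // 5) * 5)
        totals[key] = totals.get(key, 0.0) + years_between(start, end)
    return totals


def expected_cases(
    person_years: Dict[Stratum, float], reference_rates: Dict[Stratum, float]
) -> float:
    """Indirect standardization: sum of rate x person-years over all strata.

    reference_rates maps each stratum to incidence per 100,000 person-years,
    e.g. as derived from registry case counts and population figures.
    """
    return sum(
        py * reference_rates.get(key, 0.0) / 100_000
        for key, py in person_years.items()
    )
```

Because `period_end` is treated as exclusive here, passing `date(2012, 1, 1)` covers follow-up up to and including 31 December 2011.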

Results

The combined SIR for breast, colon and prostate cancer between 2000 and 2011 was 91.5% (95%CI 88.5–94.5). Because this confidence interval excludes 100%, the observed number of cases in the EMR differs significantly from the expected number according to the NCR (Table 2).
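A combined SIR across cancer types is commonly obtained by pooling observed and expected counts before dividing, rather than averaging the individual SIRs. The minimal sketch below assumes that pooling approach and a normal-approximation interval with SE = √O / E (neither is confirmed by this snippet); the function name is hypothetical. It also shows the check behind the significance statement: whether the interval excludes 100%.

```python
import math
from typing import List, Tuple


def pooled_sir(counts: List[Tuple[int, float]], z: float = 1.96) -> Tuple[float, float, float, bool]:
    """Combine (observed, expected) pairs into one SIR (in %) with a CI.

    Returns (sir, ci_low, ci_high, significant), where significant means the
    confidence interval excludes 100%.
    """
    observed = sum(o for o, _ in counts)
    expected = sum(e for _, e in counts)
    if expected <= 0:
        raise ValueError("total expected count must be positive")
    ratio = observed / expected
    se = math.sqrt(observed) / expected  # Poisson assumption: Var(O) = O
    low, high = ratio - z * se, ratio + z * se
    # The SIR differs significantly from 100% when the interval excludes 1.
    significant = not (low <= 1.0 <= high)
    return 100 * ratio, 100 * low, 100 * high, significant
```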

The SIRs varied over time: from 2000 to 2003 the combined SIR was 66.3% (95%CI 61.3–71.3), from 2004 to 2007 it was 95.7% (95%CI 90.3–100.9), and from 2008 to 2011 it was 103.8% (95%CI 98.8–108.6). For colon cancer in males the SIR was 71.5%

Principal findings

The overall SIR was 91.5% (95%CI 88.5–94.5). Comparability of incidence rates improved significantly over the years, from a SIR of 66.3% in 2000–2003 (95%CI 61.3–71.3) to 103.8% in 2008–2011 (95%CI 98.8–108.6). SIRs differ between cancer types: from 71.5% (95%CI 65.0–77.8) for colon cancer in males to 103.9% (95%CI 98.9–108.5) for breast cancer. There are differences in data quality (SIRs 76.2%–99.7%) depending on the EMR system used, with SIRs up to 232.9% for breast cancer in one EMR system
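One common way to test whether an SIR changed between two periods is to compare the two SIRs as a ratio on the log scale, where Var(ln SIR) ≈ 1/O for Poisson counts. The sketch below is an illustration under that assumption, with a hypothetical function name; the snippet does not say which test the authors used for the improvement from 2000–2003 to 2008–2011.

```python
import math
from typing import Tuple


def compare_sirs(o1: int, e1: float, o2: int, e2: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Ratio of two SIRs (period 2 / period 1) with a log-scale confidence interval.

    Assumes independent Poisson counts, so Var(ln SIR_i) is about 1 / O_i and
    the variance of the log of the ratio is about 1/o1 + 1/o2.
    """
    ratio = (o2 / e2) / (o1 / e1)
    se_log = math.sqrt(1 / o1 + 1 / o2)
    low = math.exp(math.log(ratio) - z * se_log)
    high = math.exp(math.log(ratio) + z * se_log)
    return ratio, low, high
```

A lower confidence limit above 1 would indicate that the later SIR is significantly higher than the earlier one.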

Authors' contributions

AS and MEN conceptualized and designed the study. Data extraction from the network database, determination of reference data and calculation of incidence ratios were done by AS. CWH, AS and MEN analysed, validated and interpreted the data. AS drafted the manuscript. CWH, RHS and MEN contributed to revisions with important feedback. All authors had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Declaration of conflicting interests

The authors declare that they have no competing financial interests.

Funding

The Julius General Practitioners’ Network Database is funded by the Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, the Netherlands, and by funding from other dedicated research and educational projects that are based on its data. AS held a research fellowship partly funded by BBMRI (Biobanking and Biomolecular Research Infrastructure) and by the Mondriaan Foundation, an independent organisation which aims to link and enrich routine healthcare databases in

Acknowledgements

We thank the GPs in the Utrecht area for sharing their anonymized EMR data with us for this study, Julia Velikopolskaia for her assistance in extracting data from the Julius General Practitioners’ Network Database and Jackie Senior for editing the manuscript.

References (35)

  • I. Danciu et al.

    Secondary use of clinical data: the Vanderbilt approach

    J. Biomed. Inform.

    (2014)
  • M. Hoogendoorn et al.

    Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer

    Artif. Intell. Med.

    (2016)
  • N.J. Van Leersum et al.

    The Dutch surgical colorectal audit

    Eur. J. Surg. Oncol.

    (2013)
  • P. Coorevits et al.

    Electronic health records: new opportunities for clinical research

    J. Intern. Med.

    (2013)
  • S. Muller

    Electronic medical records: the way forward for primary care research?

    Fam. Pract.

    (2014)
  • Nivel, Feasibility study indicators primary care and Etalage+ data (Haalbaarheidsstudie indicatoren huisartsenzorg en...
  • K. Dentler et al.

    Formalization and computation of quality measures based on electronic medical records

    J. Am. Med. Inform. Assoc.

    (2014)
  • D. Blumenthal, Meaningful use: an assessment. An interview with David Blumenthal, M.D., National Coordinator for Health...
  • E.H. Chen et al.

    Improving population health through team-based panel management

    Arch. Intern. Med. Am. Med. Assoc.

    (2011)
  • T. Bodenheimer

    The future of primary care: transforming practice

    N. Engl. J. Med.

    (2008)
  • E.E. Neuwirth et al.

    Understanding panel management: a comparative study of an emerging approach to population care

    Perm. J

    (2007)
  • T.S. Loo et al.

    Electronic medical record reminders and panel management to improve primary care of elderly patients

    Arch. Intern. Med.

    (2011)
  • N. Bleijenberg et al.

    Effectiveness of a proactive primary care program on preserving daily functioning of older people: a cluster randomized controlled trial

    J. Am. Geriatr. Soc.

    (2016)
  • R.J. Dolor et al.

    PBRN conference summary and update

    Ann. Fam. Med.

    (2014)
  • I.M. Carey et al.

    Implications of the problem orientated medical record (POMR) for research using electronic GP databases: a comparison of the Doctors Independent Network Database (DIN) and the General Practice Research Database (GPRD)

    BMC Fam. Pract.

    (2003)
  • K.A. Peterson et al.

    Supporting better science in primary care: a description of practice-based research networks (PBRNs) in 2011

    J. Am. Board Fam. Med.

    (2012)
  • N.G. Weiskopf and C. Weng

    Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research

    J. Am. Med. Inform. Assoc.

    (2013)