Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview

  • Research Article
  • Journal of Healthcare Informatics Research

Abstract

Electronic Health Records (EHRs) are increasingly perceived as a unique source of data for clinical research because they provide unprecedentedly large volumes of real-time data from real-world settings. In this review of the secondary uses of EHRs, we identify the anticipated breadth of opportunities and point out the data deficiencies and potential biases that are likely to limit the search for true causal relationships. The paper provides a comprehensive overview of the types of bias that arise along the pathways that generate real-world evidence, and of their sources. We distinguish two levels in the production of EHR data at which biases are likely to arise: (i) the healthcare system level, where the principal sources of bias lie in access to, and provision of, medical care, and in the acquisition and documentation of medical and administrative data; and (ii) the research level, where biases arise from the processes of extracting, analyzing, and interpreting these data. Given the plethora of biases, mainly in the form of selection and information bias, we conclude by advising extreme caution in making causal inferences from secondary uses of EHRs.
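The two bias families named in the abstract can be made concrete with a small, deterministic sketch. This is not the authors' method; it is an illustration under stated assumptions. All probabilities are hypothetical, and the helper names `odds_ratio`, `selected_odds_ratio`, and `misclassified_odds_ratio` are invented for this example. The code computes expected 2x2 tables instead of simulating patients, so the results are exact under the assumed inputs. The first part shows selection bias at the healthcare system level: exposure and outcome are independent in the population, but both raise the chance of being recorded in the EHR. The second part shows information bias: an outcome coded with imperfect sensitivity and specificity.

```python
"""Deterministic toy sketch of two EHR bias mechanisms (selection, information).

Every probability used here is a hypothetical, illustrative value; none comes
from the article or from real EHR data.
"""


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio of a 2x2 table: a=E+D+, b=E+D-, c=E-D+, d=E-D-."""
    return (a * d) / (b * c)


def selected_odds_ratio(p_exposed: float, p_outcome: float,
                        p_recorded: dict[tuple[int, int], float]) -> float:
    """Exposure-outcome OR among patients who appear in the EHR.

    Exposure and outcome are independent in the source population (true OR = 1),
    but both may change the probability of having a recorded encounter.
    p_recorded maps (exposed, outcome) -> P(recorded | exposed, outcome).
    """
    cells = {}
    for e in (1, 0):
        for d in (1, 0):
            pe = p_exposed if e else 1 - p_exposed
            pd = p_outcome if d else 1 - p_outcome
            # Expected population share that falls in this cell AND is recorded.
            cells[(e, d)] = pe * pd * p_recorded[(e, d)]
    return odds_ratio(cells[(1, 1)], cells[(1, 0)], cells[(0, 1)], cells[(0, 0)])


def misclassified_odds_ratio(p_exposed: float, risk_exposed: float,
                             risk_unexposed: float, sens: float, spec: float) -> float:
    """Observed OR when the outcome is coded with the same (non-differential)
    sensitivity and specificity in exposed and unexposed patients."""

    def observed_risk(true_risk: float) -> float:
        # True cases that get coded plus non-cases that are falsely coded.
        return sens * true_risk + (1 - spec) * (1 - true_risk)

    r1, r0 = observed_risk(risk_exposed), observed_risk(risk_unexposed)
    n1, n0 = p_exposed, 1 - p_exposed
    return odds_ratio(n1 * r1, n1 * (1 - r1), n0 * r0, n0 * (1 - r0))


if __name__ == "__main__":
    # Selection bias: exposure and outcome each raise the chance of a visit.
    recorded = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.2}
    print(round(selected_odds_ratio(0.3, 0.1, recorded), 3))  # 0.6 (true OR is 1)

    # Information bias: true OR 2.25, outcome coded with sensitivity 0.6, specificity 0.95.
    print(round(odds_ratio(0.3 * 0.2, 0.3 * 0.8, 0.7 * 0.1, 0.7 * 0.9), 3))  # 2.25
    print(round(misclassified_odds_ratio(0.3, 0.2, 0.1, 0.6, 0.95), 3))  # 1.624
```

Under these assumed inputs, restricting the analysis to recorded patients produces an odds ratio of 0.6 where no association exists (a Berkson-type distortion). Non-differential outcome misclassification pulls a true odds ratio of 2.25 toward the null, to about 1.62. With a constant recording probability, or with perfect sensitivity and specificity, both functions return the true odds ratio.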

Fig. 1

Similar content being viewed by others

References

  1. Sandhu E, Weinstein S, McKethan A, Jain SH (2012) Secondary uses of electronic health record data: benefits and barriers. Jt Comm J Qual Patient Saf 38(1):34–40. https://doi.org/10.1016/s1553-7250(12)38005-7

    Article  Google Scholar 

  2. Liu M, Qi Y, Wang W, Sun X (2022) Toward a better understanding about real-world evidence. Eur J Hosp Pharm 29(1):8–11. https://doi.org/10.1136/ejhpharm-2021-003081

    Article  Google Scholar 

  3. Concato J, Corrigan-Curay J (2022) Real-world evidence - where are we now? N Engl J Med 386(18):1680–1682. https://doi.org/10.1056/NEJMp2200089

    Article  Google Scholar 

  4. Holmes JH, Beinlich J, Boland MR, Bowles KH, Chen Y, Cook TS, Demiris G, Draugelis M, Fluharty L, Gabriel PE et al (2021) Why is the Electronic Health Record so challenging for Research and Clinical Care? Methods Inf Med 60(1–02):32–48. https://doi.org/10.1055/s-0041-1731784

    Article  Google Scholar 

  5. Gianfrancesco MA, Goldstein ND (2021) A narrative review on the validity of electronic health record-based research in epidemiology. BMC Med Res Methodol 21(1):234. https://doi.org/10.1186/s12874-021-01416-5

    Article  Google Scholar 

  6. Knevel R, Liao KP (2023) From real-world electronic health record data to real-world results using artificial intelligence. Ann Rheum Dis 82(3):306–311. https://doi.org/10.1136/ard-2022-222626

    Article  Google Scholar 

  7. Food US, and Drug Administration (FDA) (2021). Real-World Data: Assessing Electronic Health RecordsMedical Claims Data To Support Regulatory Decision-Making for DrugBiological Products. https://www.fda.gov/media/152503/download. Accessed June 2023

  8. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, LaVange L, Marinac-Dabic D, Marks PW, Robb MA et al (2016) Real-world evidence - what is it and what can it tell us? N Engl J Med 375(23):2293–2297. https://doi.org/10.1056/NEJMsb1609216

    Article  Google Scholar 

  9. U.S. Food and Drug Administration (FDA). Framework for FDA’s Real-World Evidence Program. https://www.fda.gov/media/120060/download. (2018) Accessed June 2023

  10. Duke-Margolis Center for Health Policy (2019) Determining Real-World Data’s Fitness for Use and the Role of Reliability. https://healthpolicy.duke.edu/sites/default/files/2019-11/rwd_reliability.pdf. Accessed June 2023

  11. Singhal P, Tan ALM, Drivas TG, Johnson KB, Ritchie MD (2023) Beaulieu-Jones: opportunities and challenges for biomarker discovery using electronic health record data. Trends Mol Med 29(9):765–776. https://doi.org/10.1016/j.molmed.2023.06.006

    Article  Google Scholar 

  12. Pasternak AL, Ward K, Irwin M, Okerberg C, Hayes D, Fritsche L, Zoellner S, Virzi J, Choe HM, Ellingrod V (2023) Identifying the prevalence of clinically actionable drug-gene interactions in a health system biorepository to guide pharmacogenetics implementation services. Clin Transl Sci 16(2):292–304. https://doi.org/10.1111/cts.13449

    Article  Google Scholar 

  13. Zhao Y, Tsubota T (2023) The current status of secondary use of claims, Electronic Medical Records, and Electronic Health Records in Epidemiology in Japan: Narrative Literature Review. JMIR Med Inform 11. https://doi.org/10.2196/39876

  14. Iott BE, Adler-Milstein J, Gottlieb LM, Pantell MS (2023) Characterizing the relative frequency of clinician engagement with structured social determinants of health data. J Am Med Inform Assoc 30(3):503–510. https://doi.org/10.1093/jamia/ocac251

    Article  Google Scholar 

  15. Dixit RA, Boxley CL, Samuel S, Mohan V, Ratwani RM, Gold JA (2023) Electronic Health Record Use issues and Diagnostic Error: a scoping review and Framework. J Patient Saf 19(1):e25–e30. https://doi.org/10.1097/pts.0000000000001081

    Article  Google Scholar 

  16. Modi S, Feldman SS (2022) The Value of Electronic Health Records since the Health Information Technology for Economic and Clinical Health Act: systematic review. JMIR Med Inform 10(9):e37283. https://doi.org/10.2196/37283

    Article  Google Scholar 

  17. Verheij RA, Curcin V, Delaney BC, McGilchrist MM (2018) Possible sources of Bias in Primary Care Electronic Health Record Data Use and Reuse. J Med Internet Res 20(5):e185. https://doi.org/10.2196/jmir.9134

    Article  Google Scholar 

  18. Last JM (1983) A Dictionary of Epidemiology. Oxford University Press, United Kingdom

    Google Scholar 

  19. Beesley LJ, Mukherjee B (2022) Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Biometrics 78(1):214–226. https://doi.org/10.1111/biom.13400

    Article  MathSciNet  Google Scholar 

  20. Bots SH, Groenwold RHH, Dekkers OM (2022) Using electronic health record data for clinical research: a quick guide. Eur J Endocrinol 186(4):E1–e6. https://doi.org/10.1530/eje-21-1088

    Article  Google Scholar 

  21. Romo ML, Chan PY, Lurie-Moroni E, Perlman SE, Newton-Dame R, Thorpe LE, McVeigh KH (2016) Characterizing adults receiving Primary Medical Care in New York City: Implications for Using Electronic Health Records for Chronic Disease Surveillance. Prev Chronic Dis 13. https://doi.org/10.5888/pcd13.150500

  22. Phelan M, Bhavsar NA, Goldstein BA (2017) EGEMS (Wash DC) 5(1):22. https://doi.org/10.5334/egems.243. Illustrating Informed Presence Bias in Electronic Health Records Data: How Patient Interactions with a Health System Can Impact Inference

  23. Agniel D, Kohane IS, Weber GM (2018) Biases in electronic health record data due to processes within the healthcare system: retrospective observational study. BMJ k1479. 36110.1136/bmj.k1479

  24. Bower JK, Patel S, Rudy JE, Felix AS (2017) Addressing Bias in Electronic Health Record-based surveillance of Cardiovascular Disease Risk: finding the Signal through the noise. Curr Epidemiol Rep 4(4):346–352. https://doi.org/10.1007/s40471-017-0130-z

    Article  Google Scholar 

  25. Farmer R, Mathur R, Bhaskaran K, Eastwood SV, Chaturvedi N, Smeeth L (2018) Promises and pitfalls of electronic health record analysis. Diabetologia 61(6):1241–1248. https://doi.org/10.1007/s00125-017-4518-6

    Article  Google Scholar 

  26. Williams BA (2021) Constructing epidemiologic cohorts from Electronic Health Record Data. Int J Environ Res Public Health 18(24). https://doi.org/10.3390/ijerph182413193

  27. Beesley LJ, Mukherjee B (2020) : Bias reduction and inference for electronic health record data under selection and phenotype misclassification: three case studies. medRxiv https://doi.org/10.1101/2020.12.21.20248644

  28. Goldstein ND, Kahal D, Testa K, Gracely EJ, Burstyn I (2022) : Data Quality in Electronic Health Record Research: an Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health outcomes Via Diagnostic codes. Harv Data Sci Rev 4(2)

  29. Casey JA, Schwartz BS, Stewart WF, Adler NE Using Electronic Health Records for Population Health Research: a review of methods and applications. Annu Rev Public Health (2016), 37, p. 61–81. https://doi.org/10.1146/annurev-publhealth-032315-021353

  30. Peskoe SB, Arterburn D, Coleman KJ, Herrinton LJ, Daniels MJ, Haneuse S (2021) Adjusting for selection bias due to missing data in electronic health records-based research. Stat Methods Med Res 30(10):2221–2238. https://doi.org/10.1177/09622802211027601

    Article  MathSciNet  Google Scholar 

  31. Jin Y, Schneeweiss S, Merola D, Lin KJ (2022) Impact of longitudinal data-completeness of electronic health record data on risk score misclassification. J Am Med Inform Assoc 29(7):1225–1232. https://doi.org/10.1093/jamia/ocac043

    Article  Google Scholar 

  32. Haneuse S, Daniels M (2016) : A General Framework for Considering Selection Bias in EHR-Based Studies: What Data Are Observed and Why? EGEMS (Wash DC) 4(1), p. 1203. https://doi.org/10.13063/2327-9214.1203

  33. Congressional Research Service (CRS) (2016) The 21st Century Cures Act (Division A of P.L. 114–255). https://sgp.fas.org/crs/misc/R44720.pdf. Accessed June 2023

  34. Fernández L, Fossa A, Dong Z, Delbanco T, Elmore J, Fitzgerald P, Harcourt K, Perez J, Walker J, DesRoches C (2021) Words Matter: what do patients find judgmental or Offensive in Outpatient notes? J Gen Intern Med 36(9):2571–2578. https://doi.org/10.1007/s11606-020-06432-7

    Article  Google Scholar 

  35. Kohane IS, Aronow BJ, Avillach P, Beaulieu-Jones BK, Bellazzi R, Bradford RL, Brat GA, Cannataro M, Cimino JJ, García-Barrio N et al (2021) What every reader should know about studies using Electronic Health Record Data but May be afraid to ask. J Med Internet Res 23(3):e22219. https://doi.org/10.2196/22219

    Article  Google Scholar 

  36. Beesley LJ, Fritsche LG, Mukherjee B (2020) An analytic framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records. Stat Med 39(14):1965–1979. https://doi.org/10.1002/sim.8524

    Article  MathSciNet  Google Scholar 

  37. Khurshid S, Reeder C, Harrington LX, Singh P, Sarma G, Friedman SF, Di Achille P, Diamant N, Cunningham JW, Turner AC et al (2022) Cohort design and natural language processing to reduce bias in electronic health records research. NPJ Digit Med 5(1):47. https://doi.org/10.1038/s41746-022-00590-0

    Article  Google Scholar 

  38. Huang J, Duan R, Hubbard RA, Wu Y, Moore JH, Xu H, Chen Y (2018) PIE: a prior knowledge guided integrated likelihood estimation method for bias reduction in association studies using electronic health records data. J Am Med Inform Assoc 25(3):345–352. https://doi.org/10.1093/jamia/ocx137

    Article  Google Scholar 

  39. Pendergrass SA, Crawford DC (2019) Using Electronic Health Records To Generate Phenotypes for Research. Curr Protoc Hum Genet 100(1):e80. https://doi.org/10.1002/cphg.80

    Article  Google Scholar 

  40. Agency for Healthcare Research and Quality (AHRQ) (2019) Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, 3rd Edition, Addendum 2. https://www.ncbi.nlm.nih.gov/books/NBK551879/pdf/Bookshelf_NBK551879.pdf. Accessed June 2023

  41. Muntner P, Einhorn PT, Cushman WC, Whelton PK, Bello NA, Drawz PE, Green BB, Jones DW, Juraschek SP, Margolis KL et al (2019) Blood pressure Assessment in adults in clinical practice and clinic-based research: JACC Scientific Expert Panel. J Am Coll Cardiol 73(3):317–335. https://doi.org/10.1016/j.jacc.2018.10.069

    Article  Google Scholar 

  42. Kim HS, Kim JH (2019) J Korean Med Sci 34(4):e28. https://doi.org/10.3346/jkms.2019.34.e28. Proceed with Caution When Using Real World Data and Real World Evidence

  43. van der Bij S, Khan N, Ten Veen P, de Bakker DH, Verheij RA (2017) Improving the quality of EHR recording in primary care: a data quality feedback tool. J Am Med Inform Assoc 24(1):81–87. https://doi.org/10.1093/jamia/ocw054

    Article  Google Scholar 

  44. Cowie MR, Blomster JI, Curtis LH, Duclaux S, Ford I, Fritz F, Goldman S, Janmohamed S, Kreuzer J, Leenay M et al (2017) Electronic health records to facilitate clinical research. Clin Res Cardiol 106(1):1–9. https://doi.org/10.1007/s00392-016-1025-6

    Article  Google Scholar 

  45. Brinkmann BH, Karoly PJ, Nurse ES, Dumanis SB, Nasseri M, Viana PF, Schulze-Bonhage A, Freestone DR, Worrell G, Richardson MP et al (2021) Seizure diaries and forecasting with wearables: Epilepsy Monitoring outside the clinic. Front Neurol 12:690404. https://doi.org/10.3389/fneur.2021.690404

    Article  Google Scholar 

  46. Shaw R, Stroo M, Fiander C, McMillan K (2020) Selecting Mobile Health Technologies for Electronic Health Record Integration: Case Study. J Med Internet Res 22(10):e23314. https://doi.org/10.2196/23314

    Article  Google Scholar 

  47. Dinh-Le C, Chuang R, Chokshi S, Mann D (2019) Wearable Health Technology and Electronic Health Record Integration: scoping review and future directions. JMIR Mhealth Uhealth 7(9):e12861. https://doi.org/10.2196/12861

    Article  Google Scholar 

  48. Collins T, Woolley SI, Oniani S, Pandyan A (2021) Quantifying missingness in Wearable Heart Rate recordings. Stud Health Technol Inform 281:1077–1078. https://doi.org/10.3233/SHTI210352

    Article  Google Scholar 

  49. Sun M, Oliwa T, Peek ME, Tung EL (2022) Negative patient descriptors: documenting racial Bias in the Electronic Health Record. Health Aff (Millwood) 41(2):203–211. https://doi.org/10.1377/hlthaff.2021.01423

    Article  Google Scholar 

  50. Bourgeois FC, Fossa A, Gerard M, Davis ME, Taylor YJ, Connor CD, Vaden T, McWilliams A, Spencer MD, Folcarelli P et al (2019) A patient and family reporting system for perceived ambulatory note mistakes: experience at 3 U.S. healthcare centers. J Am Med Inform Assoc 26(12):1566–1573. https://doi.org/10.1093/jamia/ocz142

    Article  Google Scholar 

  51. Lam BD, Bourgeois F, Dong ZJ, Bell SK (2021) Speaking up about patient-perceived serious visit note errors: patient and family experiences and recommendations. J Am Med Inform Assoc 28(4):685–694. https://doi.org/10.1093/jamia/ocaa293

    Article  Google Scholar 

  52. Lear R, Freise L, Kybert M, Darzi A, Neves AL, Mayer EK (2022) Patients’ willingness and ability to identify and respond to errors in their Personal Health records: mixed methods analysis of cross-sectional Survey Data. J Med Internet Res 24(7):e37226. https://doi.org/10.2196/37226

    Article  Google Scholar 

  53. Haneuse S, Bogart A, Jazic I, Westbrook EO, Boudreau D, Theis MK, Simon GE, Arterburn D (2016) Learning About Missing Data Mechanisms in Electronic Health Records-based Research: a Survey-based Approach. Epidemiology 27(1):82–90. https://doi.org/10.1097/ede.0000000000000393

    Article  Google Scholar 

  54. Little RJA, Rubin DB (2019) Statistical analysis with Missing Data. John Wiley & Sons, New York, NY

    Google Scholar 

  55. Groenwold RHH (2020) Informative missingness in electronic health record systems: the curse of knowing. Diagn Progn Res 8. 410.1186/s41512-020-00077-0

  56. Haneuse S, Arterburn D, Daniels MJ (2021) Assessing Missing Data assumptions in EHR-Based studies: a Complex and Underappreciated Task. JAMA Netw Open 4(2):e210184. https://doi.org/10.1001/jamanetworkopen.2021.0184

    Article  Google Scholar 

  57. Ford E, Rooney P, Hurley P, Oliver S, Bremner S, Cassell J (2020) Can the use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method using simulated and real-life Clinical Data. Front Public Health 8. https://doi.org/10.3389/fpubh.2020.00054

  58. Spiegelhalter DJ, Myles JP, Jones DR, Abrams KR (2000) Bayesian methods in health technology assessment: a review. Health Technol Assess 4(38):1–130

    Article  Google Scholar 

  59. Li J, Yan XS, Chaudhary D, Avula V, Mudiganti S, Husby H, Shahjouei S, Afshar A, Stewart WF, Yeasin M et al (2021) Imputation of missing values for electronic health record laboratory data. NPJ Digit Med 4(1):147. https://doi.org/10.1038/s41746-021-00518-0

    Article  Google Scholar 

  60. Cook LA, Sachs J, Weiskopf NG (2021) The quality of social determinants data in the electronic health record: a systematic review. J Am Med Inform Assoc 29(1):187–196. https://doi.org/10.1093/jamia/ocab199

    Article  Google Scholar 

  61. Sayon-Orea C, Moreno-Iribas C, Delfrade J, Sanchez-Echenique M, Amiano P, Ardanaz E, Gorricho J, Basterra G, Nuin M, Guevara M (2020) Inverse-probability weighting and multiple imputation for evaluating selection bias in the estimation of childhood obesity prevalence using data from electronic health records. BMC Med Inform Decis Mak 20(1):9. https://doi.org/10.1186/s12911-020-1020-8

    Article  Google Scholar 

  62. Streeter AJ, Lin NX, Crathorne L, Haasova M, Hyde C, Melzer D (2017) and W.E. Henley: Adjusting for unmeasured confounding in nonrandomized longitudinal studies: a methodological review. J Clin Epidemiol 87, p. 23–34. https://doi.org/10.1016/j.jclinepi.2017.04.022

  63. Uddin MJ, Groenwold RH, Ali MS, de Boer A, Roes KC, Chowdhury MA, Klungel OH Methods to control for unmeasured confounding in pharmacoepidemiology: an overview. Int J Clin Pharm (2016), 38(3), p. 714–723. https://doi.org/10.1007/s11096-016-0299-0

  64. Richardson DB, Tchetgen Tchetgen EJ (2021) Bespoke instruments: a new tool for addressing unmeasured confounders. Am J Epidemiol. https://doi.org/10.1093/aje/kwab288

    Article  Google Scholar 

  65. Krishnamoorthy V, McLean D, Ohnuma T, Harris SK, Wong DJN, Wilson M, Moonesinghe R, Raghunathan K (2020) Causal inference in perioperative medicine observational research: part 2, advanced methods. Br J Anaesth 125(3):398–405. https://doi.org/10.1016/j.bja.2020.03.032

    Article  Google Scholar 

  66. Craig P, Katikireddi SV, Leyland A, Popham F (2017) Natural experiments: an overview of methods, approaches, and contributions to Public Health Intervention Research. Annu Rev Public Health 38:39–56. https://doi.org/10.1146/annurev-publhealth-031816-044327

    Article  Google Scholar 

  67. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ (2015), 350, p. h2750. https://doi.org/10.1136/bmj.h2750

  68. Lee WC (2014) Detecting and correcting the bias of unmeasured factors using perturbation analysis: a data-mining approach. BMC Med Res Methodol 14:18. https://doi.org/10.1186/1471-2288-14-18

    Article  Google Scholar 

  69. Balalian AA, Daniel S, Simonyan H, Khachadourian V (2022) Comparison of conditional and marginal models in assessing a child Nutrition intervention in Armenia. Matern Child Health. https://doi.org/10.1007/s10995-021-03308-y

    Article  Google Scholar 

  70. Fujiwara Y, Fukuda S, Tsujie M, Kitani K, Yukawa M, Inoue M, Watanabe Y, Higashida M, Kubota H, Okada T et al (2019) Clinical significance of preoperative chemoradiotherapy for advanced Esophageal cancer, evaluated by propensity score matching and weighting of inverse probability of treatment. Mol Clin Oncol 10(6):575–582. https://doi.org/10.3892/mco.2019.1843

    Article  Google Scholar 

  71. Allan V, Ramagopalan SV, Mardekian J, Jenkins A, Li X, Pan X, Luo X (2020) Propensity score matching and inverse probability of treatment weighting to address confounding by indication in comparative effectiveness research of oral anticoagulants. J Comp Eff Res 9(9):603–614. https://doi.org/10.2217/cer-2020-0013

    Article  Google Scholar 

  72. Austin PC, Stuart EA (2017) The performance of inverse probability of treatment weighting and full matching on the propensity score in the presence of model misspecification when estimating the effect of treatment on survival outcomes. Stat Methods Med Res 26(4):1654–1670. https://doi.org/10.1177/0962280215584401

    Article  MathSciNet  Google Scholar 

  73. Schneeweiss S (2006) Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf 15(5):291–303. https://doi.org/10.1002/pds.1200

    Article  Google Scholar 

  74. Liaw ST, Taggart J, Yu H, de Lusignan S (2013) Data extraction from electronic health records - existing tools may be unreliable and potentially unsafe. Aust Fam Physician 42(11):820–823

    Google Scholar 

  75. U.S. Food and Drug Administration (FDA) (2018) Use of Electronic Health Record Data in Clinical Investigations Guidance for Industry. https://www.fda.gov/media/97567/download. Accessed June 2023

  76. Hripcsak G, Albers DJ (2013) Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 20(1):117–121. https://doi.org/10.1136/amiajnl-2012-001145

    Article  Google Scholar 

  77. Rowe M (2019) An introduction to machine learning for clinicians. Acad Med 94(10):1433–1436. https://doi.org/10.1097/acm.0000000000002792

    Article  MathSciNet  Google Scholar 

  78. Nair S, Hsu D, Celi LA (2016) Challenges and opportunities in Secondary Analyses of Electronic Health Record Data, in Secondary Analysis of Electronic Health Records. Springer, Cham (CH), pp 17–26

    Book  Google Scholar 

  79. Hernán MA, Sauer BC, Hernández-Díaz S, Platt R, Shrier I Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J Clin Epidemiol (2016), 79, p. 70–75. https://doi.org/10.1016/j.jclinepi.2016.04.014

  80. Hernán MA, Robins JM (2016) Using Big Data to emulate a target Trial when a Randomized Trial is not available. Am J Epidemiol 183(8):758–764. https://doi.org/10.1093/aje/kwv254

    Article  Google Scholar 

  81. Snoep JD, Morabia A, Hernández-Díaz S, Hernán MA, Vandenbroucke JP (2014) Commentary: a structural approach to Berkson’s fallacy and a guide to a history of opinions about it. Int J Epidemiol 43(2):515–521. https://doi.org/10.1093/ije/dyu026

    Article  Google Scholar 

  82. Berkson J (1946) Limitations of the application of fourfold table analysis to hospital data. Biometrics 2(3):47–53

    Article  Google Scholar 

  83. Sackett DL (1979) Bias in analytic research. J Chronic Dis 32(1–2):51–63. https://doi.org/10.1016/0021-9681(79)90012-2

    Article  Google Scholar 

  84. Goldstein BA, Bhavsar NA, Phelan M, Pencina MJ (2016) Controlling for Informed Presence Bias due to the Number of Health Encounters in an Electronic Health Record. Am J Epidemiol 184(11):847–855. https://doi.org/10.1093/aje/kww112

    Article  Google Scholar 

  85. Goldstein BA, Phelan M, Pagidipati NJ, Peskoe SB (2019) How and when informative visit processes can bias inference when using electronic health records data for clinical research. J Am Med Inform Assoc 26(12):1609–1617. https://doi.org/10.1093/jamia/ocz148

    Article  Google Scholar 

  86. Harton J, Mitra N, Hubbard RA (2022) Informative presence bias in analyses of electronic health records-derived data: a cautionary note. J Am Med Inform Assoc 29(7):1191–1199. https://doi.org/10.1093/jamia/ocac050

    Article  Google Scholar 

  87. McGee G, Haneuse S, Coull BA, Weisskopf MG, Rotem RS (2022) On the Nature of Informative Presence Bias in Analyses of Electronic Health Records. Epidemiology 33(1):105–113. https://doi.org/10.1097/ede.0000000000001432

    Article  Google Scholar 

  88. Gokhale M, Stürmer T, Buse JB (2020) Real-world evidence: the devil is in the detail. Diabetologia 63(9):1694–1705. https://doi.org/10.1007/s00125-020-05217-1

    Article  Google Scholar 

  89. Suissa S (2008) Immortal time bias in pharmaco-epidemiology. Am J Epidemiol 167(4):492–499. https://doi.org/10.1093/aje/kwm324

    Article  Google Scholar 

  90. Tyrer F, Bhaskaran K, Rutherford MJ (2022) Immortal time bias for life-long conditions in retrospective observational studies using electronic health records. BMC Med Res Methodol 22(1):86. https://doi.org/10.1186/s12874-022-01581-1

    Article  Google Scholar 

  91. Lévesque LE, Hanley JA, Kezouh A, Suissa S (2010) Problem of immortal time bias in cohort studies: example using statins for preventing progression of Diabetes. BMJ 340:b5087. https://doi.org/10.1136/bmj.b5087

    Article  Google Scholar 

  92. Iudici M, Porcher R, Riveros C, Ravaud P (2019) Time-dependent biases in observational studies of comparative effectiveness research in rheumatology. A methodological review. Ann Rheum Dis 78(4):562–569. https://doi.org/10.1136/annrheumdis-2018-214544

    Article  Google Scholar 

  93. CoB C Catalogue of Bias. Oxford: England, UK: University of Oxford

  94. O’Sullivan JW, Banerjee A, Heneghan C, Pluddemann A (2018) Verification bias. BMJ Evid Based Med 23(2):54–55. https://doi.org/10.1136/bmjebm-2018-110919

    Article  Google Scholar 

  95. de Groot JA, Dendukuri N, Janssen KJ, Reitsma JB, Brophy J, Joseph L, Bossuyt PM, Moons KG (2012) Adjusting for partial verification or workup bias in meta-analyses of diagnostic accuracy studies. Am J Epidemiol 175(8):847–853. https://doi.org/10.1093/aje/kwr383

    Article  Google Scholar 

  96. Brown CA, Londhe AA, He F, Cheng A, Ma J, Zhang J, Brooks CG, Sprafka JM, Roehl KA, Carlson KB et al (2022) Development and Validation of algorithms to identify COVID-19 patients using a US Electronic Health Records Database: a retrospective cohort study. Clin Epidemiol 14:699–709. https://doi.org/10.2147/clep.S355086

    Article  Google Scholar 

  97. Horwitz RI, Feinstein AR (1980) The problem of protopathic bias in case-control studies. Am J Med 68(2):255–258. https://doi.org/10.1016/0002-9343(80)90363-0

    Article  Google Scholar 

  98. Singh A, Hussain S, Akkala S, Klugarová J, Pokorná A, Klugar M, Walters EH, Hopper I, Campbell JA, Taylor B et al (2022) Beta-adrenergic drugs and risk of Parkinson’s disease: A systematic review and meta-analysis. Ageing Res Rev 80. https://doi.org/10.1016/j.arr.2022.101670

  99. Tamim H, Monfared AA, LeLorier J (2007) Application of lag-time into exposure definitions to control for protopathic bias. Pharmacoepidemiol Drug Saf 16(3):250–258. https://doi.org/10.1002/pds.1360

    Article  Google Scholar 

  100. Faillie JL (2015) Indication bias or protopathic bias? Br J Clin Pharmacol 80(4):779–780. https://doi.org/10.1111/bcp.12705

    Article  Google Scholar 

  101. Prada-Ramallal G, Takkouche B, Figueiras A (2019) Bias in pharmacoepidemiologic studies using secondary health care databases: a scoping review. BMC Med Res Methodol 19(1):53. https://doi.org/10.1186/s12874-019-0695-y

    Article  Google Scholar 

  102. Murk W, Risnes KR, Bracken MB (2011) Prenatal or early-life exposure to antibiotics and risk of childhood Asthma: a systematic review. Pediatrics 127(6):1125–1138. https://doi.org/10.1542/peds.2010-2092

    Article  Google Scholar 

  103. Lo CH, Ni P, Yan Y, Ma W, Joshi AD, Nguyen LH, Mehta RS, Lochhead P, Song M, Curhan GC et al (2022) : Association of Proton Pump Inhibitor Use With All-Cause and Cause-Specific Mortality. Gastroenterology 163(4), p. 852–861.e2. https://doi.org/10.1053/j.gastro.2022.06.067

  104. Walker AM (1996) Confounding by indication. Epidemiology 7(4):335–336

    Google Scholar 

  105. Salas M, Hofman A, Stricker BH (1999) Confounding by indication: an example of variation in the use of epidemiologic terminology. Am J Epidemiol 149(11):981–983. https://doi.org/10.1093/oxfordjournals.aje.a009758

    Article  Google Scholar 

  106. Kyriacou DN, Lewis RJ (2016) Confounding by indication in Clinical Research. JAMA 316(17):1818–1819. https://doi.org/10.1001/jama.2016.16435

    Article  Google Scholar 

  107. Freemantle N, Marston L, Walters K, Wood J, Reynolds MR, Petersen I (2013) Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research. BMJ 347:f6409. https://doi.org/10.1136/bmj.f6409

    Article  Google Scholar 

  108. Wang SV, Schneeweiss S (2022) Assessing and interpreting real-world evidence studies: introductory points for New Reviewers. Clin Pharmacol Ther 111(1):145–149. https://doi.org/10.1002/cpt.2398

    Article  Google Scholar 

  109. Orsini LS, Monz B, Mullins CD, Van Brunt D, Daniel G, Eichler HG, Graff J, Guerino J, Berger M, Lederer NM et al (2020) Improving transparency to build trust in real-world secondary data studies for hypothesis testing-Why, what, and how: recommendations and a road map from the real-world evidence transparency initiative. Pharmacoepidemiol Drug Saf 29(11):1504–1513. https://doi.org/10.1002/pds.5079

    Article  Google Scholar 

  110. Dreyer NA, Bryant A, Velentgas P (2016) The GRACE Checklist: a validated Assessment Tool for High Quality Observational studies of comparative effectiveness. J Manag Care Spec Pharm 22(10):1107–1113. https://doi.org/10.18553/jmcp.2016.22.10.1107

    Article  Google Scholar 

  111. Benchimol EI, Smeeth L, Guttmann A, Harron K, Moher D, Petersen I, Sorensen HT, von Elm E, Langan SM (2015) The REporting of studies conducted using Observational routinely-collected health data (RECORD) statement. PLoS Med 12(10):e1001885. https://doi.org/10.1371/journal.pmed.1001885

    Article  Google Scholar 

  112. Slim K, Nini E, Forestier D, Kwiatkowski F, Panis Y, Chipponi J (2003) Methodological index for non-randomized studies (minors): development and validation of a new instrument. ANZ J Surg 73(9):712–716. https://doi.org/10.1046/j.1445-2197.2003.02748.x

    Article  Google Scholar 

  113. Wang SV, Pottegård A, Crown W, Arlett P, Ashcroft DM, Benchimol EI, Berger ML, Crane G, Goettsch W, Hua W et al (2023) HARmonized Protocol Template to enhance reproducibility of hypothesis evaluating real-world evidence studies on treatment effects: a good practices report of a joint ISPE/ISPOR task force. Pharmacoepidemiol Drug Saf 32(1):44–55. https://doi.org/10.1002/pds.5507

    Article  Google Scholar 

  114. Wells GA, Shea B, O’Connell D et al (2021) The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. https://www.ohri.ca/programs/clinical_epidemiology/oxford.asp. Accessed June 2023

  115. Friebel R, Steventon A (2019) Composite measures of healthcare quality: sensible in theory, problematic in practice. BMJ Qual Saf 28(2):85–88. https://doi.org/10.1136/bmjqs-2018-008280

    Article  Google Scholar 

  116. Austin PC, Ceyisakar IE, Steyerberg EW, Lingsma HF (2019) Marang-Van De Mheen: ranking hospital performance based on individual indicators: can we increase reliability by creating composite indicators? BMC Med Res Methodol 19(1):131. https://doi.org/10.1186/s12874-019-0769-x

    Article  Google Scholar 

  117. Greco S, Ishizaka A, Tasiou M, Torrisi G (2019) On the Methodological Framework of Composite indices: a review of the issues of weighting, aggregation, and Robustness. Soc Indic Res 141(1):61–94. https://doi.org/10.1007/s11205-017-1832-9

    Article  Google Scholar 

  118. Kara P, Valentin JB, Mainz J, Johnsen SP (2022) Composite measures of quality of health care: evidence mapping of methodology and reporting. PLoS ONE 17(5):e0268320. https://doi.org/10.1371/journal.pone.0268320

  119. Localio AR, Berlin JA, Ten Have TR, Kimmel SE (2001) Adjustments for center in multicenter studies: an overview. Ann Intern Med 135(2):112–123. https://doi.org/10.7326/0003-4819-135-2-200107170-00012

  120. The Observational Health Data Sciences and Informatics (2023) https://www.ohdsi.org/. Accessed

  121. National Patient-Centered Clinical Research Network (2023) https://pcornet.org/. Accessed

  122. N3C (2023) https://covid.cd2h.org/. Accessed

  123. The Office of the National Coordinator for Health Information Technology (ONC) (2023) Trusted Exchange Framework and Common Agreement (TEFCA). https://www.healthit.gov/topic/interoperability/policy/trusted-exchange-framework-and-common-agreement-tefca. Accessed

  124. Mandel JC, Pollak JP, Mandl KD (2022) The patient role in a Federal National-Scale Health Information Exchange. J Med Internet Res 24(11):e41750. https://doi.org/10.2196/41750

  125. Sheller MJ, Edwards B, Reina GA, Martin J, Pati S, Kotrotsou A, Milchenko M, Xu W, Marcus D, Colen RR et al (2020) Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep 10(1):12598. https://doi.org/10.1038/s41598-020-69250-1

  126. Pati S, Baid U, Edwards B, Sheller M, Wang SH, Reina GA, Foley P, Gruzdev A, Karkada D, Davatzikos C et al (2022) Federated learning enables big data for rare cancer boundary detection. Nat Commun 13(1):7346. https://doi.org/10.1038/s41467-022-33407-5

  127. Wikipedia (2023) Federated learning. https://en.wikipedia.org/wiki/Federated_learning. Accessed

Acknowledgements

We would like to thank Dr. Maitreyi Mazumdar from Harvard Medical School for supporting and revising this manuscript.

Funding

No funding was available for this manuscript. The author BAS prepared this manuscript while completing her postdoctoral training in the Department of Epidemiology and Biostatistics, supported by the Office of Research on Women’s Health, National Institutes of Health, under Award Number 3UH3OD023285-06S1, linked to the parent award UH3OD023285.

Author information

Contributions

BAS: Made substantial contributions to the conception and design of the paper; performed the literature review; and prepared all drafts of the paper.

AL: Made substantial contributions to the conception and design of the paper; performed the literature review; provided technical support and advice; contributed to the write-up of the first draft; and critically revised the final paper and all its drafts for important intellectual content.

TL: Made substantial contributions to the conception and design of the paper; provided technical support and advice; and critically revised the final paper and all its drafts for important intellectual content.

NP: Made substantial contributions to the conception and design of the paper; provided technical support and advice; and critically revised the final paper and all its drafts for important intellectual content.

BZ: Made substantial contributions to the conception and design of the paper; provided technical and statistical support and advice; and critically revised the final paper and all its drafts for important intellectual content.

Corresponding author

Correspondence to Ban Al-Sahab.

Ethics declarations

Competing Interests

None of the authors have any conflict of interest regarding the material discussed in the manuscript.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Al-Sahab, B., Leviton, A., Loddenkemper, T. et al. Biases in Electronic Health Records Data for Generating Real-World Evidence: An Overview. J Healthc Inform Res 8, 121–139 (2024). https://doi.org/10.1007/s41666-023-00153-2

  • DOI: https://doi.org/10.1007/s41666-023-00153-2