Mining Electronic Health Records for Real-World Evidence

Authors:

Fei Wang

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 5837–5838

https://doi.org/10.1145/3580305.3599566

Published: 04 August 2023

Abstract

The rapid accumulation of large-scale Electronic Health Records (EHR) presents considerable opportunities to generate real-world evidence that can inform clinical decision-making and accelerate drug development. However, the complexity of EHR data makes them a formidable testing ground for cutting-edge AI algorithms. Furthermore, a significant gap still exists between algorithm development in the computer science community and clinical translation in the healthcare community. This tutorial aims to bridge this divide and foster mutual understanding between the two communities by discussing advanced machine learning and data mining technologies tailored to real-world healthcare challenges, including 1) using EHR data and trial emulation to understand Long COVID and to support drug repurposing for Alzheimer's disease, and 2) risk prediction and its associated issues of fairness, interpretability, and generalizability. We will conclude by exploring opportunities for future research and the prospects of a career as a health data scientist.
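Trial emulation, one of the topics listed above, estimates a treatment effect from observational EHR data by adjusting for confounding between who receives a drug and how they fare. As an illustration only, the sketch below uses inverse probability of treatment weighting (IPTW), one common adjustment technique; it is not claimed to be the method presented in the tutorial. The synthetic cohort, the single confounder `x` (a hypothetical baseline severity score), and the helper names `fit_propensity` and `iptw_ate` are all hypothetical. The sketch assumes no unmeasured confounding and a correctly specified logistic propensity model, and it depends only on NumPy.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_propensity(X, t, n_iter=25):
    """Hypothetical helper: estimate e(x) = P(T=1 | X=x) with logistic regression fitted by Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ w)
        grad = X1.T @ (t - p)                           # gradient of the log-likelihood
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])   # negative Hessian (Fisher information)
        w += np.linalg.solve(hess, grad)                # Newton step
    return sigmoid(X1 @ w)


def iptw_ate(X, t, y, clip=(0.01, 0.99)):
    """Hypothetical helper: average treatment effect via normalized (Hajek) IPTW."""
    t = np.asarray(t, dtype=float)
    e = np.clip(fit_propensity(X, t), *clip)  # trim extreme scores to limit weight blow-up
    w1 = t / e                                 # weights for treated patients
    w0 = (1.0 - t) / (1.0 - e)                 # weights for untreated patients
    mu1 = np.sum(w1 * y) / np.sum(w1)          # weighted mean outcome if everyone were treated
    mu0 = np.sum(w0 * y) / np.sum(w0)          # weighted mean outcome if no one were treated
    return mu1 - mu0


# Synthetic cohort (hypothetical): x drives both treatment choice and outcome.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
t = (rng.random(n) < sigmoid(1.5 * x)).astype(float)  # higher-x patients are treated more often
y = 2.0 * x - 1.0 * t + rng.normal(size=n)             # simulated true treatment effect is -1.0

naive = y[t == 1].mean() - y[t == 0].mean()
ate = iptw_ate(x.reshape(-1, 1), t, y)
print(f"naive difference: {naive:.2f}, IPTW estimate: {ate:.2f}")
```

Because treatment assignment depends on `x`, the naive difference in means is confounded (in this simulation it even has the wrong sign), while the weighted estimate should land near the simulated effect of -1.0. A real emulation would additionally need explicit eligibility criteria, a well-defined time zero, and covariate balance diagnostics.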

Index Terms

1. Applied computing
  1. Life and medical sciences
    1. Health care information systems
    2. Health informatics
2. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Causal reasoning and diagnostics

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 2023

5996 pages

ISBN: 9798400701030

DOI: 10.1145/3580305

General Chairs: Ambuj Singh (UC Santa Barbara, USA), Yizhou Sun (UC Los Angeles, USA)

Program Chairs: Leman Akoglu (Carnegie Mellon University, USA), Dimitrios Gunopulos (University of Athens, Greece), Xifeng Yan (UC Santa Barbara, USA), Ravi Kumar (Google, USA), Fatma Ozcan (Google, USA), Jieping Ye (Alibaba DAMO Academy)

Copyright © 2023 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract

Funding Sources

NSF (National Science Foundation)

Conference

KDD '23

KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 6 - 10, 2023

Long Beach, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
