DOI: 10.1145/3580305.3599566 (KDD conference proceedings)

Mining Electronic Health Records for Real-World Evidence

Published: 04 August 2023

ABSTRACT

The rapid accumulation of large-scale Electronic Health Records (EHRs) presents considerable opportunities to generate real-world evidence that can inform clinical decision-making and accelerate drug development. However, the complexity of EHRs has made them a formidable testing ground for cutting-edge AI algorithms. Furthermore, a significant gap still exists between algorithm development in the computer science community and clinical translation in the healthcare community. This tutorial aims to bridge this divide and foster mutual understanding between the two communities by discussing advanced machine learning and data mining technologies tailored to real-world healthcare challenges, including 1) using EHRs and trial emulation to understand Long COVID and to repurpose drugs for Alzheimer's disease, and 2) risk prediction and the associated issues of fairness, interpretability, and generalizability, among others. We will conclude the tutorial by discussing potential opportunities for future research and the prospects of a career as a health data scientist.
