DOI:10.1145/3580305.3599534
research-article

TWIN: Personalized Clinical Trial Digital Twin Generation

Published: 04 August 2023

Abstract

Clinical trial digital twins are virtual patients that reflect personal characteristics at a high degree of granularity and can be used to simulate patient outcomes under different conditions. With the growth of clinical trial databases captured by Electronic Data Capture (EDC) systems, there is growing interest in using machine learning models to generate digital twins. Doing so can benefit drug development by reducing the sample size required for participant recruitment, improving predictive modeling of patient outcomes, and mitigating privacy risks when sharing synthetic clinical trial data. However, prior research has mainly focused on generating Electronic Healthcare Records (EHRs); these methods often assume large training data and do not account for personalized synthetic patient record generation. In this paper, we propose TWIN, a sample-efficient method for generating personalized clinical trial digital twins. TWIN produces digital twins of patient-level clinical trial records that have high fidelity to the target participant's record and preserve the temporal relations across visits and events. We compare our method with various baselines for generating real-world patient-level clinical trial data. The results show that TWIN generates synthetic trial data with high fidelity, which facilitates patient outcome prediction in low-data scenarios, and offers strong privacy protection for the real patients in the trials.

Supplementary Material

MP4 File (rtfp0181-2min-promo.mp4)
Digital twins are virtual patients that reflect personal characteristics and can be used to simulate patient outcomes under different conditions. Our method, TWIN, is a generative model that produces digital twins by using information from the most similar participants and preserving the causality across visits and events. It can also simulate probable patient trajectories in counterfactual cases, i.e., if the patient were assigned to a different arm of the trial. We are the first to concentrate on personalized trial digital twin generation, whereas previous works only consider generating synthetic clinical trial data that align with the real data in global statistics. The results show that TWIN generates synthetic trial data with high fidelity, which facilitates patient outcome prediction in low-data scenarios, and offers strong privacy protection for the real patients in the trials. It has the potential to contribute to advances in personalized medicine and clinical trial research.
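
The idea of drawing on the most similar participants can be illustrated with a minimal sketch. It assumes each visit is encoded as a multi-hot vector of event codes and that similarity is measured by cosine similarity. Both choices, along with the names `cosine` and `top_k_similar` and the toy participant ids, are assumptions made for illustration and are not taken from the TWIN implementation.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # define similarity to an empty visit as zero
    return dot / (norm_u * norm_v)


def top_k_similar(target, candidates, k=2):
    """Return ids of the k candidate visits most similar to the target visit.

    `candidates` maps a (hypothetical) participant id to a multi-hot vector.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(target, item[1]),
        reverse=True,
    )
    return [pid for pid, _ in ranked[:k]]


# Toy data (made up): each position is one event code, 1 = recorded at the visit.
target_visit = [1, 0, 1, 0]
other_visits = {
    "p1": [1, 0, 1, 1],
    "p2": [0, 1, 0, 0],
    "p3": [1, 0, 0, 0],
}

neighbors = top_k_similar(target_visit, other_visits, k=2)
print(neighbors)
```

In a generator of this kind, the retrieved neighbors would serve as extra context when producing the synthetic visit; how TWIN actually combines them is not described on this page.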

    Published In

    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
    ISBN:9798400701030
    DOI:10.1145/3580305

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clinical trial
    2. digital twin
    3. synthetic data

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    KDD '23

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 1,032
    • Downloads (Last 6 weeks): 117
    Reflects downloads up to 28 Feb 2025

    Citations

    Cited By

    • (2024) Digital Twins in Drug Discovery: A Paradigm Shift Shaping Pharmaceutical Innovation. International Journal of Pharmaceutical Sciences and Nanotechnology (IJPSN), 17:5 (7628-7637). DOI:10.37285/ijpsn.2024.17.5.9. Online publication date: 15-Oct-2024
    • (2024) TWIN-ADAPT: Continuous Learning for Digital Twin-Enabled Online Anomaly Classification in IoT-Driven Smart Labs. Future Internet, 16:7 (239). DOI:10.3390/fi16070239. Online publication date: 4-Jul-2024
    • (2024) Digital Twins for Healthcare Using Wearables. Bioengineering, 11:6 (606). DOI:10.3390/bioengineering11060606. Online publication date: 13-Jun-2024
    • (2024) Recent advances in predictive modeling with electronic health records. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (8272-8280). DOI:10.24963/ijcai.2024/914. Online publication date: 3-Aug-2024
    • (2024) Personalized heart disease detection via ECG digital twin generation. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (5872-5881). DOI:10.24963/ijcai.2024/649. Online publication date: 3-Aug-2024
    • (2024) Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4607-4618). DOI:10.1145/3637528.3671836. Online publication date: 25-Aug-2024
    • (2024) Digital Twins for Stress Management Utilizing Synthetic Data. 2024 IEEE World AI IoT Congress (AIIoT) (329-335). DOI:10.1109/AIIoT61789.2024.10579038. Online publication date: 29-May-2024
    • TWIN-GPT: Digital Twins for Clinical Trials via Large Language Model. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI:10.1145/3674838
