
Generative Adversarial Networks Enhanced Pre-training for Insufficient Electronic Health Records Modeling

Authors:

Houxing Ren,

Jingyuan Wang,

Wayne Xin Zhao

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Pages 3810 - 3818

https://doi.org/10.1145/3534678.3539020

Published: 14 August 2022

Abstract

In recent years, automatic computational systems based on deep learning have been widely used in medical fields for tasks such as automatic diagnosis and disease prediction. Most of these systems are designed for scenarios with sufficient data. However, because of disease rarity or privacy constraints, medical data are often insufficient. Applying data-hungry deep learning models to insufficient data is likely to cause over-fitting and serious performance problems. Many data augmentation methods have been proposed to address the data insufficiency problem, such as using GANs (Generative Adversarial Networks) to generate training data. However, the augmented data usually contain a lot of noise, and directly using them to train sensitive medical models rarely achieves satisfactory results.

To overcome this problem, we propose a novel deep model learning method for insufficient EHR (Electronic Health Record) data modeling, namely GRACE, which stands for GeneRative Adversarial networks enhanCed prE-training. In this method, we propose an item-relation-aware GAN that captures changing trends and correlations among data to generate high-quality EHR records. Furthermore, we design a pre-training mechanism, consisting of a masked records prediction task and a real-fake contrastive learning task, to learn representations of EHR data from both generated and real data. After pre-training, only the representations of real data are used to train the final prediction model. In this way, we fully exploit the useful information in generated data through pre-training while avoiding the problems caused by directly training the final prediction model on noisy generated data. The effectiveness of the proposed method is evaluated through extensive experiments on three real-world healthcare datasets. We also deploy our method in a maternal and child health care hospital for an online test. Both offline and online experimental results demonstrate the effectiveness of the proposed method. We believe doctors and patients can benefit from our learning method in various healthcare-related applications.
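The data flow described above (pre-train on real and generated records together, then keep only real records for the final model) can be illustrated with a minimal sketch. This is not the GRACE implementation: the abstract does not specify the record format, masking ratio, or loss functions, so the `MASK` token, the list-of-items record layout, the `mask_ratio` default, and all function names below are assumptions made for illustration. The sketch only prepares inputs: masked positions serve as targets for a masked records prediction task, and the `is_real` flag is the kind of label a real-fake contrastive objective could consume.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_record(record, mask_ratio=0.15, rng=None):
    """Hide a random subset of items in one record.

    Returns the masked copy and a dict mapping each hidden position
    to the original item, which acts as the prediction target.
    """
    rng = rng or random.Random()
    # Mask at least one item when the record is non-empty, never more than it holds.
    n_mask = min(len(record), max(1, round(len(record) * mask_ratio)))
    positions = sorted(rng.sample(range(len(record)), n_mask))
    masked = list(record)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


def build_pretraining_examples(real_records, generated_records,
                               mask_ratio=0.15, seed=0):
    """Mix real and generated records into one pre-training set.

    Each example carries the masked input, the masked-item targets,
    and an is_real flag (1 for real, 0 for generated).
    """
    rng = random.Random(seed)
    examples = []
    for is_real, records in ((1, real_records), (0, generated_records)):
        for record in records:
            masked, targets = mask_record(record, mask_ratio, rng)
            examples.append(
                {"input": masked, "targets": targets, "is_real": is_real}
            )
    rng.shuffle(examples)
    return examples


def select_real(examples):
    """Keep only real records, as would be done before final model training."""
    return [ex for ex in examples if ex["is_real"] == 1]
```

Under these assumptions, `build_pretraining_examples(real, generated)` would feed the pre-training stage, and `select_real(...)` marks the point where generated records are dropped so that their noise does not reach the final prediction model.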

