An Integrated Resampling Methods for Imbalanced Sporadic Temporal Data in EHRs | IEEE Conference Publication | IEEE Xplore

Abstract:

Most real-world applications in EHRs involve temporal data with skewed distributions. Imbalanced classification becomes harder on sporadic temporal data, where variables are correlated and some values are missing. A common remedy for classification on imbalanced data is oversampling, which generates new samples to rebalance the classes. However, traditional oversampling methods usually change the data distribution, which introduces bias. This paper proposes a self-adaptive integrated oversampling method for imbalanced sporadic temporal data in EHRs. Masking vectors and density vectors are introduced to measure the distribution of missing values across samples, and the minority samples are divided into high-density and sparse-density samples. We extend the resampling strategies by combining a subsample alignment method with a structure-preserving oversampling method. The weight of sample difference is used to improve classification performance. Furthermore, a filter mechanism is proposed to remove noisy samples efficiently. Experimental results show that the proposed method outperforms traditional resampling methods in terms of the AUC, F1, and G-mean evaluation metrics.
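
The split of minority samples by missing-value density can be illustrated with a minimal Python sketch. The abstract does not give the paper's definitions, so everything here is an assumption made for illustration: a masking vector is taken to be 1 where a value is observed and 0 where it is missing, density is taken to be the fraction of observed entries, and the function names and the `threshold` cut-off are hypothetical rather than the authors' own.

```python
import numpy as np

def masking_vectors(series):
    """Return 1 where a value is observed and 0 where it is missing (NaN)."""
    return (~np.isnan(series)).astype(int)

def sample_density(mask):
    """Fraction of observed entries in one sample (assumed definition)."""
    return float(mask.mean())

def split_by_density(samples, threshold=0.5):
    """Split minority samples into high-density and sparse groups.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    dense, sparse = [], []
    for s in samples:
        d = sample_density(masking_vectors(s))
        (dense if d >= threshold else sparse).append(s)
    return dense, sparse

# Toy minority samples: rows are time steps, columns are variables.
a = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])            # 5 of 6 observed
b = np.array([[np.nan, np.nan], [3.0, np.nan], [np.nan, 6.0]])   # 2 of 6 observed

dense, sparse = split_by_density([a, b])
```

With this toy input, sample `a` lands in the high-density group and sample `b` in the sparse group. In a pipeline of the kind the abstract describes, each group would then be passed to a different resampling strategy.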
Date of Conference: 09-12 December 2021
Date Added to IEEE Xplore: 14 January 2022
Conference Location: Houston, TX, USA
