Abstract:
This paper aims to automatically augment numerical tabular data using a variational autoencoder (VAE) model. Specifically, we address the problem of class imbalance in numerical data and seek to improve the performance of a classification model by augmenting its training data. We propose a new augmentation technique called ‘D-VAE’, which performs data augmentation through a variational autoencoder with discretization of numerical columns; D-VAE artificially increases both the number of records and the number of columns of a given table. The main features of the proposed technique are discretization and feature selection, both performed during preprocessing. For discretization, we use the k-means algorithm to group the records within a given table and then convert them into one-hot vectors according to the clustering results. In addition, for memory efficiency, we reduce the number of parameters of the VAE model by using a relatively small number of features, chosen through a feature selection method called REFCV. To evaluate the proposed technique, we conducted experiments across different numerical data augmentation ratios using four open datasets.
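The discretization step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes k-means is applied to one numerical column at a time, and the function names `kmeans_1d` and `one_hot`, the quantile-style initialization, and the sample `ages` column are all hypothetical choices made for illustration.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means on one numeric column; returns a bin index per value.

    Assumption: per-column clustering with a deterministic start. The paper
    only states that k-means groups the records of a table.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    # Relabel so that bin 0 has the smallest centroid.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]


def one_hot(labels: List[int], k: int) -> List[List[int]]:
    """Turn each bin index into a k-dimensional one-hot vector."""
    return [[1 if lab == j else 0 for j in range(k)] for lab in labels]


# Hypothetical numerical column with three visible groups of values.
ages = [22, 25, 24, 51, 49, 53, 80, 78]
bins = kmeans_1d(ages, k=3)
encoded = one_hot(bins, k=3)
print(bins)
print(encoded[0], encoded[3], encoded[6])
```

Each numerical value is replaced by a k-dimensional indicator vector, which is one way the number of columns can grow as the abstract describes. In practice a library implementation of k-means would likely be preferred over this hand-written loop.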
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023