A Universal Metric for Robust Evaluation of Synthetic Tabular Data


Impact Statement:
Tabular data synthesis plays an important role in preserving privacy in data-driven systems. Data synthesis algorithms produce synthetic data that statistically resembles real data and, because it is synthetic, can comply with privacy protection regulations (such as the European Union General Data Protection Regulation and the California Consumer Privacy Act). Measuring the quality of synthetically generated tabular data is a challenging task. Effective quality metrics will increase the confidence of both users and regulators in the privacy preservation of data. This article presents a single-score universal metric for the evaluation of synthetic tabular data. The proposed TabSynDex metric ensures fast computation of loss during training and consistency in the comparative evaluation of different generative methods. Moreover, it imposes a more stringent criterion for measuring the closeness of the synthetic data to real data. The findin...

Abstract:

Synthetic tabular data generation becomes crucial when real data are limited, expensive to collect, or simply cannot be used due to privacy concerns. However, producing good-quality synthetic data is challenging. Several probabilistic, statistical, generative adversarial network-based, and variational autoencoder-based approaches have been presented for synthetic tabular data generation. Once the data are generated, evaluating their quality is also challenging. Some traditional metrics have been used in the literature, but there is a lack of a common, robust, single metric. This makes it difficult to properly compare the effectiveness of different synthetic tabular data generation methods. In this article, we propose a new universal metric, TabSynDex, for the robust evaluation of synthetic data. The proposed metric assesses the similarity of synthetic data with real data through different component scores, which evaluate the characteristics that are desirable for “high-quali...
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 1, January 2024)
Page(s): 300 - 309
Date of Publication: 14 December 2022
Electronic ISSN: 2691-4581
