FakeTables: Using GANs to Generate Functional Dependency Preserving Tables
with Bounded Real Data
Abstract
In many cases, an organization wishes to release
some data, but is restricted in the amount of data
to be released due to legal, privacy and other concerns. For instance, the US Census Bureau releases
only 1% of its table of records every year, along
with statistics about the entire table. However, the
machine learning (ML) models trained on the released sub-table are usually sub-optimal. In this
paper, our goal is to find a way to augment the sub-table by generating a synthetic table from the released sub-table, under the constraints that the generated synthetic table (i) has statistics similar to those of the
entire table, and (ii) preserves the functional dependencies of the released sub-table. We propose
a novel generative adversarial network framework
called ITS-GAN, where both the generator and the
discriminator are specifically designed to satisfy
these two constraints. By evaluating the augmentation performance of ITS-GAN on two representative datasets, the US Census Bureau data and the US
Bureau of Transportation Statistics (BTS) data, we
show that ITS-GAN yields high-quality classification results and significantly outperforms various
state-of-the-art data augmentation approaches.
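
To make constraint (ii) concrete, the following minimal sketch checks whether a functional dependency X -> Y that holds in a released sub-table still holds in a synthetic table. It is an illustration only and not part of ITS-GAN: the column names (`zip`, `state`, `income`), the toy rows, and the helper names `fd_holds` and `violated_fds` are all hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    The dependency holds when every lhs value maps to exactly one rhs value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores val the first time key is seen, then returns
        # the stored value, so a mismatch means two rhs values for one key.
        if seen.setdefault(key, val) != val:
            return False
    return True


def violated_fds(released, synthetic, candidate_fds):
    """List the dependencies that hold in released but fail in synthetic."""
    return [
        (lhs, rhs)
        for lhs, rhs in candidate_fds
        if fd_holds(released, lhs, rhs) and not fd_holds(synthetic, lhs, rhs)
    ]


# Hypothetical toy data: zip -> state holds in the released sub-table.
released = [
    {"zip": "10001", "state": "NY", "income": 52000},
    {"zip": "10001", "state": "NY", "income": 61000},
    {"zip": "94103", "state": "CA", "income": 58000},
]

# A synthetic table that breaks zip -> state (same zip, two states).
synthetic_bad = [
    {"zip": "10001", "state": "NY", "income": 50000},
    {"zip": "10001", "state": "CA", "income": 70000},
]

# A synthetic table that preserves zip -> state.
synthetic_ok = [
    {"zip": "10001", "state": "NY", "income": 50000},
    {"zip": "94103", "state": "CA", "income": 70000},
]

candidate_fds = [(("zip",), ("state",)), (("zip",), ("income",))]

if __name__ == "__main__":
    print(violated_fds(released, synthetic_bad, candidate_fds))
    print(violated_fds(released, synthetic_ok, candidate_fds))
```

Only dependencies that hold in the released sub-table are checked against the synthetic one, which mirrors the wording of constraint (ii); a dependency such as `zip -> income`, which already fails in the released rows, is ignored.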