Abstract. Generative adversarial networks (GANs) are one of the most
popular methods for generating images today. While impressive results
have been validated by visual inspection, a number of quantitative criteria have emerged only recently. We argue here that the existing ones are
insufficient and need to be adapted to the task at hand. In this
paper, we introduce two measures based on image classification, GAN-train and GAN-test, which approximate the recall (diversity) and precision (image quality) of GANs, respectively. We evaluate a number
of recent GAN approaches based on these two measures and demonstrate a clear difference in performance. Furthermore, we observe that
the increasing difficulty of the dataset, from CIFAR10 through CIFAR100
to ImageNet, is inversely correlated with the quality of the GANs,
as is clearly evident from our measures.