Abstract
Computer vision (CV) systems trained on computer graphics (CG) generated data still generalize poorly to real imagery due to the 'domain shift' between virtual and real data. Although simulated data augmented with a few real-world samples has been shown to mitigate domain shift and improve the transferability of trained models, it is desirable to guide or bootstrap the virtual data generation with distributions learnt from the target real-world domain, especially in fields where annotating even a few real images is laborious (such as semantic labeling, optical flow, and intrinsic images).
To address this problem in an unsupervised manner, our work combines recent advances in CG, which aim at generating stochastic scene layouts from large collections of 3D object models, with generative adversarial training, which trains generative models by measuring the discrepancy between generated and real data in terms of their separability in the space of a deep, discriminatively trained classifier. Our method iteratively estimates the posterior density of the prior distributions of a generative graphical model within a rejection-sampling framework. Initially, we assume uniform priors over the scene parameters of the graphical model; as iterations proceed, these priors are sequentially updated toward the unknown distributions of the target data.
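To make this loop concrete, the following is a minimal Python sketch of the adversarial rejection-sampling idea under strong simplifying assumptions; it is illustrative, not the paper's implementation: render_scene, train_discriminator, the two-dimensional parameter space, and the surrogate "real" data are all hypothetical stand-ins.

```python
# Illustrative sketch of adversarially tuned prior updating via rejection
# sampling. All names and the toy discriminator are hypothetical placeholders,
# not the paper's actual renderer, classifier, or update rule.
import numpy as np

rng = np.random.default_rng(0)

def render_scene(params):
    """Placeholder for a CG renderer mapping scene parameters to an image.
    Here the parameter vector itself stands in for the rendered sample."""
    return params

def train_discriminator(fake, real):
    """Placeholder for training a deep classifier to separate fake from real.
    Returns a function scoring how 'real' a sample looks, in (0, 1)."""
    mu_r, mu_f = real.mean(0), fake.mean(0)
    mid = (mu_r + mu_f) / 2
    return lambda x: 1.0 / (1.0 + np.exp(-(x - mid) @ (mu_r - mu_f)))

# Uniform priors over each scene parameter (e.g., sun elevation, camera
# height), represented here by per-dimension (low, high) bounds.
low, high = np.zeros(2), np.ones(2)
real_data = rng.normal([0.7, 0.3], 0.05, size=(500, 2))  # surrogate target data

for it in range(5):
    # 1. Sample scene parameters from the current priors and render scenes.
    params = rng.uniform(low, high, size=(1000, 2))
    fake_data = np.array([render_scene(p) for p in params])

    # 2. Train a discriminator on generated vs. real samples.
    score = train_discriminator(fake_data, real_data)

    # 3. Rejection step: keep each sample with probability equal to how
    # realistic the discriminator finds it.
    keep = np.array([score(x) for x in fake_data]) > rng.uniform(size=len(fake_data))
    accepted = params[keep]

    # 4. Crude prior update for the sketch: shrink the uniform bounds to the
    # accepted samples' support, moving the priors toward the target data.
    if len(accepted) > 10:
        low, high = accepted.min(0), accepted.max(0)
    print(f"iter {it}: accepted {keep.sum()}, prior bounds {low} .. {high}")
```

The essential structure mirrors the abstract: samples drawn from the current priors are rendered, scored against real data by a discriminator, thinned by rejection, and the surviving parameters define the priors for the next iteration.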
We demonstrate the utility of adversarially tuned scene generation on two real-world benchmark datasets (CityScapes and CamVid) for traffic-scene semantic labeling with a deep convolutional net (DeepLab). DeepLab models trained on simulated sets produced by the scene generation models after tuning improve on the IoU metric by 2.28 and 3.14 points over models trained on sets produced before tuning, on CityScapes and CamVid respectively.