Abstract
Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when the learned model is applied to real-world scenarios. There are two main reasons for this. 1) The model overfits to synthetic images, which leaves its convolutional filters unable to extract informative representations from real images. 2) Synthetic and real data follow different distributions, which is known as the domain adaptation problem.
To this end, we propose a new reality oriented adaptation approach for urban scene semantic segmentation that learns from synthetic data. First, we propose a target guided distillation approach that learns the real image style by training the segmentation model to imitate a pretrained real style model on real images. Second, we exploit the intrinsic spatial structure present in urban scene images and propose a spatial-aware adaptation scheme that effectively aligns the distributions of the two domains. Both modules can be readily integrated into existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on the Cityscapes dataset by adapting from the GTAV and SYNTHIA datasets, and the results demonstrate the effectiveness of our method.
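The small sketch below illustrates the two modules under assumptions that the abstract does not confirm. It assumes the distillation loss is the per-position Euclidean distance between the segmentation network's features and those of the frozen pretrained real style model on the same real image, averaged over positions. It also assumes that spatial-aware adaptation splits the feature map into an n x n grid so that each cell can be aligned separately. The names `distillation_loss` and `grid_regions` are hypothetical, and features are represented as plain Python lists. The per-region alignment step itself is omitted, for example one domain classifier per grid cell.

```python
import math
from typing import List

# Hypothetical layout: a feature map is H*W positions (row-major),
# each position holding a vector of C channel values.
FeatureMap = List[List[float]]


def distillation_loss(student: FeatureMap, teacher: FeatureMap) -> float:
    """Assumed target guided distillation loss: mean per-position Euclidean
    distance between the segmentation model's features (student) and the
    frozen pretrained model's features (teacher) on the same real image."""
    if not student or len(student) != len(teacher):
        raise ValueError("feature maps must be non-empty and equally sized")
    total = 0.0
    for s_vec, t_vec in zip(student, teacher):
        total += math.sqrt(sum((s - t) ** 2 for s, t in zip(s_vec, t_vec)))
    return total / len(student)


def grid_regions(height: int, width: int, n: int) -> List[List[int]]:
    """Assumed spatial partition: split an H x W map into an n x n grid and
    return, per cell, the row-major indices (r * width + c) it covers, so
    each cell's features can be aligned across domains separately."""
    regions: List[List[int]] = [[] for _ in range(n * n)]
    for r in range(height):
        for c in range(width):
            cell = (r * n // height) * n + (c * n // width)
            regions[cell].append(r * width + c)
    return regions
```

For example, a student map `[[0, 0], [3, 4]]` compared against an all-zero teacher gives per-position distances 0 and 5, so the loss is 2.5. `grid_regions(4, 4, 2)` assigns positions 0, 1, 4, and 5 to the top-left cell.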