Abstract
In recent years, deep neural networks have achieved great success on many computer vision problems, including semantic segmentation, a critical task in emerging applications such as autonomous driving and medical image diagnostics. In general, training deep neural networks requires an enormous amount of labeled data, which is laborious and costly to collect and annotate. Recent advances in computer graphics make it feasible to train neural networks on photo-realistic synthetic data with computer-generated annotations. Nevertheless, the domain mismatch between real images and synthetic ones remains the major obstacle to harnessing the generated data and labels. In this paper, we propose
a principled approach to structured domain adaptation for semantic segmentation, i.e., integrating a GAN into the FCN framework to mitigate the gap between the source and target domains. Specifically, we learn a conditional generator to transform features of synthetic images into real-image-like features, and a discriminator to distinguish between them. For each training batch, the conditional generator and the discriminator compete against each other, so that the generator learns to produce real-image-like features that fool the discriminator; afterwards, the FCN parameters are updated to accommodate the changes introduced by the GAN. In experiments, without
using labels of real images, our method significantly outperforms the baselines as well as state-of-the-art methods by 12%–20% mean IoU on the Cityscapes dataset.
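To make the alternating updates above concrete, here is a minimal PyTorch-style sketch of one training step, assuming labeled synthetic images and unlabeled real images. The backbone, classifier, generator G, discriminator D, channel sizes, and optimizer settings are all illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the alternating GAN/FCN updates (illustrative only:
# all modules, sizes, and hyperparameters are assumptions, not the paper's).
import torch
import torch.nn as nn

C = 64  # feature channels (assumed)

# FCN split into a feature extractor ("backbone") and a pixel classifier.
backbone = nn.Sequential(nn.Conv2d(3, C, 3, padding=1), nn.ReLU())
classifier = nn.Conv2d(C, 19, 1)  # 19 Cityscapes classes

# Conditional generator: maps synthetic-image features toward real-like ones.
G = nn.Sequential(nn.Conv2d(C, C, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(C, C, 3, padding=1))
# Discriminator: patch-wise real/fake logits over feature maps.
D = nn.Sequential(nn.Conv2d(C, C, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(C, 1, 3, stride=2, padding=1))

opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(list(backbone.parameters()) +
                         list(classifier.parameters()), lr=1e-4)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def train_step(x_syn, y_syn, x_real):
    # (1) Discriminator: separate real-image features from generated ones.
    f_real = backbone(x_real).detach()
    f_fake = G(backbone(x_syn)).detach()
    d_loss = (bce(D(f_real), torch.ones_like(D(f_real))) +
              bce(D(f_fake), torch.zeros_like(D(f_fake))))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # (2) Generator: produce real-image-like features that fool D.
    f_fake = G(backbone(x_syn).detach())
    g_loss = bce(D(f_fake), torch.ones_like(D(f_fake)))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # (3) FCN: update segmentation parameters on the adapted features,
    #     supervised only by the synthetic labels (no real labels used).
    seg_loss = ce(classifier(G(backbone(x_syn))), y_syn)
    opt_f.zero_grad(); seg_loss.backward(); opt_f.step()

# Usage with dummy tensors: synthetic image + label map, unlabeled real image.
x_syn = torch.randn(2, 3, 64, 64)
y_syn = torch.randint(0, 19, (2, 64, 64))
x_real = torch.randn(2, 3, 64, 64)
train_step(x_syn, y_syn, x_real)
```

Note that in this sketch the real images enter only through the discriminator losses, so no real-image labels are required, matching the unsupervised-adaptation setting reported above.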