Abstract. Harvesting dense pixel-level annotations to train deep neural networks for semantic segmentation is extremely expensive and unwieldy at scale. While learning from synthetic data, where labels are readily available, sounds promising, performance degrades significantly when
tested on unseen real data due to domain discrepancies. We present
Dual Channel-wise Alignment Networks (DCAN), a simple yet effective
approach to reduce domain shift at both pixel-level and feature-level.
Exploiting the statistics of each channel in CNN feature maps, our framework performs channel-wise feature alignment in both an image generator
and a segmentation network, preserving spatial structures and semantic
information. In particular, given an image from the source
domain and unlabeled samples from the target domain, the generator
synthesizes new images on-the-fly to resemble samples from the target
domain in appearance, and the segmentation network further refines high-level features before predicting semantic maps, both of which leverage
feature statistics of sampled images from the target domain. Unlike much
recent and concurrent work relying on adversarial training, our framework is lightweight and easy to train. Extensive experiments on adapting models trained on synthetic segmentation benchmarks to real urban
scenes demonstrate the effectiveness of the proposed framework.
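
To make the channel-wise alignment concrete, the sketch below aligns the per-channel mean and standard deviation of source feature maps to those of a target sample, in the spirit of AdaIN-style normalization. This is a minimal sketch under our own assumptions: the function name channel_wise_align, the eps parameter, and the PyTorch framing are illustrative and not taken from the paper.

    import torch

    def channel_wise_align(source_feat, target_feat, eps=1e-5):
        # source_feat, target_feat: tensors of shape (N, C, H, W).
        # Compute per-channel statistics over the spatial dimensions.
        src_mean = source_feat.mean(dim=(2, 3), keepdim=True)
        src_std = source_feat.std(dim=(2, 3), keepdim=True) + eps
        tgt_mean = target_feat.mean(dim=(2, 3), keepdim=True)
        tgt_std = target_feat.std(dim=(2, 3), keepdim=True) + eps
        # Normalize each source channel, then re-scale and shift it with
        # the target statistics; only channel statistics change, so the
        # spatial layout (and hence the semantic structure) of the
        # feature map is preserved.
        normalized = (source_feat - src_mean) / src_std
        return normalized * tgt_std + tgt_mean

Because only first- and second-order channel statistics are modified, the same routine can in principle serve both roles described above: restyling generator features for pixel-level alignment and refining high-level features in the segmentation network.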