Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation
Abstract
Cross-view image translation is challenging because it involves images with drastically different views and severe deformation. In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a semantic map of the target view. The proposed SelectionGAN explicitly utilizes the semantic information and consists of two stages. In the first stage, the condition image and the target semantic map are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results using a multi-channel attention selection mechanism. Moreover, uncertainty maps automatically learned from the attention maps are used to guide the pixel loss for better network optimization. Extensive experiments on the Dayton [41], CVUSA [43], and Ego2Top [1] datasets show that our model generates significantly better results than state-of-the-art methods. The source code, data and trained models are available at https://github.com/Ha0Tang/SelectionGAN.
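The abstract names the second-stage mechanisms without spelling them out. The NumPy sketch below illustrates one plausible reading, under stated assumptions: the refinement stage produces N candidate images and N attention maps; the maps are normalized with a per-pixel softmax across the N channels; and the output is the attention-weighted sum of the candidates. The uncertainty-guided pixel loss is written in the common "loss / U + log U" form, with the uncertainty map U assumed to lie in (0, 1]. The function names, shapes, and loss form are hypothetical illustrations, not the released SelectionGAN code.

```python
import numpy as np

# Hypothetical sketch; not the SelectionGAN implementation.


def softmax_over_channels(logits):
    """Softmax across the channel axis (axis 0), independently at every pixel."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)


def multi_channel_attention_selection(candidates, attention_logits):
    """Fuse N candidate images into one output image.

    candidates:       (N, H, W, 3) intermediate images
    attention_logits: (N, H, W) unnormalized scores, one map per candidate
    returns:          (fused image of shape (H, W, 3), attention of shape (N, H, W))
    """
    attention = softmax_over_channels(attention_logits)  # sums to 1 at each pixel
    fused = (candidates * attention[..., None]).sum(axis=0)  # broadcast over RGB
    return fused, attention


def uncertainty_guided_l1(pred, target, uncertainty, eps=1e-6):
    """Per-pixel L1 loss reweighted by an uncertainty map U (assumed in (0, 1]).

    Dividing by U down-weights pixels the model marks as uncertain;
    the log(U) term stops it from declaring every pixel uncertain.
    """
    u = np.clip(uncertainty, eps, None)
    per_pixel = np.abs(pred - target).mean(axis=-1)  # (H, W), averaged over RGB
    return float((per_pixel / u + np.log(u)).mean())


rng = np.random.default_rng(0)
N, H, W = 4, 8, 8
candidates = np.tanh(rng.normal(size=(N, H, W, 3)))
logits = rng.normal(size=(N, H, W))
fused, attn = multi_channel_attention_selection(candidates, logits)
print(fused.shape)  # (8, 8, 3)
```

Because the softmax runs across channels rather than across space, each output pixel is a convex combination of the N candidates at that location. With U equal to 1 everywhere, the loss reduces to a plain mean L1 error.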