Abstract. Deep-learning-based image-to-image translation methods aim to learn the joint distribution of two image domains and to find transformations between them. Although recent methods based on GANs (Generative Adversarial Networks) have shown compelling results, they are prone to failing to preserve image objects and to maintain translation consistency, which limits their practicality for tasks such as generating large-scale training data for different domains. To address this problem, we propose a structure-aware image-to-image translation network, which is
composed of encoders, generators, discriminators and parsing nets for the
two domains, respectively, in a unified framework. The proposed network
generates more visually plausible images than competing methods on different image-translation tasks. In addition, we quantitatively evaluate the competing methods by training Faster R-CNN and YOLO on datasets generated from the image-translation results, and demonstrate significant improvements in detection accuracy with the proposed image-object-preserving network.
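To make the described architecture concrete, the following is a minimal PyTorch sketch of such a framework: one encoder, generator, discriminator, and parsing net per domain, where the parsing net predicts object structure (a segmentation map) from the latent code and thus supplies the structure-aware supervision. All layer sizes, class counts, and module names are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch, assuming a PyTorch implementation; shapes and wiring
# below are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image to a latent feature map."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat * 2, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Decodes a latent feature map to an image in the target domain."""
    def __init__(self, out_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat * 2, feat, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat, out_ch, 4, stride=2, padding=1),
            nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

class ParsingNet(nn.Module):
    """Predicts a segmentation map from the latent features,
    providing the image-object-preserving (structure) signal."""
    def __init__(self, n_classes=8, feat=64):
        super().__init__()
        self.head = nn.ConvTranspose2d(feat * 2, n_classes, 8, stride=4, padding=2)
    def forward(self, z):
        return self.head(z)

class Discriminator(nn.Module):
    """PatchGAN-style real/fake critic for one domain."""
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(feat, 1, 4, stride=2, padding=1),
        )
    def forward(self, x):
        return self.net(x)

# One encoder, generator, discriminator, and parsing net per domain (A, B).
enc_a, enc_b = Encoder(), Encoder()
gen_a, gen_b = Generator(), Generator()
par_a, par_b = ParsingNet(), ParsingNet()
dis_a, dis_b = Discriminator(), Discriminator()

x_a  = torch.randn(1, 3, 64, 64)  # image from domain A
z_a  = enc_a(x_a)                 # latent code of the A image
x_ab = gen_b(z_a)                 # A -> B translation
seg  = par_a(z_a)                 # structure (parsing) prediction
d    = dis_b(x_ab)                # adversarial score in domain B
print(x_ab.shape, seg.shape, d.shape)
```

In training, the adversarial losses from the discriminators would be combined with a segmentation loss from the parsing nets, so the translation is penalized whenever object structure is not preserved; the exact loss weighting is left unspecified here.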