Photographic Text-to-Image Synthesis
with a Hierarchically-nested Adversarial Network
Abstract
This paper presents a novel method for the challenging task of generating photographic images conditioned on semantic image descriptions. Our method introduces accompanying hierarchically-nested adversarial objectives inside the network hierarchies, which regularize mid-level representations and assist generator training in capturing complex image statistics. We present an extensible single-stream generator architecture that better accommodates the joint discriminators and pushes generated images up to high resolutions. We adopt a multi-purpose adversarial loss to encourage more effective use of image and text information, improving semantic consistency and image fidelity simultaneously. Furthermore, we introduce a new visual-semantic similarity measure to evaluate the semantic consistency of generated images. Extensive experimental validation on three public datasets shows that our method significantly outperforms the previous state of the art on all datasets across different evaluation metrics.