Recovering Realistic Texture in Image Super-resolution by
Deep Spatial Feature Transform
Abstract
Although convolutional neural networks (CNNs)
have recently demonstrated high-quality reconstruction for
single-image super-resolution (SR), recovering natural and
realistic texture remains a challenging problem. In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to
modulate features of a few intermediate layers in a single
network conditioned on semantic segmentation probability
maps. This is made possible through a novel Spatial Feature
Transform (SFT) layer that generates affine transformation
parameters for spatial-wise feature modulation. SFT layers
can be trained end-to-end together with the SR network using the same loss function. During testing, the network accepts an input image of arbitrary size and generates a high-resolution
image with just a single forward pass conditioned on the
categorical priors. Our final results show that an SR network equipped with SFT can generate more realistic and
visually pleasing textures in comparison to state-of-the-art
SRGAN [27] and EnhanceNet [38].
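
The core idea of the SFT layer, as described above, is to predict per-pixel scale and shift parameters from segmentation-derived condition maps and apply them as an affine modulation of intermediate features. The following is a minimal PyTorch sketch of that mechanism under assumed settings; the channel sizes, 1x1 convolutions, and names such as SFTLayer, scale_net, and shift_net are illustrative choices, not the paper's exact configuration.

import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    # Spatial Feature Transform: modulates features with affine parameters
    # predicted from condition maps (e.g., derived from segmentation probabilities).
    def __init__(self, feat_channels=64, cond_channels=32, hidden_channels=32):
        super().__init__()
        # Small conv branches map the condition maps to per-pixel gamma (scale) and beta (shift).
        self.scale_net = nn.Sequential(
            nn.Conv2d(cond_channels, hidden_channels, 1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(hidden_channels, feat_channels, 1),
        )
        self.shift_net = nn.Sequential(
            nn.Conv2d(cond_channels, hidden_channels, 1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(hidden_channels, feat_channels, 1),
        )

    def forward(self, features, condition):
        # features:  (N, feat_channels, H, W) intermediate SR features
        # condition: (N, cond_channels, H, W) maps from the segmentation prior
        gamma = self.scale_net(condition)
        beta = self.shift_net(condition)
        # Spatial-wise affine modulation: each location gets its own scale and shift.
        return features * gamma + beta

# Example usage with arbitrary spatial size:
# sft = SFTLayer()
# feat = torch.randn(1, 64, 24, 24)
# cond = torch.randn(1, 32, 24, 24)  # stand-in for segmentation-derived condition maps
# out = sft(feat, cond)              # same shape as feat

Because the modulation parameters are generated per pixel from the shared condition maps, a few such layers inserted at intermediate depths can adapt texture generation to semantic regions without requiring a separate network per class.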