Abstract
Recently, deep-learning-based video super-resolution
(SR) methods have achieved promising performance. To simultaneously exploit the spatial and temporal information
of videos, employing 3-dimensional (3D) convolutions is a
natural approach. However, directly applying 3D convolutions may lead to excessively high computational complexity, which restricts the depth of video SR models and
thus undermines the performance. In this paper, we present
a novel fast spatio-temporal residual network (FSTRN) to
adopt 3D convolutions for the video SR task in order to enhance the performance while maintaining a low computational load. Specifically, we propose a fast spatio-temporal
residual block (FRB) that factorizes each 3D filter into the product of two 3D filters of considerably lower dimension. Furthermore, we design a cross-space residual
learning scheme that directly links the low-resolution space and
the high-resolution space, which can greatly relieve the
computational burden on the feature fusion and up-scaling
parts. Extensive evaluations and comparisons on benchmark datasets validate the strengths of the proposed approach and demonstrate that it significantly outperforms the current state-of-the-art methods.
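
To give a rough sense of why factorizing a 3D filter lowers the cost, the sketch below counts the weights of one full k x k x k convolution against a factorized pair. It assumes the split is a 1 x k x k spatial filter followed by a k x 1 x 1 temporal filter, with equal input and output channels and no bias; the abstract does not state these shapes, so they are an illustrative assumption, and the function names are hypothetical.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Number of weights in a bias-free 3D convolution layer."""
    return c_in * c_out * kt * kh * kw


def full_block_params(channels, k=3):
    """Weights of a single full k x k x k 3D convolution."""
    return conv3d_params(channels, channels, k, k, k)


def factorized_block_params(channels, k=3):
    """Weights of an assumed factorization: 1 x k x k, then k x 1 x 1."""
    spatial = conv3d_params(channels, channels, 1, k, k)   # per-frame filter
    temporal = conv3d_params(channels, channels, k, 1, 1)  # across-frame filter
    return spatial + temporal


if __name__ == "__main__":
    c = 64  # illustrative channel width, not taken from the paper
    full = full_block_params(c)
    fact = factorized_block_params(c)
    print(f"full: {full}, factorized: {fact}, ratio: {full / fact:.2f}")
```

Under these assumptions the weight count per layer scales with k^2 + k rather than k^3, so the saving grows with the kernel size; the same ratio applies to multiply-accumulate operations per output position.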