Abstract. Videos contain highly redundant information between frames.
Such redundancy has been studied extensively in video compression and
encoding, but is less explored for more advanced video processing. In
this paper, we propose a learnable unified framework for propagating a
variety of visual properties of video images, including but not limited to
color, high dynamic range (HDR), and segmentation masks, where the
properties are available for only a few key-frames. Our approach is based
on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner.
We theoretically prove two essential properties of TPN: (a) by regularizing the global transformation matrix to be orthogonal, the “style energy”
of the property can be well preserved during propagation; and (b) such
regularization can be achieved by the proposed switchable TPN with
bi-directional training on pairs of frames. We apply the switchable TPN
to three tasks: colorizing a gray-scale video based on a few colored key-frames, generating an HDR video from a low dynamic range (LDR) video and a few HDR frames, and propagating a segmentation mask from the first frame of a video. Experimental results show that our approach is significantly more accurate and efficient than state-of-the-art methods.
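As a brief sanity check on property (a), using illustrative notation not taken from the paper (a property vector $u_A$ on a key-frame, its propagated version $u_B = P\,u_A$, and a global transformation matrix $P$), orthogonality of $P$ preserves the squared norm, i.e. the “style energy”:
\[
\|u_B\|_2^2 = u_A^{\top} P^{\top} P\, u_A = u_A^{\top} u_A = \|u_A\|_2^2 .
\]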