Abstract
Video frame interpolation algorithms typically estimate optical flow or its variants and then use it to guide the
synthesis of an intermediate frame between two consecutive original frames. To handle challenges like occlusion,
bidirectional flow between the two input frames is often
estimated and used to warp and blend the input frames.
However, effectively blending the two warped frames remains a challenging problem. This paper presents a
context-aware synthesis approach that warps not only the
input frames but also their pixel-wise contextual information, and uses them to interpolate a high-quality intermediate frame. Specifically, we first use a pre-trained neural network to extract per-pixel contextual information for the input frames. We then employ a state-of-the-art optical flow algorithm to estimate the bidirectional flow between them and pre-warp both the input frames and their context maps. Finally, unlike common approaches that blend the pre-warped frames,
our method feeds them and their context maps to a video
frame synthesis neural network to produce the interpolated
frame in a context-aware fashion. Our neural network is
fully convolutional and is trained end to end. Our experiments show that our method handles challenging scenarios such as occlusion and large motion, and outperforms representative state-of-the-art approaches.
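To make the described pipeline concrete, below is a minimal PyTorch sketch under stated assumptions: the context extractor (the first conv block of a pre-trained ResNet-18), the stand-in `SynthesisNet`, and the helper names `backward_warp` and `interpolate` are illustrative choices for this sketch, not the paper's exact architecture, and the bidirectional flow is assumed to come from an off-the-shelf optical flow estimator.

```python
# Illustrative sketch of context-aware frame synthesis, not the paper's
# exact model: extract per-pixel context, warp frames + context maps with
# bidirectional flow, and feed everything to a synthesis network.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


def backward_warp(tensor, flow):
    """Warp a (B,C,H,W) tensor by a (B,2,H,W) flow field via bilinear sampling."""
    b, _, h, w = tensor.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys]).float().to(tensor.device)  # (2,H,W) pixel grid
    coords = base.unsqueeze(0) + flow                       # displaced coordinates
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack([gx, gy], dim=3)                     # (B,H,W,2)
    return F.grid_sample(tensor, grid, mode="bilinear", align_corners=True)


# Assumed context extractor: the first conv block of a pre-trained ResNet-18.
# Stride is set to 1 so the context map keeps the input resolution and can be
# concatenated channel-wise with the warped frames.
resnet = torchvision.models.resnet18(weights="IMAGENET1K_V1")
resnet.conv1.stride = (1, 1)
extract_context = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)
for p in extract_context.parameters():
    p.requires_grad = False  # the pre-trained extractor stays fixed


class SynthesisNet(nn.Module):
    """Stand-in fully convolutional synthesis network (the real one is deeper)."""

    def __init__(self, in_channels):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.layers(x)


def interpolate(frame0, frame1, flow_t0, flow_t1, synth):
    """Warp both frames and their context maps, then synthesize the middle frame.

    flow_t0 / flow_t1 map the intermediate frame back to frame0 / frame1 and
    are assumed to come from an external bidirectional flow estimator.
    """
    ctx0, ctx1 = extract_context(frame0), extract_context(frame1)
    warped = [backward_warp(x, f) for x, f in
              ((frame0, flow_t0), (ctx0, flow_t0),
               (frame1, flow_t1), (ctx1, flow_t1))]
    return synth(torch.cat(warped, dim=1))
```

In this sketch the synthesis network would be built as `SynthesisNet(in_channels=2 * (3 + 64))`, since each warped frame contributes 3 channels and each warped context map 64 (the width of ResNet-18's first conv layer); training end to end means gradients flow into the synthesis network through all four warped inputs.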