Learning to Generate Time-Lapse Videos Using Multi-Stage
Dynamic Generative Adversarial Networks
Abstract
Given a photo taken outdoors, can we predict the immediate
future, e.g., how the clouds will move across the sky? We
address this problem by presenting a two-stage generative adversarial network (GAN) based approach to generating realistic, high-resolution time-lapse videos. Given the
first frame, our model learns to generate long-term future
frames. The first stage generates video frames with realistic content. The second stage refines the
video from the first stage by enforcing its motion dynamics to be closer
to those of real videos. To further encourage vivid motion in the final generated video, a Gram matrix
is employed to model the motion more precisely. We build
a large-scale time-lapse dataset and test our approach on
this new dataset. Using our model, we are able to generate
realistic videos of up to 128 × 128 resolution for 32 frames.
Quantitative and qualitative experimental results demonstrate
the superiority of our model over state-of-the-art models.
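The abstract's mention of a Gram matrix for modeling motion dynamics can be illustrated with a minimal sketch. The snippet below is not the paper's exact formulation: the (batch, channel, time, height, width) feature layout, the use of intermediate video features as input, and the L1 loss over Gram matrices are assumptions made only for illustration.

```python
import torch

def gram_matrix(feat):
    """Normalized Gram matrix of a batch of spatio-temporal feature maps.

    feat: tensor of shape (batch, channels, time, height, width),
    e.g. intermediate features extracted from a video network.
    The Gram matrix captures pairwise channel correlations, a statistic
    that can be used to describe motion/texture dynamics.
    """
    b, c, t, h, w = feat.shape
    f = feat.reshape(b, c, t * h * w)          # flatten spatio-temporal dims
    gram = torch.bmm(f, f.transpose(1, 2))     # (b, c, c) channel correlations
    return gram / (c * t * h * w)              # normalize by element count

def gram_loss(fake_feat, real_feat):
    """L1 distance between Gram matrices of generated and real video features
    (an assumed loss form; the paper may use a different distance)."""
    return torch.abs(gram_matrix(fake_feat) - gram_matrix(real_feat)).mean()
```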