Abstract. Video motion magnification techniques allow us to see small
motions previously invisible to the naked eye, such as those of vibrating
airplane wings or buildings swaying in the wind.
Because the motion is small, the magnification results are prone to noise
or excessive blurring. The state of the art relies on hand-designed filters
to extract representations that may not be optimal. In this paper, we seek
to learn the filters directly from examples using deep convolutional neural
networks. To make training tractable, we carefully design a synthetic
dataset that captures small motion well, and use two-frame input for
training. We show that the learned filters achieve high-quality results
on real videos, with fewer ringing artifacts and better noise characteristics
than previous methods. While our model is not trained with temporal
filters, we find that temporal filters can be applied to our extracted
representations up to a moderate magnification factor, enabling
frequency-based motion selection. Finally, we analyze the learned filters and show
that they behave similarly to the derivative filters used in previous works.
Our code, trained model, and datasets will be available online.