Abstract
Conventional video compression approaches use a predictive coding architecture and encode the corresponding
motion information and residual information. In this paper,
taking advantage of both the classical architecture of conventional video compression methods and the powerful nonlinear representation ability of neural networks, we propose the first end-to-end deep video compression model that
jointly optimizes all the components for video compression.
Specifically, learning-based optical flow estimation is utilized to obtain the motion information and reconstruct the
current frames. Then we employ two auto-encoder style
neural networks to compress the corresponding motion and
residual information. All the modules are jointly learned
through a single loss function, in which they collaborate
with each other by trading off between reducing the number of compression bits and improving the quality
of the decoded video. Experimental results show that the
proposed approach outperforms the widely used video
coding standard H.264 in terms of PSNR and is even on
par with the latest standard H.265 in terms of MS-SSIM.
Code is released at https://github.com/GuoLusjtu/DVC.
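The single loss function mentioned above is a rate-distortion trade-off of the form λ·D + R, where D is the distortion of the decoded frame and R is the number of bits spent on motion and residual. The following is a minimal sketch of such an objective, assuming MSE as the distortion measure and pre-computed bit counts; the function and parameter names (`rate_distortion_loss`, `lam`) are illustrative, not taken from the released code.

```python
import numpy as np

def rate_distortion_loss(original, reconstructed, motion_bits, residual_bits, lam=1024.0):
    """Joint rate-distortion objective: lam * D + R.

    D is the distortion (here, mean squared error) between the original
    and reconstructed frames; R is the total bit cost of the compressed
    motion and residual streams. Minimizing this single scalar trains
    all modules jointly, balancing bitrate against decoded quality.
    """
    distortion = float(np.mean((original - reconstructed) ** 2))
    rate = motion_bits + residual_bits
    return lam * distortion + rate

# Toy example with an 8x8 grayscale "frame".
orig = np.zeros((8, 8))
recon = orig + 0.1  # uniform reconstruction error of 0.1
loss = rate_distortion_loss(orig, recon, motion_bits=300.0, residual_bits=500.0)
```

In practice `lam` controls the operating point on the rate-distortion curve: larger values favor higher-quality (higher-bitrate) reconstructions.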