Hierarchical Long-term Video Prediction without Supervision


2020-03-19


Abstract

Much recent research has been devoted to video prediction and generation, yet most previous works have demonstrated only limited success, generating videos over short-term horizons. The hierarchical video prediction method by Villegas et al. (2017b) is an example of a state-of-the-art method for long-term video prediction, but it is limited because it requires ground-truth annotation of high-level structures (e.g., human joint landmarks) at training time. Our network encodes the input frame and predicts a high-level encoding into the future; a decoder with access to the first frame then produces the predicted image from the predicted encoding. The decoder also produces a mask that outlines the predicted foreground object (e.g., a person) as a by-product. Unlike Villegas et al. (2017b), we develop a novel training method that jointly trains the encoder, the predictor, and the decoder without high-level supervision; we further improve upon this by using an adversarial loss in the feature space to train the predictor. Our method can predict about 20 seconds into the future and provides better results compared to Denton and Fergus (2018) and Finn et al. (2016) on the Human 3.6M dataset.
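
The encode, predict, decode pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rollout`, `composite`), the toy stand-in networks, and in particular the rule that blends the decoder output with the first frame through the mask are assumptions made for illustration. The abstract states only that the decoder has access to the first frame and also produces a foreground mask.

```python
from typing import Callable, List, Tuple

Frame = List[float]     # a flattened image with pixel values in [0, 1]
Encoding = List[float]  # a high-level feature vector


def composite(mask: Frame, foreground: Frame, first_frame: Frame) -> Frame:
    """Blend the predicted foreground over the first frame with a soft mask.

    Assumed compositing rule: mask * foreground + (1 - mask) * first_frame.
    """
    return [m * f + (1.0 - m) * b
            for m, f, b in zip(mask, foreground, first_frame)]


def rollout(
    first_frame: Frame,
    encoder: Callable[[Frame], Encoding],
    predictor: Callable[[Encoding], Encoding],
    decoder: Callable[[Encoding, Frame], Tuple[Frame, Frame]],
    steps: int,
) -> List[Frame]:
    """Predict `steps` future frames from a single input frame."""
    encoding = encoder(first_frame)
    frames = []
    for _ in range(steps):
        # The predictor advances one step in feature space, not pixel space.
        encoding = predictor(encoding)
        # The decoder sees both the predicted encoding and the first frame.
        foreground, mask = decoder(encoding, first_frame)
        frames.append(composite(mask, foreground, first_frame))
    return frames


# Hypothetical toy stand-ins for the learned networks.
def toy_encoder(frame: Frame) -> Encoding:
    return [sum(frame) / len(frame)]


def toy_predictor(encoding: Encoding) -> Encoding:
    return [0.5 * e for e in encoding]


def toy_decoder(encoding: Encoding, first_frame: Frame) -> Tuple[Frame, Frame]:
    foreground = [encoding[0]] * len(first_frame)
    # Only the first pixel is marked as foreground.
    mask = [1.0 if i == 0 else 0.0 for i in range(len(first_frame))]
    return foreground, mask


predicted = rollout([0.2, 0.4, 0.6], toy_encoder, toy_predictor,
                    toy_decoder, steps=2)
```

In this toy run only the masked pixel changes from step to step, while the remaining pixels are copied from the first frame. This mirrors the idea that the mask separates a moving foreground from the rest of the scene.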
