2019-09-20

Video Prediction with Neural Advection

A TensorFlow implementation of the models described in Unsupervised Learning for Physical Interaction through Video Prediction (Finn et al., 2016).

This video prediction model, which is optionally conditioned on actions, predicts future video by internally predicting how to transform the last image (which may itself have been predicted) into the next image. As a result, it can reuse appearance information from previous frames and can better generalize to objects not seen in the training set. Some example predictions on novel objects are shown below:

[Animated example predictions: 16_70.gif, 2_96.gif, 1_38.gif, 11_10.gif, 3_34.gif]
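The transform-and-composite idea can be illustrated roughly as follows. This is a minimal sketch of a CDNA-style prediction step, not the repository's implementation: the function name, the single-channel simplification, and the use of plain 2-D convolution are all assumptions for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

def cdna_transform(prev_frame, kernels, masks):
    """Composite a next-frame prediction from K transformed copies of
    the previous frame.

    prev_frame: (H, W) image (single channel, for simplicity).
    kernels:    (K, k, k) predicted convolution kernels, each summing to 1,
                each describing one motion (e.g. a shifted delta = translation).
    masks:      (K, H, W) compositing masks that sum to 1 over K at each
                pixel, assigning each pixel to one of the K motions.
    """
    # Apply each predicted kernel to the previous frame.
    transformed = np.stack(
        [convolve2d(prev_frame, k, mode="same", boundary="symm")
         for k in kernels])
    # Composite the K transformed images with the predicted masks.
    return (masks * transformed).sum(axis=0)
```

Because the pixels of the output are copied (via the kernels) from the previous frame rather than synthesized from scratch, appearance carries over even for objects the model has never seen.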

When the model is conditioned on actions, it changes its predictions based on the passed-in action. Here we show the model's predictions in response to varying the magnitude of the passed-in actions, from small to large:

[Animated predictions under varying action magnitudes: 0xact_0.gif, 05xact_0.gif, 1xact_0.gif, 15xact_0.gif, 0xact_17.gif, 05xact_17.gif, 1xact_17.gif, 15xact_17.gif]

Because the model is trained with an L2 objective, it represents uncertainty as blur.
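A quick way to see why: the prediction that minimizes expected L2 error over uncertain outcomes is their pixelwise mean, which smears the possibilities together. This toy numpy illustration is not from the repository:

```python
import numpy as np

# Two equally likely futures: a bright pixel moves either left or right.
future_a = np.array([0.0, 1.0, 0.0, 0.0])
future_b = np.array([0.0, 0.0, 1.0, 0.0])

# The prediction minimizing expected L2 error is the mean of the outcomes...
best_l2 = 0.5 * (future_a + future_b)

# ...which is a blur over both positions, not either sharp possibility.
# best_l2 == [0.0, 0.5, 0.5, 0.0]
```

Committing to either sharp outcome would incur a large error half the time, so under L2 the blurred average wins.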

Requirements

  • TensorFlow (see tensorflow.org for installation instructions)

  • The spatial_transformer model in tensorflow/models, for the spatial transformer predictor (STP).

Data

The data used to train this model is located here.

To download the robot data, run the following.

./download_data.sh

Training the model

To train the model, run the prediction_train.py file.

python prediction_train.py

Several flags control the model that is trained, as exemplified below:

python prediction_train.py 
  --data_dir=push/push_train  # path to the training set.
  --model=CDNA  # the model type to use - DNA, CDNA, or STP
  --output_dir=./checkpoints  # where to save model checkpoints
  --event_log_dir=./summaries  # where to save training statistics
  --num_iterations=100000  # number of training iterations
  --pretrained_model=model  # path to model to initialize from, random if empty
  --sequence_length=10  # the number of total frames in a sequence
  --context_frames=2  # the number of ground truth frames to pass in at start
  --use_state=1  # whether or not to condition on actions and the initial state
  --num_masks=10  # the number of transformations and corresponding masks
  --schedsamp_k=900.0  # the constant used for scheduled sampling or -1
  --train_val_split=0.95  # the fraction of data used for training (the rest for validation)
  --batch_size=32  # the training batch size
  --learning_rate=0.001  # the initial learning rate for the Adam optimizer
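The --schedsamp_k flag governs scheduled sampling: early in training the model is fed ground-truth frames, and as training progresses it is increasingly fed its own predictions. The inverse-sigmoid schedule of Bengio et al. (2015) is the usual choice for this; the sketch below is an illustration of that schedule, and the exact formula in the released code may differ.

```python
import math

def num_ground_truth_frames(batch_size, step, k):
    """How many examples in a batch receive ground-truth frames at a
    given training step, under an inverse-sigmoid schedule.

    k is the --schedsamp_k constant: larger k delays the switch to the
    model's own predictions; k == -1 disables scheduled sampling.
    """
    if k == -1:
        return 0
    # Probability of feeding ground truth decays from ~1 toward 0.
    prob = k / (k + math.exp(step / k))
    return int(round(batch_size * prob))
```

With k=900 and batch_size=32, essentially the whole batch sees ground truth at step 0, and essentially none does late in a 100,000-iteration run.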

If the dynamic neural advection (DNA) model is being used, the --num_masks option should be set to one.

The --context_frames option defines both the number of initial ground-truth frames to pass in and when to start penalizing the model's predictions.
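That loss scheduling might be sketched as follows. This is a hypothetical numpy illustration of the idea, not the repository's loss code:

```python
import numpy as np

def sequence_loss(predictions, targets, context_frames):
    """Mean per-frame L2 error over a predicted sequence, counting only
    frames at index context_frames and beyond; the earlier frames are
    ground truth the model merely conditions on, so errors on them are
    not penalized."""
    errors = [np.mean((p - t) ** 2)
              for i, (p, t) in enumerate(zip(predictions, targets))
              if i >= context_frames]
    return float(np.mean(errors))
```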

The data directory --data_dir should contain TFRecord files in the format used in the released push dataset. See here for details. If the --use_state option is not set, the data only needs to contain image sequences, not states and actions.

Contact

To ask questions or report issues please open an issue on the tensorflow/models issues tracker. Please assign issues to @cbfinn.

Credits

This code was written by Chelsea Finn.

Link: https://github.com/tensorflow/models/tree/master/research/video_prediction
