Perspective Transformer Nets
Introduction
This is the TensorFlow implementation of the NIPS 2016 paper "Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision".
Re-implemented by Xinchen Yan, Arkanath Pathak, Jasmine Hsu, Honglak Lee
Reference: Original implementation in Torch
How to run this code
This implementation can be run locally or distributed across multiple machines/tasks. When running in a distributed fashion, you will need to set the task number flag for each task. Please refer to the original paper for parameter explanations and training details.
Installation
TensorFlow
Bazel
matplotlib
scikit-image
PIL
Dataset
This code requires the dataset to be in tfrecords format with the following features:
* image
  * Flattened list of images (float representations), one for each view point.
* mask
  * Flattened list of image masks (float representations), one for each view point.
* vox
  * Flattened list of voxels (float representations) for the object.
  * This is needed for using vox loss and for prediction comparison.
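As a rough illustration of the three features described above, the sketch below flattens nested per-view arrays into flat lists of floats. The helper names (`flatten`, `make_record`) are hypothetical, and the ordering of values within each flattened list (view first, then rows, then columns) is an assumption; check the data-loading code in this repository for the layout it actually expects.

```python
def flatten(nested):
    """Recursively flatten nested lists/tuples into a flat list of floats."""
    if isinstance(nested, (list, tuple)):
        flat = []
        for item in nested:
            flat.extend(flatten(item))
        return flat
    return [float(nested)]


def make_record(images, masks, voxels):
    """Build a dict holding the three flattened float features.

    images: nested list indexed as [view][row][col] (optionally [channel])
    masks:  nested list indexed as [view][row][col]
    voxels: nested list indexed as [x][y][z]
    The index order is an assumption made for this sketch.
    """
    return {
        "image": flatten(images),
        "mask": flatten(masks),
        "vox": flatten(voxels),
    }


# Tiny example: 2 view points of a 2x2 image, and a 2x2x2 voxel grid.
images = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
masks = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
voxels = [[[0, 0], [0, 1]], [[1, 0], [1, 1]]]
record = make_record(images, masks, voxels)
```

When writing real tfrecords, each flat list would typically be wrapped in a `tf.train.Feature` with a `tf.train.FloatList` and serialized as a `tf.train.Example`.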
You can download the ShapeNet Dataset in tfrecords format from here*.
* Disclaimer: This data is hosted personally by Arkanath Pathak for non-commercial research purposes. Please cite the ShapeNet paper in your works when using ShapeNet for non-commercial research purposes.
Pretraining: pretrain_rotator.py for each RNN step
$ bazel run -c opt :pretrain_rotator -- --step_size={} --init_model={}
Set init_model to the checkpoint path of the model trained in the previous step. You'll also need to set the inp_dir flag to the directory where your data resides.
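The step-by-step pretraining can be scripted. The sketch below only builds the command lines, chaining each step's init_model to the checkpoint from the step before it. The function name, the step sizes, the checkpoint paths, and the choice to omit init_model for the first step are all assumptions made for illustration; only the step_size, init_model, and inp_dir flags come from this README.

```python
def build_pretrain_commands(step_sizes, checkpoints, inp_dir):
    """Return one bazel command (as an argument list) per pretraining step.

    step_sizes:  RNN step sizes to pretrain, in order.
    checkpoints: checkpoints[i] is the path where step i's trained model
                 ends up (hypothetical, supplied by the caller).
    inp_dir:     directory holding the tfrecords data.
    """
    if len(step_sizes) != len(checkpoints):
        raise ValueError("need exactly one checkpoint path per step size")

    commands = []
    previous_checkpoint = None
    for step_size, checkpoint in zip(step_sizes, checkpoints):
        command = [
            "bazel", "run", "-c", "opt", ":pretrain_rotator", "--",
            "--step_size={}".format(step_size),
            "--inp_dir={}".format(inp_dir),
        ]
        # Assumption: the first step starts without an initial model.
        if previous_checkpoint is not None:
            command.append("--init_model={}".format(previous_checkpoint))
        commands.append(command)
        previous_checkpoint = checkpoint
    return commands


# Hypothetical step sizes and paths, for illustration only.
commands = build_pretrain_commands(
    step_sizes=[1, 2],
    checkpoints=["/tmp/ptn/step1/model.ckpt", "/tmp/ptn/step2/model.ckpt"],
    inp_dir="/tmp/ptn/data",
)
for command in commands:
    print(" ".join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` once the paths match your setup.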
Training: train_ptn.py with last pretrained model.
$ bazel run -c opt :train_ptn -- --init_model={}
Example TensorBoard Visualizations
To compare visualizations across runs, make sure to set the model_name flag to a different value for each parameter setting.
This code adds summaries for each loss. For instance, these are the losses we encountered in the distributed pretraining for ShapeNet Chair Dataset with 10 workers and 16 parameter servers:
After fine-tuning, you can expect such images as "grid_vis" under Image summaries in TensorBoard. Here the third and fifth columns show the predicted masks and voxels, respectively, alongside their ground truth values.
A similar image for when trained on all ShapeNet Categories (Voxel visualizations might be skewed):
Link: https://github.com/tensorflow/models/tree/master/research/ptn