neural-vqa

2020-01-13

Join the chat at https://gitter.im/abhshkdz/neural-vqa

This is an experimental Torch implementation of the VIS + LSTM visual question answering model from the paper "Exploring Models and Data for Image Question Answering" by Mengye Ren, Ryan Kiros & Richard Zemel.

Model architecture

Setup

Requirements:

Download the MSCOCO train+val images and VQA data using sh data/download_data.sh. Extract all the downloaded zip files inside the data folder.

unzip Annotations_Train_mscoco.zip
unzip Questions_Train_mscoco.zip
unzip train2014.zip

unzip Annotations_Val_mscoco.zip
unzip Questions_Val_mscoco.zip
unzip val2014.zip

If you have already downloaded them, copy the train2014 and val2014 image folders and the VQA JSON files into the data folder.
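For readers who prefer to script the extraction step instead of running each unzip command by hand, here is a minimal Python sketch. The helper name `extract_archives` is hypothetical and not part of this repository; it assumes the downloaded zip files sit directly inside the data folder and simply unpacks every `*.zip` it finds there.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir="data"):
    """Extract every .zip file found directly inside data_dir, in place.

    Hypothetical helper for illustration; returns the archive names handled.
    """
    data_path = Path(data_dir)
    extracted = []
    for archive in sorted(data_path.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Unpack next to the archive, as `unzip` does when run
            # from inside the data folder.
            zf.extractall(data_path)
        extracted.append(archive.name)
    return extracted


# Example (run from the repository root):
#   extract_archives("data")
```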

Download the VGG-19 Caffe model and prototxt using sh models/download_models.sh.

Known issues

  • To avoid memory issues with LuaJIT, install Torch with Lua 5.1 (TORCH_LUA_VERSION=LUA51 ./install.sh).

  • If working with plain Lua, luaffifb may be needed for loadcaffe, unless using pre-extracted fc7 features.

Usage

Extract image features

th extract_fc7.lua -split train
th extract_fc7.lua -split val

Options

  • batch_size: Batch size. Default is 10.

  • split: train/val. Default is train.

  • gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.

  • proto_file: Path to the deploy.prototxt file for the VGG Caffe model. Default is models/VGG_ILSVRC_19_layers_deploy.prototxt.

  • model_file: Path to the .caffemodel file for the VGG Caffe model. Default is models/VGG_ILSVRC_19_layers.caffemodel.

  • data_dir: Data directory. Default is data.

  • feat_layer: Layer to extract features from. Default is fc7.

  • input_image_dir: Image directory. Default is data.
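To give a feel for what the batch_size option controls, the sketch below splits a list of image paths into consecutive batches. It is written in Python purely for illustration and is not the code in extract_fc7.lua; `iter_batches` is a hypothetical name, and the assumption is that images are processed in fixed-size groups with a smaller final group.

```python
def iter_batches(items, batch_size=10):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example: 25 placeholder image paths with the default batch size of 10
# produce batches of 10, 10 and 5 items.
paths = ["img_%d.jpg" % i for i in range(25)]
batch_sizes = [len(batch) for batch in iter_batches(paths)]
```

A larger batch size means fewer forward passes through the network at the cost of more memory per pass.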

Training

th train.lua

Options

  • rnn_size: Size of LSTM internal state. Default is 512.

  • num_layers: Number of layers in LSTM

  • embedding_size: Size of word embeddings. Default is 512.

  • learning_rate: Learning rate. Default is 4e-4.

  • learning_rate_decay: Learning rate decay factor. Default is 0.95.

  • learning_rate_decay_after: In number of epochs, when to start decaying the learning rate. Default is 15.

  • alpha: Alpha for adam. Default is 0.8.

  • beta: Beta used for adam. Default is 0.999.

  • epsilon: Denominator term for smoothing. Default is 1e-8.

  • batch_size: Batch size. Default is 64.

  • max_epochs: Number of full passes through the training data. Default is 15.

  • dropout: Dropout for regularization. Probability of dropping input. Default is 0.5.

  • init_from: Initialize network parameters from checkpoint at this path.

  • save_every: No. of iterations after which to checkpoint. Default is 1000.

  • train_fc7_file: Path to fc7 features of training set. Default is data/train_fc7.t7.

  • fc7_image_id_file: Path to fc7 image ids of training set. Default is data/train_fc7_image_id.t7.

  • val_fc7_file: Path to fc7 features of validation set. Default is data/val_fc7.t7.

  • val_fc7_image_id_file: Path to fc7 image ids of validation set. Default is data/val_fc7_image_id.t7.

  • data_dir: Data directory. Default is data.

  • checkpoint_dir: Checkpoint directory. Default is checkpoints.

  • savefile: Filename to save checkpoint to. Default is vqa.

  • gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.
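The optimizer-related options above can be read together as follows. This Python sketch is an illustration only, not the code in train.lua. It makes two assumptions that the option list does not confirm: that the learning rate is multiplied by learning_rate_decay once per epoch after learning_rate_decay_after epochs, and that alpha and beta are the first- and second-moment decay rates of a textbook Adam update. The function names are hypothetical.

```python
import math


def decayed_learning_rate(epoch, base_lr=4e-4, decay=0.95, decay_after=15):
    """Learning rate for a 1-indexed epoch under the assumed schedule."""
    # Assumption: no decay through epoch `decay_after`, then one
    # multiplication by `decay` for every further epoch.
    extra_epochs = max(0, epoch - decay_after)
    return base_lr * decay ** extra_epochs


def adam_step(param, grad, m, v, t, lr=4e-4, alpha=0.8, beta=0.999, epsilon=1e-8):
    """One textbook Adam update for a single scalar parameter.

    Assumption: `alpha` and `beta` play the roles usually called beta1
    and beta2. `t` is the 1-indexed step count; `m` and `v` are the
    running first and second moment estimates.
    """
    m = alpha * m + (1 - alpha) * grad
    v = beta * v + (1 - beta) * grad * grad
    # Bias correction compensates for the zero-initialised moments.
    m_hat = m / (1 - alpha ** t)
    v_hat = v / (1 - beta ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v
```

Under this assumed schedule, a run with the default max_epochs of 15 would finish before any decay is applied, so the decay options only matter for longer runs.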

Testing

th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610.t7 -input_image_path data/train2014/COCO_train2014_000000405541.jpg -question 'What is the cat on?'

Options

  • checkpoint_file: Path to model checkpoint to initialize network parameters from

  • input_image_path: Path to input image

  • question: Question string

Implementation Details

  • Last hidden layer image features from VGG-19

  • Zero-padded question sequences for batched implementation

  • Training questions are filtered for the top_n answers; top_n = 1000 by default (~87% coverage)
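The last two points can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the repository's Lua code: `filter_top_answers` and `zero_pad` are hypothetical names, token id 0 is assumed to be reserved for padding, and the padding side is left as a parameter because the description above does not say which side is used.

```python
from collections import Counter


def filter_top_answers(qa_pairs, top_n=1000):
    """Keep only (question, answer) pairs whose answer is among the
    top_n most frequent answers.

    Returns the kept pairs, the answer vocabulary, and the fraction of
    the original pairs that survived (the coverage).
    """
    counts = Counter(answer for _, answer in qa_pairs)
    vocab = [answer for answer, _ in counts.most_common(top_n)]
    keep = set(vocab)
    kept = [(q, a) for q, a in qa_pairs if a in keep]
    coverage = len(kept) / len(qa_pairs) if qa_pairs else 0.0
    return kept, vocab, coverage


def zero_pad(sequences, pad_left=False):
    """Pad lists of token ids with zeros to the length of the longest one."""
    max_len = max((len(s) for s in sequences), default=0)
    padded = []
    for s in sequences:
        padding = [0] * (max_len - len(s))
        padded.append(padding + list(s) if pad_left else list(s) + padding)
    return padded
```

Padding every question in a batch to a common length is what lets the batch be stored as a single rectangular tensor; filtering to a fixed answer vocabulary turns answering into classification over top_n classes.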

Pretrained model and data files

To reproduce results shown on this page or try your own image-question pairs, download the following and run predict.lua with the appropriate paths.

References

License

MIT

