neural-vqa
This is an experimental Torch implementation of the VIS + LSTM visual question answering model from the paper Exploring Models and Data for Image Question Answering by Mengye Ren, Ryan Kiros & Richard Zemel.
Requirements:
Download the MSCOCO train+val images and VQA data using sh data/download_data.sh. Extract all the downloaded zip files inside the data folder.
unzip Annotations_Train_mscoco.zip
unzip Questions_Train_mscoco.zip
unzip train2014.zip
unzip Annotations_Val_mscoco.zip
unzip Questions_Val_mscoco.zip
unzip val2014.zip
If you have already downloaded them, copy the train2014 and val2014 image folders and the VQA JSON files to the data folder.
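The extraction step can also be scripted. The following is a minimal Python sketch, not part of the repository (which is Torch/Lua). It assumes the six archives named in the unzip commands above sit directly in the data folder; the helper name `extract_archives` is hypothetical.

```python
import zipfile
from pathlib import Path

# Archive names as listed in the unzip commands above.
ARCHIVES = [
    "Annotations_Train_mscoco.zip",
    "Questions_Train_mscoco.zip",
    "train2014.zip",
    "Annotations_Val_mscoco.zip",
    "Questions_Val_mscoco.zip",
    "val2014.zip",
]


def extract_archives(data_dir="data", archives=ARCHIVES):
    """Extract every listed archive found in data_dir into data_dir.

    Returns the names of the archives that were not found, so the
    caller can decide whether to re-run the download script.
    """
    data_dir = Path(data_dir)
    missing = []
    for name in archives:
        path = data_dir / name
        if not path.is_file():
            missing.append(name)
            continue
        with zipfile.ZipFile(path) as zf:
            zf.extractall(data_dir)
    return missing
```

Calling `extract_archives("data")` returns an empty list once all six archives have been found and extracted.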
Download the VGG-19 Caffe model and prototxt using sh models/download_models.sh.
To avoid memory issues with LuaJIT, install Torch with Lua 5.1 (TORCH_LUA_VERSION=LUA51 ./install.sh).
If working with plain Lua, luaffifb may be needed for loadcaffe, unless using pre-extracted fc7 features.
th extract_fc7.lua -split train
th extract_fc7.lua -split val
batch_size: Batch size. Default is 10.
split: train/val. Default is train.
gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.
proto_file: Path to the deploy.prototxt file for the VGG Caffe model. Default is models/VGG_ILSVRC_19_layers_deploy.prototxt.
model_file: Path to the .caffemodel file for the VGG Caffe model. Default is models/VGG_ILSVRC_19_layers.caffemodel.
data_dir: Data directory. Default is data.
feat_layer: Layer to extract features from. Default is fc7.
input_image_dir: Image directory. Default is data.
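To run the extraction for both splits from a script, the options can be assembled programmatically. This is a small Python sketch rather than anything shipped with the repository; it assumes every option listed above is passed in the same `-name value` form that `-split` uses, and the helper name `build_extract_command` is hypothetical.

```python
def build_extract_command(split, **options):
    """Build the argument list for one extract_fc7.lua run.

    Assumption: each option is passed as a `-name value` pair,
    matching the `-split train` form shown above.
    """
    argv = ["th", "extract_fc7.lua", "-split", split]
    for name, value in options.items():
        argv += ["-" + name, str(value)]
    return argv


def build_all_splits(**options):
    """Return one command per split, train first, then val."""
    return [build_extract_command(s, **options) for s in ("train", "val")]
```

For example, `build_all_splits(gpuid=0, batch_size=10)` yields two argument lists that can be handed to `subprocess.run` one after the other.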
th train.lua
rnn_size: Size of LSTM internal state. Default is 512.
num_layers: Number of layers in the LSTM.
embedding_size: Size of word embeddings. Default is 512.
learning_rate: Learning rate. Default is 4e-4.
learning_rate_decay: Learning rate decay factor. Default is 0.95.
learning_rate_decay_after: In number of epochs, when to start decaying the learning rate. Default is 15.
alpha: Alpha for Adam. Default is 0.8.
beta: Beta for Adam. Default is 0.999.
epsilon: Denominator term for smoothing. Default is 1e-8.
batch_size: Batch size. Default is 64.
max_epochs: Number of full passes through the training data. Default is 15.
dropout: Dropout for regularization. Probability of dropping input. Default is 0.5.
init_from: Initialize network parameters from checkpoint at this path.
save_every: Number of iterations between checkpoints. Default is 1000.
train_fc7_file: Path to fc7 features of training set. Default is data/train_fc7.t7.
fc7_image_id_file: Path to fc7 image ids of training set. Default is data/train_fc7_image_id.t7.
val_fc7_file: Path to fc7 features of validation set. Default is data/val_fc7.t7.
val_fc7_image_id_file: Path to fc7 image ids of validation set. Default is data/val_fc7_image_id.t7.
data_dir: Data directory. Default is data.
checkpoint_dir: Checkpoint directory. Default is checkpoints.
savefile: Filename to save checkpoint to. Default is vqa.
gpuid: 0-indexed id of GPU to use. Default is -1 = CPU.
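The interaction between learning_rate, learning_rate_decay and learning_rate_decay_after is easier to see with numbers. The Python sketch below is illustrative and is not the repository's Lua code. It assumes the rate is multiplied by the decay factor once per epoch after the threshold epoch has completed; the exact point at which train.lua applies the decay is not stated here, so treat that as an assumption.

```python
def decayed_learning_rate(epoch, base_lr=4e-4, decay=0.95, decay_after=15):
    """Learning rate in effect during a 1-indexed epoch.

    Assumption: the rate stays at base_lr through epoch `decay_after`
    and is multiplied by `decay` once for each epoch completed from
    that point on.
    """
    steps = max(0, epoch - decay_after)
    return base_lr * decay ** steps


def schedule(max_epochs, **kwargs):
    """List the learning rate for every epoch from 1 to max_epochs."""
    return [decayed_learning_rate(e, **kwargs) for e in range(1, max_epochs + 1)]
```

Under this assumption the rate is 4e-4 for epochs 1 to 15 and about 3.8e-4 in epoch 16, so a run with the default max_epochs of 15 would finish before any decay takes effect.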
th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610.t7 -input_image_path data/train2014/COCO_train2014_000000405541.jpg -question 'What is the cat on?'
checkpoint_file: Path to the model checkpoint to initialize network parameters from.
input_image_path: Path to the input image.
question: Question string.
Last hidden layer image features from VGG-19
Zero-padded question sequences for batched implementation
Training questions are filtered for the top_n answers; top_n = 1000 by default (~87% coverage).
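The last two points can be sketched in a few lines. This is an illustrative Python version, not the repository's Lua implementation. It assumes questions are already tokenized into integer ids with 0 reserved for padding, and since the padding side is not stated above, it is exposed as a parameter. The function names are hypothetical.

```python
from collections import Counter


def filter_top_answers(qa_pairs, top_n=1000):
    """Keep only (question, answer) pairs whose answer is one of the
    top_n most frequent answers. Also returns the fraction kept."""
    counts = Counter(answer for _, answer in qa_pairs)
    keep = {answer for answer, _ in counts.most_common(top_n)}
    kept = [(q, a) for q, a in qa_pairs if a in keep]
    coverage = len(kept) / len(qa_pairs) if qa_pairs else 0.0
    return kept, coverage


def zero_pad(sequences, pad_id=0, pad_left=False):
    """Pad variable-length lists of token ids to a common length so
    they can be stacked into one batch."""
    max_len = max((len(s) for s in sequences), default=0)
    padded = []
    for seq in sequences:
        fill = [pad_id] * (max_len - len(seq))
        padded.append(fill + list(seq) if pad_left else list(seq) + fill)
    return padded
```

The coverage value returned by `filter_top_answers` is the same kind of figure as the ~87% quoted above: the share of training questions that survive the filter.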
To reproduce the results shown on this page, or to try your own image-question pairs, download the following and run predict.lua with the appropriate paths.
Exploring Models and Data for Image Question Answering, Ren et al., NIPS15
VQA: Visual Question Answering, Antol et al., ICCV15