# 2017 VQA Challenge Winner (CVPR'17 Workshop)
PyTorch implementation of *Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge* by Teney et al.
## Prerequisites

- Python 3.6
- PyTorch 0.4.1
## Data
### Preparation
To download and extract VQA v2, GloVe, and the pretrained visual features:

```bash
bash scripts/download_extract.sh
```
To prepare the data for training:

```bash
python scripts/preproc.py
```
The structure of the `data/` directory should look like this:

- `data/`
  - `zips/`
    - `v2_XXX...zip`
    - ...
    - `glove...zip`
    - `trainval_36.zip`
  - `glove/`
    - `glove...txt`
    - ...
  - `v2_XXX.json`
  - ...
  - `trainval_resnet...tsv` (the files above are created by `scripts/download_extract.sh`)
  - `tokenizers/`
    - ...
  - `dict_ans.pkl`
  - `dict_q.pkl`
  - `glove_pretrained_300.npy`
  - `train_qa.pkl`
  - `val_qa.pkl`
  - `train_vfeats.pkl`
  - `val_vfeats.pkl` (the files above are created by `scripts/preproc.py`)
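To sanity-check the preprocessing output, the pickles can be loaded directly. Below is a minimal sketch, assuming the file contents match their names (token/answer dictionaries, a GloVe matrix, tokenized QA pairs); the exact structures depend on `scripts/preproc.py`:

```python
import pickle
import numpy as np

def load_pickle(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

dict_q = load_pickle('data/dict_q.pkl')      # question-token dictionary (assumed)
dict_ans = load_pickle('data/dict_ans.pkl')  # candidate-answer dictionary (assumed)
train_qa = load_pickle('data/train_qa.pkl')  # tokenized question/answer pairs (assumed)

# 300-d GloVe embeddings, presumably aligned with dict_q.
glove = np.load('data/glove_pretrained_300.npy')

print(len(dict_q), len(dict_ans), len(train_qa), glove.shape)
```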
## Train
To train with the default parameters:

```bash
bash scripts/train.sh
```
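For reference, the Teney et al. model builds its nonlinear layers from gated tanh units and trains the answer classifier as a multi-label sigmoid task rather than with a softmax. The sketch below follows the equations in the paper; whether this repo implements them exactly this way, and the layer sizes shown (e.g. the number of candidate answers), are assumptions:

```python
import torch
import torch.nn as nn

class GatedTanh(nn.Module):
    """Gated tanh unit from the paper: y = tanh(Wx + b) * sigmoid(W'x + b')."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.gate = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.tanh(self.fc(x)) * torch.sigmoid(self.gate(x))

# Multi-label answer loss: binary cross-entropy against soft answer
# scores in [0, 1], as advocated in the paper.
num_answers = 3000                      # illustrative, not taken from this repo
logits = torch.randn(2, num_answers)    # (batch, candidate answers)
targets = torch.rand(2, num_answers)    # soft ground-truth answer scores
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())
```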
## Notes
- Major refactor (especially of the data preprocessing); tested with PyTorch 0.4.1 and Python 3.6.
- Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation).
- After all the preprocessing, the `data/` directory may take up 38 GB or more.
- Parts of `preproc.py` and `utils.py` are based on this repo.
## Resources