2017 VQA Challenge Winner (CVPR'17 Workshop)

Model architecture

Prerequisites

To download and extract VQA v2, GloVe, and the pretrained visual features: bash scripts/download_extract.sh
To prepare the data for training: python scripts/preproc.py
The structure of the data/ directory should look like this:

data/
  zips/
    v2_XXX...zip
    ...
    glove...zip
    trainval_36.zip
  glove/
    glove...txt
    ...
  v2_XXX.json
  ...
  trainval_resnet...tsv
  (The above are files created after executing scripts/download_extract.sh)
  tokenizers/
    ...
  dict_ans.pkl
  dict_q.pkl
  glove_pretrained_300.npy
  train_qa.pkl
  val_qa.pkl
  train_vfeats.pkl
  val_vfeats.pkl
  (The above are files created after executing scripts/preproc.py)
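
Before training, it can help to confirm that preprocessing produced everything in the listing above. The following is a minimal sketch, not part of the repository: the helper name `missing_preproc_outputs` is hypothetical, and it assumes the preprocessing outputs sit directly under data/ with exactly the file names listed.

```python
from pathlib import Path

# File names copied from the directory listing above.
# The helper below is a hypothetical convenience, not a script shipped with the repo.
EXPECTED_FILES = [
    "dict_ans.pkl",
    "dict_q.pkl",
    "glove_pretrained_300.npy",
    "train_qa.pkl",
    "val_qa.pkl",
    "train_vfeats.pkl",
    "val_vfeats.pkl",
]


def missing_preproc_outputs(data_dir="data"):
    """Return the expected preprocessing outputs that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_preproc_outputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All preprocessing outputs found.")
```

If any file is reported missing, re-running scripts/preproc.py is the likely fix.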

To train with the default parameters:

bash scripts/train.sh

Heavily refactored (especially the data preprocessing); tested with PyTorch 0.4.1 and Python 3.6.
Training for 20 epochs reaches around 50% training accuracy (the model seems buggy in my implementation).
After all the preprocessing, the data/ directory may take up 38 GB or more.
Some of preproc.py and utils.py are based on this repo
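
Because the notes above put the data/ directory at 38G+ after preprocessing, a quick disk check before downloading may save a failed run. This is a rough sketch under stated assumptions: the function names are hypothetical, and the 38 GB default is only the lower bound mentioned above, so the real requirement may be higher.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def has_free_space(path=".", required_gb=38):
    """Check whether the filesystem holding path has at least required_gb free.

    required_gb defaults to the 38 GB figure from the notes; treat it as a
    minimum, not an exact requirement.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    print("Enough free space:", has_free_space("."))
    if os.path.isdir("data"):
        print("data/ currently uses %.1f GB" % (dir_size_bytes("data") / 1024 ** 3))
```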
