# VL-BERT

By Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai.

This repository is an official implementation of the paper [VL-BERT: Pre-training of Generic Visual-Linguistic Representations](https://arxiv.org/abs/1908.08530).
- Update on 2020/01/16: added visualization code.
- Update on 2019/12/20: VL-BERT was accepted by ICLR 2020.
VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset together with a text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks, such as Visual Commonsense Reasoning, Visual Question Answering and Referring Expression Comprehension.
Thanks to PyTorch and its 3rd-party libraries, this codebase also provides the following features:
- Distributed Training
- FP16 Mixed-Precision Training
- Various Optimizers and Learning Rate Schedulers
- Gradient Accumulation
- Monitoring the Training Using TensorboardX
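Of these, gradient accumulation is the easiest to sketch in isolation: summing micro-batch gradients (each normalized by the full batch size) before stepping reproduces the full-batch gradient exactly. A minimal pure-Python illustration with a toy linear model and squared loss (nothing here comes from this codebase; the data and model are made up):

```python
# Toy model y = w * x with squared loss. grad() returns the contribution
# of one micro-batch to the full-batch gradient, i.e. it is normalized by
# the FULL batch size so that micro-batch gradients can simply be added.
def grad(w, xs, ys, batch_size):
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / batch_size

xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], 0.5

# One pass over the whole batch of 4 samples...
full = grad(w, xs, ys, len(xs))
# ...equals two accumulation steps over micro-batches of 2 samples each.
accum = grad(w, xs[:2], ys[:2], len(xs)) + grad(w, xs[2:], ys[2:], len(xs))
assert abs(full - accum) < 1e-12
```

In a real training loop this corresponds to calling `backward()` on several micro-batches (which accumulates into the gradient buffers) before a single optimizer step.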
```
@article{su2019vl,
  title={VL-BERT: Pre-training of Generic Visual-Linguistic Representations},
  author={Su, Weijie and Zhu, Xizhou and Cao, Yue and Li, Bin and Lu, Lewei and Wei, Furu and Dai, Jifeng},
  journal={arXiv preprint arXiv:1908.08530},
  year={2019}
}
```
- Ubuntu 16.04, CUDA 9.0, GCC 4.9.4
- Python 3.6.x
```
# We recommend using Anaconda/Miniconda to create a conda environment
conda create -n vl-bert python=3.6 pip
conda activate vl-bert
```
- PyTorch 1.0.0 or 1.1.0

```
conda install pytorch=1.1.0 cudatoolkit=9.0 -c pytorch
```
- Apex (optional, for speed-up and fp16 training)

```
git clone https://github.com/jackroos/apex
cd ./apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
- Other requirements:

```
pip install Cython
pip install -r requirements.txt
```
- Compile:

```
./scripts/init.sh
```
For data preparation, see PREPARE_DATA.md.
For pre-trained models, see PREPARE_PRETRAINED_MODELS.md.
```
./scripts/dist_run_single.sh <num_gpus> <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
```

- `<num_gpus>`: number of GPUs to use.
- `<task>`: pretrain/vcr/vqa/refcoco.
- `<path_to_cfg>`: config yaml file under `./cfgs/<task>`.
- `<dir_to_store_checkpoint>`: root directory to store checkpoints.
The following is a more concrete example:

```
./scripts/dist_run_single.sh 4 vcr/train_end2end.py ./cfgs/vcr/base_q2a_4x16G_fp32.yaml ./
```
For example, on 2 machines (A and B), each with 4 GPUs, run the following command on machine A:

```
./scripts/dist_run_multi.sh 2 0 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
```

and the following command on machine B:

```
./scripts/dist_run_multi.sh 2 1 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
```
Non-distributed training:

```
./scripts/nondist_run.sh <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
```
Note:

- In the yaml files under `./cfgs`, batch sizes are set for GPUs with at least 16 GB of memory. You may need to adapt the batch size and gradient-accumulation steps to your hardware; e.g., if you decrease the batch size, increase the gradient-accumulation steps accordingly so that the effective SGD batch size stays unchanged.
- For efficiency, we recommend distributed training even on a single machine. However, for RefCOCO+, distributed training may deadlock for unknown reasons (possibly related to a PyTorch dataloader deadlock); if so, simply use non-distributed training instead.
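The batch-size bookkeeping above can be sketched as follows (the helper name and the numbers are illustrative, not taken from the provided configs):

```python
# Effective SGD batch size: per-GPU batch size, GPU count, and
# gradient-accumulation steps multiply together.
def effective_batch_size(per_gpu_batch, num_gpus, grad_accum_steps):
    return per_gpu_batch * num_gpus * grad_accum_steps

# Halving the per-GPU batch size from 16 to 8 on 4 GPUs requires
# doubling the accumulation steps from 1 to 2 to keep the effective
# batch size at 64.
assert effective_batch_size(16, 4, 1) == 64
assert effective_batch_size(8, 4, 2) == 64
```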
Local evaluation on the val set:

```
python vcr/val.py --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> --gpus <indexes_of_gpus_to_use> --result-path <dir_to_save_result> --result-name <result_file_name>
```

Note: `<indexes_of_gpus_to_use>` specifies the GPU indexes, e.g., `0 1 2 3`.
Generate prediction results on the test set for leaderboard submission:

```
python vcr/test.py --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> --gpus <indexes_of_gpus_to_use> --result-path <dir_to_save_result> --result-name <result_file_name>
```
Generate prediction results on the test set for EvalAI submission:

```
python vqa/test.py --cfg <cfg_file> --ckpt <checkpoint> --gpus <indexes_of_gpus_to_use> --result-path <dir_to_save_result> --result-name <result_file_name>
```
Local evaluation on the val/testA/testB set:

```
python refcoco/test.py --split <val|testA|testB> --cfg <cfg_file> --ckpt <checkpoint> --gpus <indexes_of_gpus_to_use> --result-path <dir_to_save_result> --result-name <result_file_name>
```
For visualization, see VISUALIZATION.md.
Many thanks to the open-source codebases that helped us a lot in building this one.