ContrastiveLosses4VRD
Example results from the OpenImages dataset.
Example results of RelDN without and with our losses. "L0 only" means using only the original multi-class logistic loss (i.e., without our losses). The top row shows RelDN outputs and the bottom row visualizes the learned predicate CNN features of the two models. Red and green boxes highlight wrong and right outputs (first row) or feature saliency (second row).
This is a PyTorch implementation of Graphical Contrastive Losses for Scene Graph Parsing, CVPR 2019. It is an improved version of the code that won 1st place in the Google AI Open Images Visual Relationship Detection Challenge.
We have created a branch supporting PyTorch 1.0! Just check out the pytorch1_0 branch.
Method | Backbone | SGDET@20 | SGDET@50 | SGDET@100 |
---|---|---|---|---|
Frequency [1] | VGG16 | 17.7 | 23.5 | 27.6 |
Frequency+Overlap [1] | VGG16 | 20.1 | 26.2 | 30.1 |
MotifNet [1] | VGG16 | 21.4 | 27.2 | 30.3 |
Graph-RCNN [2] | Res-101 | 19.4 | 25.0 | 28.5 |
RelDN, w/o contrastive losses | VGG16 | 20.8 | 28.1 | 32.5 |
RelDN, full | VGG16 | 21.1 | 28.3 | 32.7 |
RelDN, full | ResNext-101-FPN | 22.5 | 31.0 | 36.7 |
*"RelDN" is the relationship detection model we proposed in the paper.
*We use the frequency prior in our model by default.
*Results of "Graph-RCNN" are directly copied from their repo.
```
git clone https://github.com/NVIDIA/ContrastiveLosses4VRD.git --recurse-submodules
```
Python 3
Python packages
pytorch 0.4.0 or 0.4.1.post2 (not guaranteed to work on newer versions)
cython
matplotlib
numpy
scipy
opencv
pyyaml
packaging
tensorboardX
tqdm
pillow
scikit-image
An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have GPU implementations.
An easy installation if you already have Anaconda Python 3 and CUDA 9.0:
```
conda install pytorch=0.4.1
pip install cython
pip install matplotlib numpy scipy pyyaml packaging pycocotools tensorboardX tqdm pillow scikit-image
conda install opencv
```
(Optional) A Dockerfile with all necessary dependencies is included in docker/Dockerfile. Requires nvidia-docker.
```
# ROOT=path/to/cloned/repository
cd $ROOT/docker
# build the docker image and tag it
docker build -t myname/mydockertag:1.0 .
# launch an interactive session with this folder mounted
nvidia-docker run -v $ROOT:/workspace/visual-relationship-detection:rw -it myname/mydockertag:1.0
# NOTE: you may need to mount other volumes depending on where your datasets are stored
```
Compile the CUDA code in the Detectron submodule and in the repo:
```
# ROOT=path/to/cloned/repository
cd $ROOT/Detectron_pytorch/lib
sh make.sh
cd $ROOT/lib
sh make.sh
```
Create a data folder at the top-level directory of the repository:
```
# ROOT=path/to/cloned/repository
cd $ROOT
mkdir data
```
If necessary, edit the `DATA_DIR` field in `lib/core/config.py` to change the expected path to the data directory. If you do so, be sure to also update the paths in the VRD preprocessing scripts (mentioned below).
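As a rough sketch of what that edit touches (assuming the Detectron-style config layout; the variable names below are illustrative, so verify them against the actual `lib/core/config.py`):

```python
# Hedged sketch of a Detectron-style DATA_DIR entry; names are assumptions,
# check lib/core/config.py for the real ones.
import os.path as osp

ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))  # repository root
DATA_DIR = osp.abspath(osp.join(ROOT_DIR, 'data'))  # point this elsewhere to relocate the data
```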
Download the OpenImages annotations here. Unzip the file under the data folder; you should see an `openimages_v4` folder there. It contains .json annotation files for both OpenImages and OpenImages_mini, a subset of the former that we created containing 4500 training and 1000 test images. The .json files are derived from the original .csv annotations.
Download the Visual Genome annotations here. Unzip the file under the data folder; you should see a `vg` folder there. It contains .json annotations that suit the dataloader used in this repo.
For the VRD annotations, see the Images: VRD instructions below; the annotations and images are set up together there.
Create a `train/` folder for the OpenImages training images:
```
# ROOT=path/to/cloned/repository
cd $ROOT/data/openimages_v4
mkdir train
```
Download OpenImages v4 training images from the official page (warning: this is a very large dataset). Note: only training images are needed, since our annotations will split them into train and validation sets. Put all images in `train/`.
Create a folder for all Visual Genome images:
```
# ROOT=path/to/cloned/repository
cd $ROOT/data/vg
mkdir VG_100K
```
Download Visual Genome images from the official page. Unzip all images (part 1 and part 2) into `VG_100K/`. There should be a total of 108,249 files.
Create the `vrd` folder under `data`:

```
# ROOT=path/to/cloned/repository
cd $ROOT/data
mkdir vrd
```
Download the original annotation json files from here and unzip `json_dataset.zip` in `data/vrd`. The images can be downloaded from here. Unzip `sg_dataset.zip` to create an `sg_dataset` folder in `data/vrd`. Next, run the preprocessing scripts:
```
cd $ROOT
python tools/rename_vrd_with_numbers.py
python tools/convert_vrd_anno_to_coco_format.py
```
`rename_vrd_with_numbers.py` converts all non-jpg images (some are png or gif) to jpg and renames them in the {:012d}.jpg format (e.g., "000000000001.jpg"). It also creates new relationship annotation files separate from the original ones, mostly to make things easier for the dataloader. The filename mapping from the originals is stored in `data/vrd/*_fname_mapping.json`, where "*" is either "train" or "val".
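For intuition, here is a hedged sketch of what this renaming step amounts to; the paths, ordering, and variable names are assumptions, not the actual script:

```python
# Sketch: convert every VRD image to JPEG and assign zero-padded numeric names,
# recording the original-to-new filename mapping. Paths are assumed.
import json
import os
from PIL import Image

src_dir = 'data/vrd/sg_dataset/sg_train_images'
dst_dir = 'data/vrd/train_images'
os.makedirs(dst_dir, exist_ok=True)

fname_mapping = {}
for i, fname in enumerate(sorted(os.listdir(src_dir)), start=1):
    new_name = '{:012d}.jpg'.format(i)  # e.g., 000000000001.jpg
    img = Image.open(os.path.join(src_dir, fname)).convert('RGB')  # handles png/gif
    img.save(os.path.join(dst_dir, new_name), 'JPEG')
    fname_mapping[fname] = new_name

with open('data/vrd/train_fname_mapping.json', 'w') as f:
    json.dump(fname_mapping, f)
```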
`convert_vrd_anno_to_coco_format.py` creates object detection annotations from the new annotations generated above; these are required by the dataloader during training.
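The COCO detection format itself is standard: an `images` list, an `annotations` list with `[x, y, w, h]` boxes, and a `categories` list. A minimal, hedged sketch of such a conversion (the real script's input structure and field handling may differ):

```python
# Sketch: turn per-image (x1, y1, x2, y2, label) boxes into COCO detection format.
def to_coco(per_image_boxes, category_names):
    """per_image_boxes: {int image_id: [(x1, y1, x2, y2, label_idx), ...]}"""
    coco = {'images': [], 'annotations': [],
            'categories': [{'id': i, 'name': n} for i, n in enumerate(category_names)]}
    ann_id = 0
    for img_id, boxes in per_image_boxes.items():
        coco['images'].append({'id': img_id, 'file_name': '{:012d}.jpg'.format(img_id)})
        for x1, y1, x2, y2, label in boxes:
            coco['annotations'].append({
                'id': ann_id, 'image_id': img_id, 'category_id': label,
                'bbox': [x1, y1, x2 - x1, y2 - y1],  # COCO boxes are [x, y, w, h]
                'area': (x2 - x1) * (y2 - y1), 'iscrowd': 0})
            ann_id += 1
    return coco
```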
Download pre-trained object detection models here. Unzip it under the root directory. Note: We do not include code for training object detectors. Please refer to the "(Optional) Training Object Detection Models" section in Large-Scale-VRD.pytorch for this.
Download our trained models here. Unzip the file under the root folder; you should see a `trained_models` folder there.
The final directories for data and detection models should look like:
```
|-- detection_models
|   |-- oi_rel
|   |   |-- X-101-64x4d-FPN
|   |   |   |-- model_step599999.pth
|   |-- vg
|   |   |-- VGG16
|   |   |   |-- model_step479999.pth
|   |   |-- X-101-64x4d-FPN
|   |   |   |-- model_step119999.pth
|   |-- vrd
|   |   |-- VGG16
|   |   |   |-- model_step4499.pth
|-- data
|   |-- openimages_v4
|   |   |-- train    <-- (contains OpenImages_v4 training/validation images)
|   |   |-- rel
|   |   |   |-- rel_only_annotations_train.json
|   |   |   |-- rel_only_annotations_val.json
|   |   |   |-- ...
|   |-- vg
|   |   |-- VG_100K    <-- (contains all Visual Genome images)
|   |   |-- rel_annotations_train.json
|   |   |-- rel_annotations_val.json
|   |   |-- ...
|   |-- vrd
|   |   |-- train_images    <-- (contains Visual Relation Detection training images)
|   |   |-- val_images      <-- (contains Visual Relation Detection validation images)
|   |   |-- new_annotations_train.json
|   |   |-- new_annotations_val.json
|   |   |-- ...
|-- trained_models
|   |-- oi_mini_X-101-64x4d-FPN
|   |   |-- model_step6749.pth
|   |-- oi_X-101-64x4d-FPN
|   |   |-- model_step80929.pth
|   |-- vg_VGG16
|   |   |-- model_step62722.pth
|   |-- vg_X-101-64x4d-FPN
|   |   |-- model_step62722.pth
|   |-- vrd_VGG16_IN_pretrained
|   |   |-- model_step7559.pth
|   |-- vrd_VGG16_COCO_pretrained
|   |   |-- model_step7559.pth
```
DO NOT CHANGE anything in the provided config files (configs/xx/xxxx.yaml), even if you want to test with fewer or more than 8 GPUs. Use the environment variable `CUDA_VISIBLE_DEVICES` to control how many and which GPUs to use (e.g., prefix a command with `CUDA_VISIBLE_DEVICES=0,1`). Remove the `--multi-gpu-testing` flag for single-GPU inference.
To test a trained model on OpenImages_mini using a ResNeXt-101-64x4d-FPN backbone, run
```
python ./tools/test_net_rel.py --dataset oi_rel_mini --cfg configs/oi_rel_mini/e2e_faster_rcnn_X-101-64x4d-FPN_12_epochs_oi_rel_mini_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5.yaml --load_ckpt trained_models/oi_mini_X-101-64x4d-FPN/model_step6749.pth --output_dir Outputs/oi_mini_X-101-64x4d-FPN --multi-gpu-testing --do_val
```
This should reproduce the numbers in the last line of Table 1 in the paper.
To test a trained model on the full OpenImages dataset using a ResNeXt-101-64x4d-FPN backbone, run
```
python ./tools/test_net_rel.py --dataset oi_rel --cfg configs/oi_rel/e2e_faster_rcnn_X-101-64x4d-FPN_12_epochs_oi_rel_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5.yaml --load_ckpt trained_models/oi_X-101-64x4d-FPN/model_step80929.pth --output_dir Outputs/oi_X-101-64x4d-FPN --multi-gpu-testing --do_val
```
NOTE: evaluating on the Visual Genome test set may require at least 64GB of RAM.
We use three evaluation metrics for Visual Genome (a sketch of the shared recall computation follows the list):
SGDET: predict all three labels (subject, predicate, object) and both boxes
SGCLS: predict subject, object, and predicate labels given ground-truth subject and object boxes
PRDCLS: predict predicate labels given ground-truth subject and object boxes and labels
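All three metrics are recall@K over ranked subject-predicate-object triplets. Below is a hedged sketch of the core matching logic; it uses label-only matching and omits the additional IoU >= 0.5 box check that SGDET applies:

```python
# Sketch: fraction of ground-truth triplets recovered among the top-K predictions.
def triplet_recall_at_k(pred_triplets, gt_triplets, k):
    """pred_triplets: [(score, subj, prd, obj), ...]; gt_triplets: [(subj, prd, obj), ...]"""
    gt = set(gt_triplets)
    top_k = sorted(pred_triplets, key=lambda t: -t[0])[:k]  # rank by score
    hits = {t[1:] for t in top_k} & gt                      # matched GT triplets
    return len(hits) / max(len(gt), 1)

preds = [(0.9, 'man', 'rides', 'horse'), (0.4, 'man', 'wears', 'hat')]
gt = [('man', 'rides', 'horse'), ('dog', 'under', 'table')]
print(triplet_recall_at_k(preds, gt, k=2))  # 0.5: one of two GT triplets recovered
```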
To test a trained model using a VGG16 backbone with "SGDET", run
```
python ./tools/test_net_rel.py --dataset vg --cfg configs/vg/e2e_faster_rcnn_VGG16_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_no_spt.yaml --load_ckpt trained_models/vg_VGG16/model_step62722.pth --output_dir Outputs/vg_VGG16 --multi-gpu-testing --do_val
```
Use the `--use_gt_boxes` option to test with "SGCLS", and `--use_gt_boxes --use_gt_labels` to test with "PRDCLS". The results may vary slightly from those in the last line of Table 6 in the paper.
To test a trained model using a ResNeXt-101-64x4d-FPN backbone with "SGDET", run
```
python ./tools/test_net_rel.py --dataset vg --cfg configs/vg/e2e_faster_rcnn_X-101-64x4d-FPN_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5.yaml --load_ckpt trained_models/vg_X-101-64x4d-FPN/model_step62722.pth --output_dir Outputs/vg_X-101-64x4d-FPN --multi-gpu-testing --do_val
```
Use the `--use_gt_boxes` option to test with "SGCLS", and `--use_gt_boxes --use_gt_labels` to test with "PRDCLS". The results may vary slightly from those in the last line of Table 1 in the supplementary material.
To test a trained model on VRD initialized from an ImageNet pre-trained VGG16 model, run
```
python ./tools/test_net_rel.py --dataset vrd --cfg configs/vrd/e2e_faster_rcnn_VGG16_16_epochs_vrd_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_IN_pretrained.yaml --load_ckpt trained_models/vrd_VGG16_IN_pretrained/model_step7559.pth --output_dir Outputs/vrd_VGG16_IN_pretrained --multi-gpu-testing --do_val
```
The results may differ slightly from those in the second-to-last line of Table 7.
To test a trained model on VRD initialized from a COCO pre-trained VGG16 model, run
```
python ./tools/test_net_rel.py --dataset vrd --cfg configs/vrd/e2e_faster_rcnn_VGG16_16_epochs_vrd_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_COCO_pretrained.yaml --load_ckpt trained_models/vrd_VGG16_COCO_pretrained/model_step7559.pth --output_dir Outputs/vrd_VGG16_COCO_pretrained --multi-gpu-testing --do_val
```
The results may differ slightly from those in the last line of Table 7.
This section provides the command-line arguments to train our relationship detection models given the pre-trained object detection models described above. Note: we do not train object detectors here; we only use trained object detectors (provided in `detection_models/`) to initialize our to-be-trained relationship models.
DO NOT CHANGE anything in the provided config files (configs/xx/xxxx.yaml), even if you want to train with fewer or more than 8 GPUs. Use the environment variable `CUDA_VISIBLE_DEVICES` to control how many and which GPUs to use.
With the following command lines, the training outputs (models and logs) will be in `$ROOT/Outputs/xxx/`, where `xxx` is the name of the .yaml config file used in the command, without the ".yaml" extension. To test your own trained models, simply run the test commands described above with `--load_ckpt` set to the path of your checkpoint.
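To illustrate that naming convention (using one of the example configs from this README):

```python
# Illustrative only: derive the Outputs/ subfolder name from a config path.
import os.path as osp

cfg = 'configs/vg/e2e_faster_rcnn_VGG16_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_no_spt.yaml'
print(osp.join('Outputs', osp.splitext(osp.basename(cfg))[0]))
# -> Outputs/<config file name without the .yaml extension>
```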
To train our relationship network on OpenImages_mini using a ResNeXt-101-64x4d-FPN backbone, run
```
python tools/train_net_step_rel.py --dataset oi_rel_mini --cfg configs/oi_rel_mini/e2e_faster_rcnn_X-101-64x4d-FPN_12_epochs_oi_rel_mini_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5.yaml --nw 8 --use_tfboard
```
To train our relationship network on the full OpenImages dataset using a ResNeXt-101-64x4d-FPN backbone, run
```
python tools/train_net_step_rel.py --dataset oi_rel --cfg configs/oi_rel/e2e_faster_rcnn_X-101-64x4d-FPN_12_epochs_oi_rel_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5.yaml --nw 8 --use_tfboard
```
To train our relationship network on Visual Genome using a VGG16 backbone, run
```
python tools/train_net_step_rel.py --dataset vg --cfg configs/vg/e2e_faster_rcnn_VGG16_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_no_spt.yaml --nw 8 --use_tfboard
```
To train our relationship network on Visual Genome using a ResNeXt-101-64x4d-FPN backbone, run
```
python tools/train_net_step_rel.py --dataset vg --cfg configs/vg/e2e_faster_rcnn_X-101-64x4d-FPN_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5.yaml --nw 8 --use_tfboard
```
To train our relationship network on VRD, initialized from an ImageNet pre-trained VGG16 model, run
```
python tools/train_net_step_rel.py --dataset vrd --cfg configs/vrd/e2e_faster_rcnn_VGG16_16_epochs_vrd_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_IN_pretrained.yaml --nw 8 --use_tfboard
```
To train our relationship network on VRD, initialized from a COCO pre-trained VGG16 model, run
```
python tools/train_net_step_rel.py --dataset vrd --cfg configs/vrd/e2e_faster_rcnn_VGG16_16_epochs_vrd_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_COCO_pretrained.yaml --nw 8 --use_tfboard
```
This repository uses code based on the Neural-Motifs source code from Rowan Zellers, as well as code from the Detectron.pytorch repository by Roy Tseng. See LICENSES for additional details.
If you use this code in your research, please use the following BibTeX entry.
```
@conference{zhang2019vrd,
  title={Graphical Contrastive Losses for Scene Graph Parsing},
  author={Zhang, Ji and Shih, Kevin J. and Elgammal, Ahmed and Tao, Andrew and Catanzaro, Bryan},
  booktitle={CVPR},
  year={2019}
}
```