In the paper, the network is trained on the Caltech and KAIST datasets separately, for a total of 6 stages. For simplicity, this implementation trains the network on both datasets at the same time, for a total of 4 stages, which results in similar performance.
Extract the visible camera images and annotations using each dataset's toolbox.
Place the KAIST (set00-05, skip=10) and Caltech (set00-10, skip=30) training images and annotations in ./datasets/train/
Place the KAIST (set06-11, skip=20) testing images and annotations in ./datasets/test/
Place the toolbox folders (KAIST, Caltech) in ./external/ and name them toolbox(kaist) and toolbox(caltech), respectively.
Run fetch_data/fetch_caffe_mex_cuda65.m to download a compiled Caffe mex (Windows only).
Download the ImageNet-pre-trained VGG16 model (reduced for 7x3 ROI pooling, depicted below) from GoogleDrive and place it in ./models/pre_trained_models/vgg_16layers
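Before training, it can help to confirm that the folders named in the steps above exist and are not empty. The following is a minimal Python sketch, not part of this repository; the helper name missing_dirs is hypothetical, and the paths are copied from the steps above on the assumption that they are relative to the repository root.

```python
from pathlib import Path

# Paths copied from the preparation steps; assumed relative to the repo root.
REQUIRED_DIRS = [
    "datasets/train",
    "datasets/test",
    "external/toolbox(kaist)",
    "external/toolbox(caltech)",
    "models/pre_trained_models/vgg_16layers",
]


def missing_dirs(repo_root):
    """Return the required directories that are absent or empty under repo_root.

    Hypothetical helper for illustration; it only checks that each folder
    exists and contains at least one entry, not that the contents are valid.
    """
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_dirs("."):
        print("missing or empty:", rel)
```

Run it from the repository root; it prints nothing when every folder is in place.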
Training
Run startup.m
Run faster_rcnn_VGG16.m
Preparation for Testing
Extract the KAIST (set06-11) testing images with a frame skip of 1, so that successive images are available for fusion.
Place these images in ./datasets/skip1/
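To illustrate how the skip values relate, here is a small Python sketch of frame subsampling. It is an assumption, not the toolboxes' actual code: it supposes that skip=N keeps every N-th frame starting at zero-based frame N-1, so skip=1 keeps every frame. The function name sampled_frames is hypothetical.

```python
def sampled_frames(num_frames, skip):
    """Zero-based indices kept when sampling every `skip`-th frame.

    Assumption for illustration: sampling starts at frame skip-1,
    so skip=1 keeps every frame of the sequence.
    """
    if skip < 1:
        raise ValueError("skip must be >= 1")
    return list(range(skip - 1, num_frames, skip))


if __name__ == "__main__":
    test_frames = sampled_frames(100, 20)  # frames kept with skip=20
    all_frames = sampled_frames(100, 1)    # frames kept with skip=1
    print(test_frames)
    # Every skip=20 frame also appears in the skip=1 extraction.
    print(set(test_frames) <= set(all_frames))
```

Under this assumption, each frame in ./datasets/test/ also appears in ./datasets/skip1/ together with its neighboring frames.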
Testing
Run final_test.m to get the result in ./test/faster-rcnn-test3