# R2Plus1D-C3D
A PyTorch implementation of R2Plus1D and C3D, based on the CVPR 2018 paper *A Closer Look at Spatiotemporal Convolutions for Action Recognition* and the ICCV 2015 paper *Learning Spatiotemporal Features with 3D Convolutional Networks*.
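The core idea of R(2+1)D is to factorize each full 3D convolution into a 2D spatial convolution followed by a 1D temporal convolution. The sketch below illustrates that factorization; it is a simplified stand-in, not the exact module from this repo (the paper additionally chooses the intermediate channel count so the parameter count matches the full 3D conv, which is omitted here):

```python
import torch
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Factorizes a t x d x d 3D conv into a 1 x d x d spatial conv
    followed by a t x 1 x 1 temporal conv (simplified sketch)."""
    def __init__(self, in_ch, out_ch, mid_ch=None):
        super().__init__()
        # simple default for the intermediate width; the paper derives
        # it from the parameter budget of the equivalent 3D conv
        mid_ch = mid_ch or out_ch
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        self.bn1 = nn.BatchNorm3d(mid_ch)
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.spatial(x)))
        return self.relu(self.bn2(self.temporal(x)))

block = R2Plus1DBlock(3, 64)
clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, H, W)
out = block(clip)
print(out.shape)  # torch.Size([1, 64, 16, 112, 112])
```

The extra nonlinearity between the spatial and temporal convolutions is one of the reasons the paper gives for the factorized block outperforming a plain 3D conv of the same size.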
## Requirements
- PyTorch
```
conda install pytorch torchvision -c pytorch
```
- opencv
```
conda install opencv
```
- rarfile
```
pip install rarfile
```
- rar
```
sudo apt install rar
```
- unrar
```
sudo apt install unrar
```
- ffmpeg
```
sudo apt install build-essential openssl libssl-dev autoconf automake cmake git-core libass-dev \
  libfreetype6-dev libsdl2-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev \
  libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texinfo wget zlib1g-dev nasm yasm \
  libx264-dev libx265-dev libnuma-dev libvpx-dev libfdk-aac-dev libmp3lame-dev libopus-dev
wget https://ffmpeg.org/releases/ffmpeg-4.1.3.tar.bz2
tar -jxvf ffmpeg-4.1.3.tar.bz2
cd ffmpeg-4.1.3/
./configure --prefix="../build" --enable-static --enable-gpl --enable-libass --enable-libfdk-aac \
  --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx \
  --enable-libx264 --enable-libx265 --enable-nonfree --enable-openssl
make -j4
make install
sudo cp ../build/bin/ffmpeg /usr/local/bin/
rm -rf ../ffmpeg-4.1.3/ ../ffmpeg-4.1.3.tar.bz2 ../build/
```
- youtube-dl
```
pip install youtube-dl
```
- joblib
```
pip install joblib
```
- PyTorchNet
```
pip install git+https://github.com/pytorch/tnt.git@master
```
## Datasets
The datasets come from UCF101, HMDB51 and Kinetics600. Download the UCF101 and HMDB51 datasets along with their train/val/test split files into the `data` directory. We use `split1` to split the files. Run `misc.py` to preprocess these datasets.
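The preprocessing resamples each video to 32 frames and center-crops it to 112x112 (see the results section below). A rough sketch of that kind of clip sampling and cropping — the helper names are hypothetical and `misc.py` may differ in details such as the sampling strategy:

```python
import numpy as np

def sample_indices(num_frames, clip_len=32):
    """Uniformly sample clip_len frame indices from a video
    (sketch; the repo's exact sampling strategy may differ)."""
    if num_frames >= clip_len:
        return np.linspace(0, num_frames - 1, clip_len).astype(int)
    # loop short videos so every clip still has clip_len frames
    return np.arange(clip_len) % num_frames

def center_crop(frames, size=112):
    """Crop frames of shape (T, H, W, C) down to (T, size, size, C)."""
    _, h, w, _ = frames.shape
    top, left = (h - size) // 2, (w - size) // 2
    return frames[:, top:top + size, left:left + size, :]

video = np.zeros((300, 128, 128, 3), dtype=np.uint8)  # fake decoded video
clip = center_crop(video[sample_indices(len(video))])
print(clip.shape)  # (32, 112, 112, 3)
```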
For the Kinetics600 dataset, first download the train/val/test split files into the `data` directory, then run `download.py` to download and preprocess the dataset.
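`download.py` fetches the Kinetics clips with youtube-dl and can parallelize the work with joblib (which is why both appear in the requirements). A rough sketch of that pattern follows; the helper names are hypothetical, and the real script also has to trim each clip to its annotated time segment, which is omitted here:

```python
import subprocess
from pathlib import Path
from joblib import Parallel, delayed

def download_clip(youtube_id, out_dir):
    # hypothetical wrapper around youtube-dl; Kinetics annotations also
    # carry start/end times, so the real script must additionally trim
    # each clip (e.g. with ffmpeg), which this sketch skips
    out = Path(out_dir) / f"{youtube_id}.mp4"
    if out.exists():  # skip clips that were already fetched
        return str(out)
    subprocess.run(["youtube-dl", "-f", "mp4",
                    f"https://www.youtube.com/watch?v={youtube_id}",
                    "-o", str(out)], check=False)
    return str(out)

def download_all(youtube_ids, out_dir, n_jobs=8):
    # the downloads are I/O bound, so joblib workers overlap them
    return Parallel(n_jobs=n_jobs)(
        delayed(download_clip)(vid, out_dir) for vid in youtube_ids)
```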
## Usage
### Train Model
```
visdom -logging_level WARNING & python train.py --num_epochs 20 --pre_train kinetics600_r2plus1d.pth
optional arguments:
--data_type                   dataset type [default value is 'ucf101'](choices=['ucf101', 'hmdb51', 'kinetics600'])
--gpu_ids                     selected gpu [default value is '0,1']
--model_type                  model type [default value is 'r2plus1d'](choices=['r2plus1d', 'c3d'])
--batch_size                  training batch size [default value is 8]
--num_epochs                  training epochs number [default value is 100]
--pre_train                   used pre-trained model epoch name [default value is None]
```
Visdom can now be accessed at `127.0.0.1:8097` in your browser.
### Inference Video
```
python inference.py --video_name data/ucf101/ApplyLipstick/v_ApplyLipstick_g04_c02.avi
optional arguments:
--data_type                   dataset type [default value is 'ucf101'](choices=['ucf101', 'hmdb51', 'kinetics600'])
--model_type                  model type [default value is 'r2plus1d'](choices=['r2plus1d', 'c3d'])
--video_name                  test video name
--model_name                  model epoch name [default value is 'ucf101_r2plus1d.pth']
```
The inference result will be shown in a pop-up window.
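Under the hood, inference amounts to running the preprocessed clip through the network and taking the arg-max class. A minimal sketch of that final step — the `predict` helper is hypothetical, and `model`/`clip` stand in for the loaded network and the preprocessed tensor:

```python
import torch
import torch.nn.functional as F

def predict(model, clip, class_names):
    """Return the top-1 class name and its softmax confidence for one clip."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(clip), dim=1)  # (1, num_classes)
    conf, idx = probs.max(dim=1)
    return class_names[idx.item()], conf.item()

# toy stand-in for a trained R2Plus1D/C3D network and a real clip tensor
toy_model = torch.nn.Linear(8, 3)
name, conf = predict(toy_model, torch.randn(1, 8), ["a", "b", "c"])
```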
## Results
The Adam optimizer (lr=0.0001) is used with learning rate scheduling.
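The optimizer setup can be sketched as follows; the exact scheduling policy is not stated above, so the `ReduceLROnPlateau` choice here is an assumption:

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)  # toy stand-in for the R2Plus1D/C3D network
optimizer = optim.Adam(model.parameters(), lr=1e-4)
# assumed policy: shrink the lr when the monitored loss stops improving
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                 factor=0.1, patience=5)

for epoch in range(3):  # toy loop standing in for the real training loop
    loss = model(torch.randn(4, 10)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # feed the metric the scheduler watches
```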
For the ucf101 and hmdb51 datasets, the models are trained for 100 epochs with a batch size of 8 on one NVIDIA Tesla V100 (32G) GPU. For the kinetics600 dataset, the models are trained for 100 epochs with a batch size of 32 on two NVIDIA Tesla V100 (32G) GPUs. Because the training time is very long, this experiment has not been finished.
The videos are preprocessed into 32 frames of 128x128, then cropped to 112x112.
Dataset | UCF101 | HMDB51 | Kinetics600 |
---|---|---|---|
Num. of Train Videos | 9,537 | 3,570 | 375,008 |
Num. of Val Videos | 756 | 1,666 | 28,638 |
Num. of Test Videos | 3,783 | 1,530 | 56,982 |
Num. of Classes | 101 | 51 | 600 |
Accuracy (R2Plus1D) | 63.60% | 24.97% | |
Accuracy (C3D) | 51.63% | 25.10% | |
Num. of Parameters (R2Plus1D) | 33,220,990 | 33,195,340 | 33,476,977 |
Num. of Parameters (C3D) | 78,409,573 | 78,204,723 | 80,453,976 |
Training Time (R2Plus1D) | 19.3h | 7.3h | 350h |
Training Time (C3D) | 10.9h | 4.1h | 190h |
The train/val/test loss, accuracy and confusion matrix are shown on visdom. The pretrained models can be downloaded from BaiduYun (access code: ducr).