# R2Plus1D-C3D
A PyTorch implementation of R2Plus1D and C3D, based on the CVPR 2018 paper *A Closer Look at Spatiotemporal Convolutions for Action Recognition* and the ICCV 2015 paper *Learning Spatiotemporal Features with 3D Convolutional Networks*.
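The core idea of R(2+1)D is to factorize each full 3D convolution into a 2D spatial convolution followed by a 1D temporal convolution. The sketch below illustrates that factorization; it is a simplified stand-in, not the exact module from this repo (the paper additionally chooses the intermediate channel count so the parameter count matches the full 3D conv, which is omitted here):

```python
import torch
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Factorizes a t x d x d 3D conv into a 1 x d x d spatial conv
    followed by a t x 1 x 1 temporal conv (simplified sketch)."""
    def __init__(self, in_ch, out_ch, mid_ch=None):
        super().__init__()
        # simple default for the intermediate width; the paper derives
        # it from the parameter budget of the equivalent 3D conv
        mid_ch = mid_ch or out_ch
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        self.bn1 = nn.BatchNorm3d(mid_ch)
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.spatial(x)))
        return self.relu(self.bn2(self.temporal(x)))

block = R2Plus1DBlock(3, 64)
clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, H, W)
out = block(clip)
print(out.shape)  # torch.Size([1, 64, 16, 112, 112])
```

The extra nonlinearity between the spatial and temporal convolutions is one of the reasons the paper gives for the factorized block outperforming a plain 3D conv of the same size.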
## Requirements
- PyTorch
```
conda install pytorch torchvision -c pytorch
```
- opencv
```
conda install opencv
```
- rarfile
```
pip install rarfile
```
- rar
```
sudo apt install rar
```
- unrar
```
sudo apt install unrar
```
- ffmpeg
```
sudo apt install build-essential openssl libssl-dev autoconf automake cmake git-core libass-dev \
  libfreetype6-dev libsdl2-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev \
  libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texinfo wget zlib1g-dev nasm yasm \
  libx264-dev libx265-dev libnuma-dev libvpx-dev libfdk-aac-dev libmp3lame-dev libopus-dev
wget https://ffmpeg.org/releases/ffmpeg-4.1.3.tar.bz2
tar -jxvf ffmpeg-4.1.3.tar.bz2
cd ffmpeg-4.1.3/
./configure --prefix="../build" --enable-static --enable-gpl --enable-libass --enable-libfdk-aac \
  --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx \
  --enable-libx264 --enable-libx265 --enable-nonfree --enable-openssl
make -j4
make install
sudo cp ../build/bin/ffmpeg /usr/local/bin/
rm -rf ../ffmpeg-4.1.3/ ../ffmpeg-4.1.3.tar.bz2 ../build/
```
- youtube-dl
```
pip install youtube-dl
```
- joblib
```
pip install joblib
```
- PyTorchNet
```
pip install git+https://github.com/pytorch/tnt.git@master
```
## Datasets
The datasets come from UCF101, HMDB51 and Kinetics600. Download the UCF101 and HMDB51 datasets along with their train/val/test split files into the `data` directory. We use `split1` to split the files. Run `misc.py` to preprocess these datasets.
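The preprocessing resamples each video to 32 frames and center-crops it to 112x112 (see the results section below). A rough sketch of that kind of clip sampling and cropping — the helper names are hypothetical and `misc.py` may differ in details such as the sampling strategy:

```python
import numpy as np

def sample_indices(num_frames, clip_len=32):
    """Uniformly sample clip_len frame indices from a video
    (sketch; the repo's exact sampling strategy may differ)."""
    if num_frames >= clip_len:
        return np.linspace(0, num_frames - 1, clip_len).astype(int)
    # loop short videos so every clip still has clip_len frames
    return np.arange(clip_len) % num_frames

def center_crop(frames, size=112):
    """Crop frames of shape (T, H, W, C) down to (T, size, size, C)."""
    _, h, w, _ = frames.shape
    top, left = (h - size) // 2, (w - size) // 2
    return frames[:, top:top + size, left:left + size, :]

video = np.zeros((300, 128, 128, 3), dtype=np.uint8)  # fake decoded video
clip = center_crop(video[sample_indices(len(video))])
print(clip.shape)  # (32, 112, 112, 3)
```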
For the Kinetics600 dataset, first download the train/val/test split files into the `data` directory, then run `download.py` to download and preprocess the dataset.
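`download.py` fetches the Kinetics clips with youtube-dl and can parallelize the work with joblib (which is why both appear in the requirements). A rough sketch of that pattern follows; the helper names are hypothetical, and the real script also has to trim each clip to its annotated time segment, which is omitted here:

```python
import subprocess
from pathlib import Path
from joblib import Parallel, delayed

def download_clip(youtube_id, out_dir):
    # hypothetical wrapper around youtube-dl; Kinetics annotations also
    # carry start/end times, so the real script must additionally trim
    # each clip (e.g. with ffmpeg), which this sketch skips
    out = Path(out_dir) / f"{youtube_id}.mp4"
    if out.exists():  # skip clips that were already fetched
        return str(out)
    subprocess.run(["youtube-dl", "-f", "mp4",
                    f"https://www.youtube.com/watch?v={youtube_id}",
                    "-o", str(out)], check=False)
    return str(out)

def download_all(youtube_ids, out_dir, n_jobs=8):
    # the downloads are I/O bound, so joblib workers overlap them
    return Parallel(n_jobs=n_jobs)(
        delayed(download_clip)(vid, out_dir) for vid in youtube_ids)
```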
## Usage
### Train Model
```
visdom -logging_level WARNING & python train.py --num_epochs 20 --pre_train kinetics600_r2plus1d.pth
optional arguments:
--data_type                   dataset type [default value is 'ucf101'](choices=['ucf101', 'hmdb51', 'kinetics600'])
--gpu_ids                     selected gpu [default value is '0,1']
--model_type                  model type [default value is 'r2plus1d'](choices=['r2plus1d', 'c3d'])
--batch_size                  training batch size [default value is 8]
--num_epochs                  training epochs number [default value is 100]
--pre_train                   used pre-trained model epoch name [default value is None]
```
Visdom can now be accessed at `127.0.0.1:8097` in your browser.
### Inference Video
```
python inference.py --video_name data/ucf101/ApplyLipstick/v_ApplyLipstick_g04_c02.avi
optional arguments:
--data_type                   dataset type [default value is 'ucf101'](choices=['ucf101', 'hmdb51', 'kinetics600'])
--model_type                  model type [default value is 'r2plus1d'](choices=['r2plus1d', 'c3d'])
--video_name                  test video name
--model_name                  model epoch name [default value is 'ucf101_r2plus1d.pth']
```
The inference result will be shown in a pop-up window.
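Under the hood, inference amounts to running the preprocessed clip through the network and taking the arg-max class. A minimal sketch of that final step — the `predict` helper is hypothetical, and `model`/`clip` stand in for the loaded network and the preprocessed tensor:

```python
import torch
import torch.nn.functional as F

def predict(model, clip, class_names):
    """Return the top-1 class name and its softmax confidence for one clip."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(clip), dim=1)  # (1, num_classes)
    conf, idx = probs.max(dim=1)
    return class_names[idx.item()], conf.item()

# toy stand-in for a trained R2Plus1D/C3D network and a real clip tensor
toy_model = torch.nn.Linear(8, 3)
name, conf = predict(toy_model, torch.randn(1, 8), ["a", "b", "c"])
```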
## Results
The Adam optimizer (lr=0.0001) is used with learning rate scheduling.
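The optimizer setup can be sketched as follows; the exact scheduling policy is not stated above, so the `ReduceLROnPlateau` choice here is an assumption:

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)  # toy stand-in for the R2Plus1D/C3D network
optimizer = optim.Adam(model.parameters(), lr=1e-4)
# assumed policy: shrink the lr when the monitored loss stops improving
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                 factor=0.1, patience=5)

for epoch in range(3):  # toy loop standing in for the real training loop
    loss = model(torch.randn(4, 10)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # feed the metric the scheduler watches
```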
For the ucf101 and hmdb51 datasets, the models are trained for 100 epochs with a batch size of 8 on one NVIDIA Tesla V100 (32G) GPU. For the kinetics600 dataset, the models are trained for 100 epochs with a batch size of 32 on two NVIDIA Tesla V100 (32G) GPUs. Because the training time is very long, this experiment has not been finished.
The videos are preprocessed into 32 frames of 128x128, then cropped to 112x112.
Dataset | UCF101 | HMDB51 | Kinetics600 |
---|---|---|---|
Num. of Train Videos | 9,537 | 3,570 | 375,008 |
Num. of Val Videos | 756 | 1,666 | 28,638 |
Num. of Test Videos | 3,783 | 1,530 | 56,982 |
Num. of Classes | 101 | 51 | 600 |
Accuracy (R2Plus1D) | 63.60% | 24.97% | |
Accuracy (C3D) | 51.63% | 25.10% | |
Num. of Parameters (R2Plus1D) | 33,220,990 | 33,195,340 | 33,476,977 |
Num. of Parameters (C3D) | 78,409,573 | 78,204,723 | 80,453,976 |
Training Time (R2Plus1D) | 19.3h | 7.3h | 350h |
Training Time (C3D) | 10.9h | 4.1h | 190h |
The train/val/test loss, accuracy and confusion matrix are shown on visdom. The pretrained models can be downloaded from BaiduYun (access code: ducr).