pytorch-resnet3d

2020-02-21

3D ConvNets in PyTorch

Do you want >72% top-1 accuracy on a large video dataset? Are you tired of Kinetics videos disappearing from YouTube every day? Do you have recurring nightmares about Caffe2? Then this is the repo for you!

This is a PyTorch implementation of the Caffe2 I3D ResNet Nonlocal model from the video-nonlocal-net repo. The weights are directly ported from the Caffe2 model (see checkpoints). This should be a good starting point to extract features, finetune on another dataset, etc., without the hassle of dealing with Caffe2, and with all the benefits of a very carefully trained Kinetics model.

It's only a matter of time before FAIR releases a good PyTorch version of their nonlocal-net codebase, but until then, at least you have this ¯\_(ツ)_/¯

Amazing features:
- Only a single model (ResNet50-I3D). Parameters hardcoded with love.
- Only the evaluation script for Kinetics (training from scratch or finetuning has not been tested yet).
- One exciting NL version to choose from.

Kinetics Evaluation

The code has been tested with Python 3.7 + PyTorch 1.1.

Pretrained Weights
Download pretrained weights for the I3D and I3D-NL models from the nonlocal repo:

wget https://dl.fbaipublicfiles.com/video-nonlocal/i3d_baseline_32x2_IN_pretrain_400k.pkl -P pretrained/
wget https://dl.fbaipublicfiles.com/video-nonlocal/i3d_nonlocal_32x2_IN_pretrain_400k.pkl -P pretrained/

Convert these weights from Caffe2 to PyTorch. This is just a simple renaming of the blobs to match the PyTorch model.

python -m utils.convert_weights pretrained/i3d_baseline_32x2_IN_pretrain_400k.pkl pretrained/i3d_r50_kinetics.pth
python -m utils.convert_weights pretrained/i3d_nonlocal_32x2_IN_pretrain_400k.pkl pretrained/i3d_r50_nl_kinetics.pth
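
To make "a simple renaming of the blobs" concrete, here is a minimal sketch of pattern-based key renaming. It is not the code in utils/convert_weights.py: the blob names, the target parameter names, the rule table and the function `rename_blobs` are all hypothetical and chosen only for illustration, and plain lists stand in for the weight arrays.

```python
import re

# Hypothetical (pattern, replacement) rules. The real mapping lives in
# utils/convert_weights.py and may look quite different.
RENAME_RULES = [
    (r"^conv1_w$", "conv1.weight"),
    (r"^res(\d+)_(\d+)_branch2a_w$", r"res\1.block\2.conv1.weight"),
    (r"^pred_w$", "fc.weight"),
    (r"^pred_b$", "fc.bias"),
]


def rename_blobs(blobs, rules=RENAME_RULES):
    """Rewrite each blob name with the first rule that matches it.

    Returns (renamed, unmatched): a new dict keyed by the rewritten
    names, and the list of source names that no rule covered, so that
    gaps in the mapping are visible instead of silently dropped.
    """
    renamed, unmatched = {}, []
    for name, value in blobs.items():
        for pattern, replacement in rules:
            if re.match(pattern, name):
                renamed[re.sub(pattern, replacement, name)] = value
                break
        else:
            # No rule matched this blob (e.g. optimizer state).
            unmatched.append(name)
    return renamed, unmatched


# Toy input: values would be weight arrays in a real checkpoint.
state, skipped = rename_blobs(
    {"conv1_w": [0.1], "res2_0_branch2a_w": [0.2], "lr": [0.01]}
)
```

Reporting the unmatched names is a design choice for the sketch: when porting weights, a blob that was never mapped is usually a bug worth seeing.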

The model can be created and weights loaded using:

from models import resnet
net = resnet.i3_res50() # vanilla I3D ResNet50
net = resnet.i3_res50_nl() # Nonlocal version

Data
Download videos using the official crawler and extract frames. This repo has a script to do this. Then create softlinks for frames and annotations:

mkdir -p data/kinetics/frames/ data/kinetics/annotations/
ln -s /path/to/kinetics/frames data/kinetics/frames/
ln -s /path/to/kinetics/annotation_csvs data/kinetics/annotations/

Evaluate
Run the evaluation script to generate scores on the validation set.

# Evaluate using 3 random spatial crops per frame + 10 uniformly sampled clips per video
# Model = I3D ResNet50 Nonlocal
python eval.py --batch_size 8 --mode video --model r50_nl

# Evaluate using a single, center crop and a single, centered clip of 32 frames
# Model = I3D ResNet50
python eval.py --batch_size 8 --mode clip --model r50

# Use --parallel for multiple GPUs
python eval.py --batch_size 16 --mode clip --model r50_nl --parallel

Model        | clip (top1/top5) | video (top1/top5)
I3D Res50    | 0.647 / 0.853    | 0.721 / 0.902
I3D Res50 NL | 0.664 / 0.864    | 0.737 / 0.912

You should get around 72.1% top-1 accuracy in video mode using I3D Res50, and around 73.7% using the nonlocal version. Note that these numbers are on whatever is left of the Kinetics val set these days (~18434 videos).
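
The gap between the clip and video columns comes from scoring many views of each video instead of one. The sketch below illustrates that idea in plain Python; it is not taken from eval.py. The assumptions are: a frame stride of 2 (guessed from the "32x2" in the checkpoint names), per-view scores combined by simple averaging, and all function names, which are hypothetical.

```python
def clip_starts(num_frames, clip_len=32, stride=2, num_clips=10):
    """Start frames of `num_clips` clips spread uniformly over a video.

    With num_clips=1 the single clip is centered, as in clip mode.
    """
    span = clip_len * stride                 # frames covered by one clip
    last_start = max(num_frames - span, 0)   # latest start that still fits
    if num_clips == 1:
        return [last_start // 2]
    # Integer arithmetic avoids floating-point rounding surprises.
    return [(i * last_start) // (num_clips - 1) for i in range(num_clips)]


def frame_indices(start, clip_len=32, stride=2):
    """Frame indices of one clip: `clip_len` frames, `stride` apart."""
    return [start + j * stride for j in range(clip_len)]


def average_scores(view_scores):
    """Average per-class scores over all views (clips x crops) of a video."""
    n = len(view_scores)
    return [sum(column) / n for column in zip(*view_scores)]


def topk_correct(scores, label, k):
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


# A 300-frame video: ten clip starts for video mode, one for clip mode.
video_mode_starts = clip_starts(300)
clip_mode_start = clip_starts(300, num_clips=1)

# Two views of a 3-class toy problem, averaged into one video-level score.
video_score = average_scores([[0.5, 0.25, 0.25], [0.0, 0.75, 0.25]])
is_top1 = topk_correct(video_score, label=1, k=1)
```

In this toy case the first view alone would rank class 0 highest, while the averaged score ranks class 1 highest, which is the kind of correction multi-view evaluation provides.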

