# Pytorch-Correlation-extension
This is a custom C++/CUDA implementation of the Correlation module used e.g. in FlowNetC. This tutorial was used as a basis for the implementation, as well as NVIDIA's CUDA code.
- Build and install the C++ and CUDA extensions by executing `python setup.py install`,
- Benchmark C++ vs. CUDA by running `python benchmark.py {cpu, cuda}`,
- Run gradient checks on the code by running `python grad_check.py --backend {cpu, cuda}`.
This module is expected to compile for Pytorch 1.2, on Python > 3.5 and Python 2.7.
This module is available on pip: `pip install spatial-correlation-sampler`. For a CPU-only version, you can install from source with `python setup_cpu.py install`.
This module needs a compatible gcc version and CUDA to be compiled. Namely, CUDA 9.1 and below need gcc5, while CUDA 9.2 and 10.0 need gcc7. See this issue for more information.
The API has a few differences with NVIDIA's module:
- Output is now a 5D tensor, which reflects the horizontal and vertical shifts: input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW).
- Output sizes `oH` and `oW` no longer depend on the patch size, but only on kernel size and padding.
- `patch_size` is now the whole patch, and not only the radii.
- `stride1` is now `stride`, and `stride2` is `dilation_patch`, which behaves like dilated convolutions.
- The equivalent `max_displacement` is then `dilation_patch * (patch_size - 1) / 2`.
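To illustrate the 5D output layout, here is a minimal NumPy sketch of the sampling for the simplest case (`kernel_size=1`, `stride=1`, `padding=0`, so `oH = H` and `oW = W`). The function name and the plain channel-wise dot product are illustrative only, not the actual CUDA kernel:

```python
import numpy as np

def correlation_sample(in1, in2, patch_size=3, dilation_patch=1):
    """Simplified correlation: kernel_size=1, stride=1, padding=0.

    in1, in2: arrays of shape (B, C, H, W).
    Returns a 5D array (B, patch_size, patch_size, H, W), matching the
    module's output layout (B x PatchH x PatchW x oH x oW).
    """
    B, C, H, W = in1.shape
    rad = patch_size // 2
    pad = rad * dilation_patch
    # zero-pad in2 so displaced lookups stay in bounds
    in2p = np.pad(in2, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((B, patch_size, patch_size, H, W), dtype=in1.dtype)
    for pi in range(patch_size):
        for pj in range(patch_size):
            # displacement of this patch cell, scaled by dilation_patch
            di = (pi - rad) * dilation_patch + pad
            dj = (pj - rad) * dilation_patch + pad
            shifted = in2p[:, :, di:di + H, dj:dj + W]
            # dot product over the channel dimension
            out[:, pi, pj] = (in1 * shifted).sum(axis=1)
    return out
```

For example, `correlation_sample(a, a, patch_size=3)` on an input `a` of shape (1, 2, 4, 5) returns a (1, 3, 3, 4, 5) array, with the center cell `out[:, 1, 1]` holding the zero-displacement dot products.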
To get the right parameters for FlowNetC, you would use `kernel_size=1, patch_size=21, stride=1, padding=0, dilation_patch=2`.
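As a quick sanity check (plain Python, variable names are illustrative), these values recover the displacement of 20 pixels used by FlowNetC's original correlation layer via the `max_displacement` formula above:

```python
# FlowNetC-equivalent parameters for this module
patch_size = 21
dilation_patch = 2

# equivalent max_displacement = dilation_patch * (patch_size - 1) / 2
max_displacement = dilation_patch * (patch_size - 1) // 2
print(max_displacement)  # 20
```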
Default parameters are from `benchmark.py`; FlowNetC parameters are the same as used in FlowNetC with a batch size of 4, described in this paper, implemented here and here. Feel free to file an issue to add entries to this with your hardware! See here for a benchmark script working with NVIDIA's code and Pytorch.
Benchmarks are launched with the environment variable `CUDA_LAUNCH_BLOCKING` set to 1. Only `float32` is benchmarked. FlowNetC correlation parameters were launched with the following commands:

```
CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda
CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256
```
implementation | Correlation parameters | device | pass | min time | avg time |
---|---|---|---|---|---|
ours | default | 980 GTX | forward | 5.745 ms | 5.851 ms |
ours | default | 980 GTX | backward | 77.694 ms | 77.957 ms |
NVIDIA | default | 980 GTX | forward | 13.779 ms | 13.853 ms |
NVIDIA | default | 980 GTX | backward | 73.383 ms | 73.708 ms |
ours | FlowNetC | 980 GTX | forward | 26.102 ms | 26.179 ms |
ours | FlowNetC | 980 GTX | backward | 208.091 ms | 208.510 ms |
NVIDIA | FlowNetC | 980 GTX | forward | 35.363 ms | 35.550 ms |
NVIDIA | FlowNetC | 980 GTX | backward | 283.748 ms | 284.346 ms |
The overhead of our implementation for `kernel_size` > 1 during backward needs some investigation; feel free to dive into the code to improve it! Note that NVIDIA's backward pass is not entirely correct when stride1 > 1 and kernel_size > 1, because not everything is computed, see here.
No other implementation is available on CPU. It is obviously not recommended to run it on CPU if you have a GPU.
Correlation parameters | device | pass | min time | avg time |
---|---|---|---|---|
default | E5-2630 v3 @ 2.40GHz | forward | 159.616 ms | 188.727 ms |
default | E5-2630 v3 @ 2.40GHz | backward | 282.641 ms | 294.194 ms |
FlowNetC | E5-2630 v3 @ 2.40GHz | forward | 2.138 s | 2.144 s |
FlowNetC | E5-2630 v3 @ 2.40GHz | backward | 7.006 s | 7.075 s |