Another advantage of CuPy over raw C code is that the dimensions of each op are known at JIT compilation time, so the compiled kernels can potentially be faster. Also, the first version of the package was written in PyCUDA, which does not work with PyTorch multi-GPU.
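As a rough illustration of the idea (not pyinn's actual kernels), here is a minimal CuPy sketch that bakes a tensor size into the kernel source before JIT compilation; the kernel name `scale` and the size `n` are made up for this example:

```python
import cupy as cp

# The array size is known before compilation, so it can be baked into the
# kernel source as a constant; CuPy JIT-compiles the source on first use.
n = 1024
source = f'''
extern "C" __global__ void scale(const float* x, float* y) {{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < {n}) y[i] = 2.0f * x[i];
}}
'''
kernel = cp.RawKernel(source, 'scale')

x = cp.random.randn(n, dtype=cp.float32)
y = cp.empty_like(x)
kernel(((n + 255) // 256,), (256,), (x, y))  # grid, block, kernel args
```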
~~On a Maxwell Titan X, MobileNets with `pyinn.conv2d_depthwise` are ~2.6x faster than with `F.conv2d`~~ (see `benchmark.py`).

This is no longer the case: with its new kernels, PyTorch 0.3.0 is now ~20% faster than pyinn.
```python
import torch
from torch.autograd import Variable
import pyinn as P

x = Variable(torch.randn(1, 4, 5, 5).cuda())
w = Variable(torch.randn(4, 1, 3, 3).cuda())
y = P.conv2d_depthwise(x, w, padding=1)
```
or with the modules interface:

```python
from pyinn.modules import Conv2dDepthwise

module = Conv2dDepthwise(channels=4, kernel_size=3, padding=1).cuda()
y = module(x)
```
## Documentation
### conv2d_depthwise
Implements depthwise convolution as in [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861).
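Since depthwise convolution is the special case of grouped convolution with `groups` equal to the number of input channels, the output can be sanity-checked against `F.conv2d`. A minimal sketch, reusing the tensors from the example above:

```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable
import pyinn as P

x = Variable(torch.randn(1, 4, 5, 5).cuda())
w = Variable(torch.randn(4, 1, 3, 3).cuda())

# Depthwise convolution == grouped convolution with groups = in_channels,
# so both calls below should produce (numerically) identical outputs.
y_pyinn = P.conv2d_depthwise(x, w, padding=1)
y_ref = F.conv2d(x, w, padding=1, groups=4)
print((y_pyinn - y_ref).abs().max())  # should be ~0
```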