Abstract
Neural networks rely on convolutions to aggregate spatial information. However, spatial convolutions are expensive in terms of model size and computation, both of which grow quadratically with respect to kernel size. In this paper, we present a parameter-free, FLOP-free "shift" operation as an alternative to spatial convolutions. We fuse shifts and point-wise convolutions to construct end-to-end trainable shift-based modules, with a hyperparameter characterizing the tradeoff between accuracy and efficiency. To demonstrate the operation's efficacy, we replace ResNet's 3×3 convolutions with shift-based modules for improved CIFAR10 and CIFAR100 accuracy using 60% fewer parameters; we additionally demonstrate the operation's resilience to parameter reduction on ImageNet, outperforming ResNet family members. We finally show the shift operation's applicability across domains, achieving strong performance with fewer parameters on image classification, face verification and style transfer.
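The core idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes channels are partitioned evenly across the integer offsets of a k×k window (one group per offset, with zero padding at the borders), and the function names `shift` and `shift_pointwise` are ours.

```python
import numpy as np

def shift(x, kernel_size=3):
    """Parameter-free, FLOP-free shift: each channel group is displaced
    by one of the kernel_size*kernel_size integer offsets. x: (C, H, W).
    The even channel-to-offset assignment here is an assumption."""
    C, H, W = x.shape
    r = kernel_size // 2
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    group = int(np.ceil(C / len(offsets)))
    out = np.zeros_like(x)  # zero padding: shifted-out pixels are dropped
    for c in range(C):
        dy, dx = offsets[min(c // group, len(offsets) - 1)]
        # Copy the overlapping region so out[c][y, x] = x[c][y - dy, x - dx]
        src_y = slice(max(0, -dy), H - max(0, dy))
        src_x = slice(max(0, -dx), W - max(0, dx))
        dst_y = slice(max(0, dy), H - max(0, -dy))
        dst_x = slice(max(0, dx), W - max(0, -dx))
        out[c][dst_y, dst_x] = x[c][src_y, src_x]
    return out

def shift_pointwise(x, weight):
    """Fuse a shift with a 1x1 (point-wise) convolution: weight has
    shape (C_out, C_in), and holds all learnable parameters."""
    shifted = shift(x)
    C, H, W = shifted.shape
    return (weight @ shifted.reshape(C, H * W)).reshape(-1, H, W)
```

The shift itself mixes spatial information at zero parameter and FLOP cost; the 1×1 convolution then mixes channels, so together they cover the role of a k×k spatial convolution.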