Abstract. Convolution is spatially symmetric, i.e., the visual features it extracts are independent of their position in the image, which limits its ability to utilize contextual cues for visual recognition. This paper addresses this issue by introducing a recalibration process, which examines the surrounding region of each neuron, computes an importance value, and multiplies the original neural response by this value. Our approach, named multi-scale spatially-asymmetric recalibration (MS-SAR), extracts visual cues from surrounding regions at multiple scales and uses a weighting scheme that is asymmetric in the spatial domain. MS-SAR is implemented efficiently, so that it requires only a small fraction of extra parameters and computation. We apply MS-SAR to several popular building blocks, including the residual block and the densely-connected block, and demonstrate its superior performance on both the CIFAR and ILSVRC2012 classification tasks.
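To make the recalibration step concrete, here is a minimal NumPy sketch, assuming a design the abstract does not spell out: at each scale, the surrounding region of every neuron is summarized by a per-channel (depthwise) convolution whose kernel has a separate weight for each spatial offset (our stand-in for a spatially-asymmetric weighting, in contrast to symmetric average pooling); the per-scale results are summed, passed through a sigmoid to give one importance value per neuron, and multiplied into the original response. The function names (`depthwise_conv2d`, `ms_sar_sketch`), the kernel sizes, and the summation across scales are hypothetical illustrations, not the authors' implementation. Each scale adds only C*K*K + C parameters, which is small next to a standard C-to-C convolution.

```python
import numpy as np


def depthwise_conv2d(x, k):
    """Zero-padded 'same' depthwise cross-correlation.

    x: (C, H, W) neural responses; k: (C, K, K) per-channel kernel, K odd.
    """
    C, H, W = x.shape
    K = k.shape[-1]
    r = K // 2
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))  # zero padding on H and W only
    out = np.zeros_like(x, dtype=float)
    for dy in range(K):
        for dx in range(K):
            # Every offset (dy, dx) has its own weight, so the kernel
            # need not be symmetric around the centre neuron.
            out += k[:, dy, dx][:, None, None] * xp[:, dy:dy + H, dx:dx + W]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ms_sar_sketch(x, kernels, biases):
    """Hypothetical multi-scale recalibration (not the paper's exact layer).

    kernels: list of (C, K_s, K_s) arrays, one per scale (context size).
    biases:  list of (C,) arrays, one per scale.
    Returns the recalibrated responses and the importance map.
    """
    logits = np.zeros_like(x, dtype=float)
    for k, b in zip(kernels, biases):
        # Context cue from the surrounding region at this scale.
        logits += depthwise_conv2d(x, k) + b[:, None, None]
    importance = sigmoid(logits)  # one value in (0, 1) per neuron
    return x * importance, importance


# Usage: two scales (3x3 and 5x5 surroundings) on a 4-channel 8x8 map.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
kernels = [0.1 * rng.standard_normal((4, 3, 3)),
           0.1 * rng.standard_normal((4, 5, 5))]
biases = [np.zeros(4), np.zeros(4)]
y, w = ms_sar_sketch(x, kernels, biases)
print(y.shape)  # (4, 8, 8)
```

The recalibration only rescales each response by a factor in (0, 1), so the output keeps the input's shape and the block can be dropped after an existing residual or dense layer without changing the surrounding tensor sizes.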