Abstract
Convolutional neural networks are built upon the convolution operation, which extracts informative features by
fusing spatial and channel-wise information together within
local receptive fields. To boost the representational power of a network, several recent approaches have
shown the benefit of enhancing spatial encoding. In this
work, we focus on the channel relationship and propose
a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates
channel-wise feature responses by explicitly modelling interdependencies between channels. We demonstrate that by
stacking these blocks together, we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce
significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost. SENets formed the foundation of our ILSVRC
2017 classification submission which won first place and
significantly reduced the top-5 error to 2.251%, a relative improvement of roughly 25% over the winning entry of 2016. Code and models are available at https://github.com/hujie-frank/SENet
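The abstract does not spell out what happens inside an SE block. The following is a minimal sketch, assuming the squeeze, excitation and scale structure described in the body of the SENet paper: global average pooling produces one descriptor per channel, a bottleneck of two fully connected layers (ReLU, then sigmoid) with reduction ratio r turns those descriptors into per-channel gates, and each channel is multiplied by its gate. It is written in plain Python on nested lists for illustration only and is not the authors' released code. The function name `se_block`, the bias-free layers, and the random weights are assumptions.

```python
import math
import random


def se_block(x, w1, w2):
    """Hypothetical sketch of Squeeze-and-Excitation on a C x H x W feature map.

    x  : list of C channels, each an H x W list of lists of floats
    w1 : (C // r) x C weights of the reduction layer (biases omitted, an assumption)
    w2 : C x (C // r) weights of the expansion layer (biases omitted, an assumption)
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor z_c per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: reduce to C // r units and apply ReLU.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to C units and apply a sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every activation in channel c by its gate s_c.
    out = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(gates, x)]
    return out, gates


# Usage: a toy 4-channel, 3 x 3 feature map with reduction ratio r = 2.
random.seed(0)
C, r, H, W = 4, 2, 3, 3
x = [[[random.uniform(-1.0, 1.0) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.5) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.gauss(0.0, 0.5) for _ in range(C // r)] for _ in range(C)]
y, gates = se_block(x, w1, w2)
print([round(g, 3) for g in gates])  # one gate per channel, each strictly between 0 and 1
```

In this sketch the two weight matrices hold 2C²/r parameters in total, independent of the spatial size H × W, which illustrates why adding such a block costs little next to the convolutions whose outputs it recalibrates.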