Abstract
Most convolutional neural networks use some method for
gradually reducing the spatial size of the hidden layers. This
is commonly referred to as pooling, and is applied to reduce the number of parameters, improve invariance to certain distortions, and increase the receptive field size. Since
pooling by nature is a lossy process, it is crucial that each
such layer maintains the portion of the activations that is
most important for the network’s discriminability. Yet the standard choices remain simple maximization or averaging over blocks (max or average
pooling) or plain downsampling in the form of strided convolutions. In this paper, we aim to leverage recent results on image downscaling for the purposes
of deep learning. Inspired by the human visual system,
which focuses on local spatial changes, we propose detail-preserving pooling (DPP), an adaptive pooling method that
magnifies spatial changes and preserves important structural detail. Importantly, its parameters can be learned
jointly with the rest of the network. We analyze some of
its theoretical properties and show its empirical benefits on
several datasets and networks, where DPP consistently outperforms previous pooling approaches.
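To make the contrast concrete, here is a minimal NumPy sketch of the three standard downscaling operators named above (max pooling, average pooling, and strided subsampling) on non-overlapping 2×2 blocks, alongside a hypothetical detail-weighted average. The weighting in `detail_weighted_pool` (a block-mean reference and the `alpha`, `lam`, `eps` parameters) is an illustrative assumption and not the DPP formulation from the paper. It only mimics the stated idea of giving more weight to local spatial changes. Here the parameters are fixed constants, whereas the abstract notes that DPP's own parameters are learned jointly with the network.

```python
import numpy as np


def _blocks(x: np.ndarray, k: int) -> np.ndarray:
    """Reshape an (H, W) array into (H//k, W//k, k*k) non-overlapping k x k blocks."""
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    return (x.reshape(h // k, k, w // k, k)
             .transpose(0, 2, 1, 3)
             .reshape(h // k, w // k, k * k))


def max_pool(x: np.ndarray, k: int = 2) -> np.ndarray:
    # Max pooling: keep the largest activation in each block.
    return _blocks(x, k).max(axis=-1)


def avg_pool(x: np.ndarray, k: int = 2) -> np.ndarray:
    # Average pooling: keep the mean activation of each block.
    return _blocks(x, k).mean(axis=-1)


def strided_subsample(x: np.ndarray, k: int = 2) -> np.ndarray:
    # Plain downsampling: keep the top-left pixel of every k x k block,
    # which is what a stride-k convolution with a 1x1 identity kernel does.
    return x[::k, ::k]


def detail_weighted_pool(x: np.ndarray, k: int = 2, alpha: float = 1.0,
                         lam: float = 2.0, eps: float = 1e-3) -> np.ndarray:
    """Hypothetical detail-weighted average (illustrative only).

    Assumption: this is NOT the paper's exact DPP formula. Pixels that deviate
    more from their block mean get larger weights, so isolated details pull the
    output away from the plain average; flat blocks pass through unchanged.
    """
    b = _blocks(x, k)
    ref = b.mean(axis=-1, keepdims=True)                    # local reference: block mean
    w = alpha + np.sqrt((b - ref) ** 2 + eps ** 2) ** lam   # weight grows with deviation
    return (w * b).sum(axis=-1) / w.sum(axis=-1)            # normalized weighted average
```

For example, on a block holding a single bright pixel among zeros, `[0, 0, 0, 10]`, average pooling gives 2.5 and max pooling gives 10, while this weighting lands in between (about 7.25 with the defaults), so the isolated detail is attenuated less than by averaging. Constant blocks are returned unchanged, as with the other operators.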