Abstract
We propose a multigrid extension of convolutional neural networks (CNNs). Rather than manipulating representations living on a single spatial grid, our network layers
operate across scale space, on a pyramid of grids. They
consume multigrid inputs and produce multigrid outputs;
convolutional filters themselves have both within-scale and
cross-scale extent. This aspect is distinct from simple multiscale designs, which only process the input at different
scales. Viewed in terms of information flow, a multigrid
network passes messages across a spatial pyramid. As a
consequence, receptive field size grows exponentially with
depth, facilitating rapid integration of context. Most critically, multigrid structure enables networks to learn internal
attention and dynamic routing mechanisms, and use them to
accomplish tasks on which modern CNNs fail.
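To make the layer structure concrete, the following is a minimal NumPy sketch of one multigrid layer under stated assumptions; it is not the implementation evaluated in this work. The helper names (`downsample`, `upsample`, `conv3x3`, `multigrid_layer`), the use of 2x2 max pooling and nearest-neighbor upsampling for cross-scale communication, the 3x3 filter size, the ReLU nonlinearity, and the three-level pyramid are all illustrative choices. Each grid concatenates its own features with resampled features from its finer and coarser neighbors and then convolves, so a single filter has both within-scale and cross-scale extent.

```python
import numpy as np

def downsample(x):
    # 2x2 max pooling on a (C, H, W) array; H and W are assumed even.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    # Nearest-neighbor 2x upsampling on a (C, H, W) array.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv3x3(x, weight):
    # Zero-padded, stride-1 3x3 convolution (cross-correlation form).
    # x: (C_in, H, W); weight: (C_out, C_in, 3, 3).
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out

def multigrid_layer(pyramid, weights):
    # pyramid: list of (C, H, W) arrays ordered from finest to coarsest,
    # each level half the resolution of the previous one.
    outputs = []
    for i, grid in enumerate(pyramid):
        parts = [grid]
        if i > 0:
            parts.append(downsample(pyramid[i - 1]))  # from the finer grid
        if i + 1 < len(pyramid):
            parts.append(upsample(pyramid[i + 1]))    # from the coarser grid
        stacked = np.concatenate(parts, axis=0)
        outputs.append(np.maximum(conv3x3(stacked, weights[i]), 0.0))  # ReLU
    return outputs

# Hypothetical three-level pyramid with 4 channels per grid.
rng = np.random.default_rng(0)
channels = 4
sizes = [8, 4, 2]
pyramid = [rng.standard_normal((channels, s, s)) for s in sizes]
weights = []
for i in range(len(sizes)):
    n_sources = 1 + (i > 0) + (i + 1 < len(sizes))  # self plus neighbors
    weights.append(0.1 * rng.standard_normal((channels, channels * n_sources, 3, 3)))

out = multigrid_layer(pyramid, weights)
print([o.shape for o in out])
```

The layer consumes a pyramid and returns a pyramid of the same shapes, so layers of this form can be stacked; after a few layers, information entering at the finest grid can reach distant fine-grid locations by passing through the coarser grids.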
Experiments demonstrate wide-ranging performance advantages of multigrid. On CIFAR and ImageNet classification tasks, switching from a single grid to multigrid within the
standard CNN paradigm improves accuracy while remaining
compute- and parameter-efficient. Multigrid is independent
of other architectural choices; we show synergy in combination with residual connections. Multigrid yields dramatic improvement on a synthetic semantic segmentation
dataset. Most strikingly, relatively shallow multigrid networks can learn to directly perform spatial transformation
tasks, where, in contrast, current CNNs fail. Together, our
results suggest that continuous evolution of features on a
multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid.