Abstract
The key idea of current deep learning methods for
dense prediction is to apply a model to a regular patch centered on each pixel to make pixelwise predictions. These methods are limited in the
sense that the patches are determined by the network
architecture instead of being learned from data. In this
work, we propose dense transformer networks,
which can learn the shapes and sizes of patches
from data. Dense transformer networks employ an encoder-decoder architecture, and a pair of
dense transformer modules are inserted into each
of the encoder and decoder paths. The novelty of
this work is that we provide technical solutions for
learning the shapes and sizes of patches from data
and efficiently restoring the spatial correspondence
required for dense prediction. The proposed dense
transformer modules are differentiable, so the entire network can be trained. We apply the proposed
networks to biological image segmentation tasks
and show that they achieve superior performance in comparison to baseline methods.