Abstract
We propose a novel end-to-end trainable, deep, encoder-decoder architecture for single-pass semantic segmentation. Our approach is based on a cascaded architecture with feature-level long-range skip connections. The
encoder incorporates the structure of ResNeXt’s residual
building blocks and adopts the strategy of repeating a building block that aggregates a set of transformations with the
same topology. The decoder features a novel architecture
consisting of blocks that (i) capture context information,
(ii) generate semantic features, and (iii) enable fusion between different output resolutions. Crucially, we introduce
dense decoder shortcut connections to allow decoder blocks
to use semantic feature maps from all previous decoder levels, i.e. from all higher-level feature maps. The dense
decoder connections allow for effective information propagation from one decoder block to another, as well as for
multi-level feature fusion that significantly improves accuracy. Importantly, these connections allow our method to
obtain state-of-the-art performance on several challenging
datasets without the time-consuming multi-scale
averaging used in previous works.
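
The dense decoder connectivity described above can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: feature maps are single-channel 2D grids, each decoder level doubles the spatial resolution, upsampling is nearest-neighbour, and fusion is a plain element-wise sum. The context-capturing and semantic-feature layers inside each decoder block are omitted, and the names `upsample_nearest`, `add_grids`, and `dense_decoder` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D feature map (assumption)


def upsample_nearest(x: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a 2D grid by an integer factor."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        out.extend(list(wide) for _ in range(factor))    # repeat rows
    return out


def add_grids(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two grids of equal size (assumed fusion operator)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def dense_decoder(encoder_feats: List[Grid]) -> List[Grid]:
    """Fuse each decoder level with ALL previous decoder outputs.

    encoder_feats is ordered coarsest first; level k is assumed to have
    twice the spatial resolution of level k - 1.
    """
    decoder_outs: List[Grid] = []
    for level, skip in enumerate(encoder_feats):
        fused = [list(row) for row in skip]  # start from the encoder skip feature
        # Dense shortcuts: every earlier decoder output is upsampled to the
        # current resolution and fused, not only the immediately preceding one.
        for prev_level, prev in enumerate(decoder_outs):
            factor = 2 ** (level - prev_level)
            fused = add_grids(fused, upsample_nearest(prev, factor))
        decoder_outs.append(fused)
    return decoder_outs


if __name__ == "__main__":
    feats = [
        [[1.0]],                          # 1x1, coarsest level
        [[1.0] * 2 for _ in range(2)],    # 2x2
        [[1.0] * 4 for _ in range(4)],    # 4x4, finest level
    ]
    for out in dense_decoder(feats):
        print(out[0])
```

In this toy setting the finest decoder level receives contributions from both coarser decoder outputs directly, which is the property that distinguishes dense shortcuts from a purely sequential decoder.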