RefineNet: Multi-Path Refinement Networks
for High-Resolution Semantic Segmentation
Abstract
Recently, very deep convolutional neural networks
(CNNs) have shown outstanding performance in object
recognition and have also been the first choice for dense
classification problems such as semantic segmentation.
However, repeated subsampling operations like pooling or
convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present
RefineNet, a generic multi-path refinement network that
explicitly exploits all the information available along the
down-sampling process to enable high-resolution prediction using long-range residual connections. In this way,
the deeper layers that capture high-level semantic features
can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet
employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which
captures rich background context in an efficient manner. We
carry out comprehensive experiments and set new state-of-the-art results on seven public datasets. In particular,
we achieve an intersection-over-union score of 83.4 on the
challenging PASCAL VOC 2012 dataset, which is the best
reported result to date.
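
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of a mean intersection-over-union computation on flat label lists. The function name `mean_iou` and the toy labels are hypothetical and used only for illustration. It is not the official PASCAL VOC evaluation code, which, as far as we are aware, accumulates per-class counts over the whole test set before averaging and reports the score as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target.

    pred, target: equal-length sequences of integer class labels (one per pixel).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (an assumption of this sketch)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is about 0.5
```

Under this definition, a score of 83.4 would correspond to a mean IoU of roughly 0.834 averaged over the dataset's classes.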