Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation
via Deep Layer Cascade
Abstract
We propose a novel deep layer cascade (LC) method
to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC) that
is composed of multiple independent models, LC treats a
single deep model as a cascade of several sub-models.
Earlier sub-models are trained to handle easy and confident
regions, and they progressively feed-forward harder regions
to the next sub-model for processing. Convolutions are only
calculated on these regions to reduce computations. The
proposed method possesses several advantages. First, LC
classifies most of the easy regions in the shallow stage and
makes deeper stage focuses on a few hard regions. Such an
adaptive and ‘difficulty-aware’ learning improves segmentation performance. Second, LC accelerates both training
and testing of deep network thanks to early decisions in the
shallow stage. Third, in comparison to MC, LC is an endto-end trainable framework, allowing joint learning of all
sub-models. We evaluate our method on PASCAL VOC and
Cityscapes datasets, achieving state-of-the-art performance
and fast speed