Abstract. We focus on the challenging task of real-time semantic segmentation in this paper. It finds many practical applications and yet is
with fundamental difficulty of reducing a large portion of computation for
pixel-wise label inference. We propose an image cascade network (ICNet)
that incorporates multi-resolution branches under proper label guidance
to address this challenge. We provide in-depth analysis of our framework
and introduce the cascade feature fusion unit to quickly achieve highquality segmentation. Our system yields real-time inference on a single
GPU card with decent quality results evaluated on challenging datasets
like Cityscapes, CamVid and COCO-Stuff