Abstract
Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, and contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug in” human subjects for each of the various components in a state-of-the-art conditional random field (CRF) model on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted an in-depth analysis of the human-generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MSRC dataset.