Abstract
Computer vision algorithms for individual tasks such as object recog- nition, detection and segmentation have shown impressive results in the recent past. The next challenge is to integrate all these algorithms and address the prob- lem of scene understanding. This paper is a step towards this goal. We present a probabilistic framework for reasoning about regions, objects, and their attributes such as object class, location, and spatial extent. Our model is a Conditional Ran- dom Field defined on pixels, segments and objects. We define a global energy function for the model, which combines results from sliding window detectors, and low-level pixel-based unary and pairwise relations. One of our primary con- tributions is to show that this energy function can be solved efficiently. Exper- imental results show that our model achieves signi ficant improvement over the baseline methods on CamVid and PA SCA L VOC datasets.