Abstract
Object detection and pixel-wise scene labeling have both been active research areas in recent years, and impressive results have been reported for both tasks separately. Integrating these different types of approaches should boost performance for both tasks, as object detection can profit from powerful scene labeling and pixel-wise scene labeling can in turn profit from powerful object detection. Consequently, first approaches have been proposed that aim to integrate both object detection and scene labeling in one framework. This paper proposes a novel approach based on conditional random field (CRF) models that extends existing work by 1) formulating the integration as a joint labeling problem of object and scene classes and 2) systematically integrating dynamic information for the object detection task as well as for the scene labeling task. As a result, the approach is applicable to highly dynamic scenes including both fast camera and object movements. Experiments show the applicability of the novel approach to challenging real-world video sequences and systematically analyze the contribution of different system components to the overall performance.