Abstract.
We address the problem of segmenting highly articulated video objects in a wide variety of poses. The main idea of our approach is to model the prior information of object appearance via random forests. To automatically extract an object from a video sequence, we first build a random forest based on image patches sampled from the initial template. Owing to the use of a randomized technique and simple features, the modeled prior information is considered weak, but on the other hand appropriate for our application. Furthermore, the random forest can be dynamically updated to generate prior probabilities about the configurations of the object in subsequent image frames. The algorithm then combines the prior probabilities with low-level region information to produce a sequence of figure-ground segmentations. Overall, the proposed segmentation technique is useful and flexible in that one can easily integrate different cues and efficiently select discriminating features to model object appearance and handle various articulations.
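As a rough illustration of the first step described above — building a random forest appearance prior from patches sampled off an initial template — the following sketch trains scikit-learn's `RandomForestClassifier` on raw patch intensities and queries it for per-patch object probabilities. The paper's own node tests, feature selection, and online updating are not reproduced here; the synthetic image, patch size, and all function names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_patches(image, mask, patch=5, n=400, rng=None):
    """Sample square patches from `image`, each labeled by the
    template `mask` (1 = object, 0 = background) at its center."""
    rng = np.random.default_rng(rng)
    h, w = image.shape
    r = patch // 2
    ys = rng.integers(r, h - r, n)
    xs = rng.integers(r, w - r, n)
    X = np.stack([image[y - r:y + r + 1, x - r:x + r + 1].ravel()
                  for y, x in zip(ys, xs)])
    return X, mask[ys, xs]

# Synthetic "initial template": a bright square object on a dark background.
rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, (64, 64))
img[20:44, 20:44] += 0.6                  # object region is brighter
mask = np.zeros((64, 64), dtype=int)
mask[20:44, 20:44] = 1

# Build the (weak) appearance prior from sampled patches.
X, y = sample_patches(img, mask, rng=1)
forest = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0)
forest.fit(X, y)

# Query the forest: prior probability that a patch belongs to the object.
patch_obj = img[28:33, 28:33].ravel()[None, :]   # centered inside the object
patch_bg = img[3:8, 3:8].ravel()[None, :]        # centered in the background
p_obj = forest.predict_proba(patch_obj)[0, 1]
p_bg = forest.predict_proba(patch_bg)[0, 1]
print(p_obj, p_bg)
```

In the full method, such per-patch probabilities would be computed densely over each new frame and fused with low-level region information, and the forest itself would be updated as the object articulates.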