Abstract
A major challenge in video segmentation is that the foreground object may move quickly in the scene while its appearance and shape evolve over time. While pairwise potentials used in graph-based algorithms help smooth labels between neighboring (super)pixels in space and time, they offer only a myopic view of consistency and can be misled by inter-frame optical flow errors. We propose a higher-order supervoxel label consistency potential for semi-supervised foreground segmentation. Given an initial frame with manual annotation for the foreground object, our approach propagates the foreground region through time, leveraging bottom-up supervoxels to guide its estimates towards long-range coherent regions. We validate our approach on three challenging datasets and achieve state-of-the-art results.