Abstract
We introduce a Multiple Granularity Analysis framework
for video segmentation in a coarse-to-fine manner. We cast
video segmentation as a spatio-temporal superpixel labeling problem. Using the bounding volume provided by off-the-shelf object trackers, we estimate the foreground/background superpixel labels with a spatio-temporal multiple instance learning algorithm to obtain a
coarse foreground/background separation within the volume. We then refine the segmentation mask at the pixel
level using a graph-cut model. Extensive experiments on
benchmark video datasets demonstrate the superior performance of the proposed video segmentation algorithm.