Abstract
Video segmentation has become an important and active research area with a large diversity of proposed approaches. Graph-based methods, enabling top performance on recent benchmarks, consist of three essential components: 1. powerful features accounting for object appearance and motion similarities; 2. spatio-temporal neighborhoods of pixels or superpixels (the graph edges) modeled using a combination of those features; 3. video segmentation formulated as a graph partitioning problem. While a wide variety of features have been explored and various graph partition algorithms have been proposed, there is surprisingly little research on how to construct a graph to obtain the best video segmentation performance. This is the focus of our paper. We propose to combine features by means of a classifier, use calibrated classifier outputs as edge weights, and define the graph topology by edge selection. By learning the graph (without changes to the graph partitioning method), we improve the results of the best performing video segmentation algorithm by 6% on the challenging VSB100 benchmark, while reducing its runtime by 55%, as the learnt graph is much sparser.
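The graph-construction idea summarized above can be illustrated with a minimal sketch: a classifier is trained on feature vectors describing superpixel pairs, its calibrated output probabilities serve as edge weights, and edge selection keeps only confident edges, sparsifying the graph. All data, feature dimensions, and thresholds below are illustrative placeholders, not the paper's actual features or models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: one feature-difference vector per superpixel pair
# (stand-ins for appearance and motion cues), labeled 1 if the pair
# belongs to the same object, 0 otherwise.
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# A classifier combines the features; logistic regression outputs
# probabilities, which act as calibrated edge affinities.
clf = LogisticRegression().fit(X_train, y_train)

# Candidate edges between superpixels in a spatio-temporal neighborhood.
X_edges = rng.normal(size=(50, 4))
weights = clf.predict_proba(X_edges)[:, 1]  # P(pair is in the same object)

# Edge selection: keep only confident edges (0.5 is an arbitrary cutoff),
# yielding a sparser graph for the downstream partitioning step.
keep = weights > 0.5
print(f"{keep.sum()} of {len(weights)} candidate edges kept")
```

The key design point is that a single learned classifier replaces hand-tuned feature combinations, and thresholding its output both defines the graph topology and reduces partitioning runtime.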