Abstract
Despite their success, spatio-temporal visual features are hand-designed: they aggregate image or flow gradients using a pre-specified, uniform set of orientation bins. Kernel descriptors [1] generalize such orientation histograms by defining match kernels over image patches, and have shown superior performance for visual object and scene recognition. In this work, we make two contributions. First, we extend kernel descriptors to the spatio-temporal domain to model salient flow, gradient, and texture patterns in video; we further apply our kernel descriptors to extract features from different color channels. Second, we present a fast algorithm for kernel descriptor computation with O(1) complexity per pixel in each video patch, yielding a speedup of two orders of magnitude over conventional kernel descriptors and other popular motion features. Our evaluation on the TRECVID MED 2011 dataset indicates that the proposed multi-channel shape-flow kernel descriptors outperform several other features, including SIFT, SURF, STIP, and Color SIFT.
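To illustrate how a match kernel generalizes an orientation histogram, the following is a minimal sketch, not the method of this paper or the exact formulation of [1]. It assumes a patch is given as arrays of per-pixel gradient orientations `theta` (radians) and magnitudes `mag`. It omits the position kernel and magnitude normalization used in [1]. The function names and the parameters `n_bins` and `gamma_o` are hypothetical, chosen for illustration. With a delta kernel on binned orientations, the double-sum match kernel equals the inner product of two magnitude-weighted orientation histograms. Replacing the delta with a Gaussian kernel on the unit orientation vectors removes the dependence on pre-specified bin boundaries.

```python
import numpy as np


def orientation_bins(theta, n_bins=8):
    # Hypothetical helper: map angles to one of n_bins uniform bins over [0, 2*pi).
    return np.floor(theta / (2 * np.pi) * n_bins).astype(int) % n_bins


def orientation_histogram(theta, mag, n_bins=8):
    # Classic hand-designed feature: magnitude-weighted histogram of orientations.
    return np.bincount(orientation_bins(theta, n_bins), weights=mag, minlength=n_bins)


def delta_match_kernel(theta_p, mag_p, theta_q, mag_q, n_bins=8):
    # Match kernel over all pixel pairs (z, z') with a delta kernel on binned
    # orientations; this equals the dot product of the two histograms above.
    same_bin = orientation_bins(theta_p, n_bins)[:, None] == orientation_bins(theta_q, n_bins)[None, :]
    return np.sum(mag_p[:, None] * mag_q[None, :] * same_bin)


def gaussian_match_kernel(theta_p, mag_p, theta_q, mag_q, gamma_o=5.0):
    # Same double sum, but the delta is replaced by a Gaussian kernel on the
    # orientation vectors [cos(theta), sin(theta)], so no bins are needed.
    u_p = np.stack([np.cos(theta_p), np.sin(theta_p)], axis=1)
    u_q = np.stack([np.cos(theta_q), np.sin(theta_q)], axis=1)
    d2 = np.sum((u_p[:, None, :] - u_q[None, :, :]) ** 2, axis=2)
    return np.sum(mag_p[:, None] * mag_q[None, :] * np.exp(-gamma_o * d2))
```

For two single-pixel patches whose orientations straddle a bin boundary (for example, just below and just above pi/4 with 8 bins), `delta_match_kernel` returns 0 while `gaussian_match_kernel` returns a value close to 1. This sensitivity to bin placement is what the soft kernel avoids. The naive double sum shown here costs O(|P| |Q|) per patch pair; it does not reflect the O(1)-per-pixel algorithm described in the abstract.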