Abstract
We present spatio-temporal feature descriptors that can be inferred from video and used as building blocks in action recognition sys- tems. They capture the evolution of “elementary action elements” under a set of assumptions on the image-formation model and are designed to be insensitive to nuisance variability (absolute position, contrast), while retaining discriminative statistics due to the fine-scale motion and the lo- cal shape in compact regions of the image. Despite their simplicity, these descriptors, used in conjunction with basic classifiers, attain state of the art performance in the recognition of actions in benchmark datasets.