Abstract
In this paper, we present a framework for estimating which portions of videos are most discriminative for the task of action recognition. We explore the impact of temporally cropping training videos on the overall accuracy of an action recognition system, and we formalize what makes a set of croppings optimal. In addition, we present an algorithm to determine the best set of croppings for a dataset, and we show experimentally that our approach increases the accuracy of various state-of-the-art action recognition techniques.