Abstract
In this paper we present a method to capture video-wide temporal information for action recognition. We postulate that a function capable of ordering the frames of a video temporally (based on their appearance) captures well the evolution of the appearance within the video. We learn such ranking functions per video via a ranking machine and use the parameters of these functions as a new video representation. The proposed method is easy to interpret and implement, fast to compute, and effective in recognizing a wide variety of actions. We perform a large number of evaluations on datasets for generic action recognition (Hollywood2 and HMDB51), fine-grained actions (MPII cooking activities) and gestures (Chalearn). Results show that the proposed method brings an absolute improvement of 7-10%, while being compatible with and complementary to further improvements in appearance and local motion based methods.
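To make the core idea concrete, the sketch below encodes a video as the parameters of a linear function fitted to order its frames in time. This is only an illustration, not the authors' exact pipeline: the paper fits the ordering with a ranking machine, whereas here a plain least-squares regression onto frame indices stands in for it, and the smoothing by a running mean of frame features is an assumption of this sketch.

```python
import numpy as np

def rank_pool(frames):
    """Encode a video as the parameters of a linear function that
    orders its frames temporally.

    frames: (T, D) array of per-frame appearance features.
    Returns a (D,) weight vector w such that w @ v_t tends to
    increase with t, where v_t is a smoothed frame feature.
    Fitted here by least squares as a stand-in for the paper's
    ranking machine.
    """
    T, D = frames.shape
    # Smooth each frame by the running mean of features seen so far.
    V = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    # Least-squares fit of (w, b) so that V[t] @ w + b ~ t.
    A = np.hstack([V, np.ones((T, 1))])
    sol, *_ = np.linalg.lstsq(A, t, rcond=None)
    return sol[:D]  # drop the bias; w is the video representation

# Toy example: features that drift over time yield a w whose
# projections follow the temporal order of the frames.
rng = np.random.default_rng(0)
X = (np.linspace(0, 1, 20)[:, None] * np.ones((20, 5))
     + 0.01 * rng.standard_normal((20, 5)))
w = rank_pool(X)
V = np.cumsum(X, axis=0) / np.arange(1, 21)[:, None]
scores = V @ w
```

Two such representations can then be compared with any standard classifier (the paper uses them as input features for action recognition), since videos with similar appearance evolution produce similar ranking parameters.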