Abstract
We address the challenging problem of recognizing the camera wearer's actions from videos captured by an egocentric camera. Egocentric videos encode a rich set of signals about the camera wearer, including head movement, hand pose and gaze information. We propose to use these mid-level egocentric cues for egocentric action recognition. We present a novel set of egocentric features and show how they can be combined with motion and object features. The result is a compact representation with superior performance. In addition, we provide the first systematic evaluation of motion, object and egocentric cues in egocentric action recognition. Our benchmark leads to several surprising findings. These findings reveal best practices for egocentric action recognition and yield a significant performance boost over all previous state-of-the-art methods on three publicly available datasets.